
Classification

Data classification is the process of organizing data into categories so that it can be used efficiently and effectively. It makes essential data easy to find and retrieve. Once a data-classification scheme has been created, it should be complemented by security standards that specify appropriate handling practices for each category and by storage standards that define the data's lifecycle requirements.
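As a rough illustration of attaching standards to a scheme, the Python sketch below maps each category to one security rule (which roles may access it) and one storage rule (how long it is retained). The category names, roles, retention periods, and the helpers `can_access` and `is_expired` are all hypothetical; a real scheme would come from the organization's own policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingStandard:
    """Security and storage rules attached to one data category (illustrative only)."""
    encrypt_at_rest: bool      # security standard: must stored copies be encrypted?
    allowed_roles: frozenset   # security standard: roles allowed to read the data
    retention_days: int        # storage standard: lifecycle limit before deletion


# Hypothetical scheme: category names and every rule below are assumptions.
SCHEME = {
    "restricted": HandlingStandard(True, frozenset({"security_officer"}), 365 * 7),
    "confidential": HandlingStandard(True, frozenset({"manager", "security_officer"}), 365 * 3),
    "internal": HandlingStandard(False, frozenset({"employee", "manager", "security_officer"}), 365),
    "public": HandlingStandard(False, frozenset({"anyone"}), 90),
}


def can_access(category: str, role: str) -> bool:
    """Apply a category's security standard to an access request."""
    rules = SCHEME[category]
    return "anyone" in rules.allowed_roles or role in rules.allowed_roles


def is_expired(category: str, age_days: int) -> bool:
    """Apply a category's storage standard: True once the retention period has passed."""
    return age_days > SCHEME[category].retention_days
```

Keeping the rules in one table means a record's category alone determines how it is handled, which is the point of defining standards per category rather than per record.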
It is a systematic process for obtaining important and relevant information about data and metadata (data about data). Classification analysis helps identify which of a set of categories different types of data belong to. It is closely linked to cluster analysis, since classification can also be used to group (cluster) data. Email filtering is a well-known example of classification analysis: email providers use algorithms that classify each incoming message as legitimate or mark it as spam.
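The spam example can be sketched with a multinomial naive Bayes classifier, one common textbook choice for text. The source does not say which algorithm email providers actually use, the four training emails are invented, and `train_naive_bayes` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model: class priors plus per-class word counts."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior, using add-one (Laplace) smoothing."""
    priors, word_counts, vocab = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for word in doc.lower().split():
            score += math.log((word_counts[c][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Tiny made-up training set, for illustration only.
emails = [
    "win a free prize now",
    "cheap pills free offer",
    "meeting agenda for monday",
    "please review the attached report",
]
labels = ["spam", "spam", "legitimate", "legitimate"]
model = train_naive_bayes(emails, labels)
```

With this toy data, `predict(model, "free prize offer")` returns `"spam"` and `predict(model, "review the report")` returns `"legitimate"`.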
Handling large volumes of stored data is tedious, because users must pick out the accurate data from huge amounts of unstructured data. A mechanism is therefore needed that classifies unstructured data into an organized form, so that users can easily access the data they require. Classification techniques applied over big transactional databases deliver the required data to users from large datasets in a simpler way. There are two main classification techniques, supervised and unsupervised, and both are applied in the context of big data.
The objective of classification is to analyze huge amounts of data and to develop an accurate description or model for each class, using the features present in the data. Training data, in which each record's target value (class) is already known, is used to build a model of what a typical record looks like for each of the possible target values. The model is then applied to new or test data for which the target value is currently unknown: the algorithm identifies which target value's model each new data point matches.
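The train-then-apply workflow described above can be sketched with a nearest-centroid classifier, chosen here only because it is one of the simplest supervised models: the model is just the average feature vector of each class. The features, labels, and function names are made up for illustration.

```python
from math import dist


def fit_centroids(X, y):
    """Build the model: the mean feature vector (centroid) of each class in the training data."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance) to the new point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Made-up training data: two numeric features per record, each with a known target value.
X_train = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
y_train = ["A", "A", "A", "B", "B", "B"]
model = fit_centroids(X_train, y_train)

# Test data: not used for training; the true labels are kept aside only for scoring.
X_test = [(0.9, 1.1), (4.1, 4.0), (1.3, 0.7)]
y_test = ["A", "B", "A"]
predictions = [classify(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

Here `predictions` is `["A", "B", "A"]`, so `accuracy` is `1.0`; real data is rarely this cleanly separated.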

Importance

Risk management

Legal discovery

Compliance

Classification predicts categorical class labels.

It classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data.
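A minimal sketch of constructing a model from a training set and the class labels of a classifying attribute is a decision stump, that is, a decision tree with a single split. The attribute (income in thousands), the approve/reject labels, and the helper names are hypothetical; full decision-tree learners grow many such splits recursively.

```python
def fit_stump(values, labels):
    """Learn a decision stump (a one-level decision tree) on one numeric attribute.

    Every midpoint between consecutive distinct values is tried as a split
    threshold, together with every pair of class labels for the two sides; the
    combination with the fewest training errors wins. Assumes the attribute has
    at least two distinct values. Returns (threshold, label_below, label_above).
    """
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    classes = sorted(set(labels))
    best, best_errors = None, len(labels) + 1
    for t in thresholds:
        for below in classes:
            for above in classes:
                errors = sum((below if v <= t else above) != lab for v, lab in zip(values, labels))
                if errors < best_errors:
                    best, best_errors = (t, below, above), errors
    return best


def predict_stump(stump, value):
    """Classify a new value with the learned split."""
    threshold, below, above = stump
    return below if value <= threshold else above


# Hypothetical training set: applicant income (thousands) and the known decision.
incomes = [20, 25, 30, 55, 60, 80]
decisions = ["reject", "reject", "reject", "approve", "approve", "approve"]
stump = fit_stump(incomes, decisions)
```

On this data the learned stump is `(42.5, "reject", "approve")`, so an applicant with an income of 45 is classified as `"approve"`.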

Application

Data classification levels, from most to least sensitive:

Highly sensitive corporate and customer data that, if disclosed, could put the organization at financial or legal risk.
Example: employee social security numbers, customer credit card numbers

Sensitive internal data that, if disclosed, could negatively affect operations.
Example: contracts with third-party suppliers, employee reviews

Internal data that is not meant for public disclosure.
Example: sales contest rules, organizational charts

Data that may be freely disclosed to the public.

Typical applications of classification analysis:
credit approval
target marketing
medical diagnosis
treatment effectiveness analysis
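For the most sensitive level, some data can be recognized by simple rules. The sketch below flags text containing a US-style social security number (`123-45-6789`) or a digit run that passes the Luhn checksum used by payment card numbers. The patterns, the level names returned, and the fallback to manual review are assumptions; the other levels usually depend on context that such rules cannot see.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US SSN written as 123-45-6789
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional spaces/dashes


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out arbitrary digit runs."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sensitivity_level(text: str) -> str:
    """Hypothetical rule: any SSN or Luhn-valid card number makes the text highly sensitive."""
    if SSN_PATTERN.search(text):
        return "highly sensitive"
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            return "highly sensitive"
    return "needs manual review"  # rules cannot tell the remaining levels apart
```

For example, `sensitivity_level("Card 4111 1111 1111 1111 on file")` returns `"highly sensitive"` (4111 1111 1111 1111 is a widely used test card number), while `sensitivity_level("Sales contest rules for Q3")` falls back to `"needs manual review"`.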

Classification techniques address the challenge of categorizing big data according to the format of the data to be processed, the type of analysis to be applied, the processing techniques in use, and the sources of the data that the target system must acquire, load, analyze, store, and process. Supervised classification techniques (such as decision trees and support vector machines) are also known as directed or predictive classification. In this method, the set of possible classes is known in advance. Unsupervised classification techniques are also known as descriptive or undirected classification. In this method, the set of possible classes is unknown; names can be assigned to the classes after classification.
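As an example of the unsupervised case, the block below runs k-means clustering (Lloyd's algorithm), used here as a stand-in for unsupervised classification, on one-dimensional data with no class labels, and only afterwards assigns names to the resulting groups. The spending figures, the helper names, and the names "low spenders" and "high spenders" are invented for illustration.

```python
def assign(values, centroids):
    """Assignment step: index of the nearest centroid for every value."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j])) for v in values]


def kmeans_1d(values, k, iterations=100):
    """Plain k-means on 1-D data; no class labels are used.

    Initial centroids are spread evenly over the sorted data so the result is
    deterministic. Assumes k >= 2 and len(values) >= k.
    """
    ordered = sorted(values)
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(values, centroids)


spend = [12, 15, 14, 10, 90, 95, 88, 102]  # made-up monthly spend per customer
centroids, labels = kmeans_1d(spend, k=2)

# Only now, after clustering, are the groups given (hypothetical) names.
low = min(range(2), key=lambda j: centroids[j])
names = {low: "low spenders", 1 - low: "high spenders"}
named = [names[lab] for lab in labels]
```

The algorithm only discovers that the values fall into two groups (centroids 12.75 and 93.75); the meaning of each group is supplied by a person afterwards, which is what distinguishes this from the supervised case.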
