
Classification: Basic Concepts

Lecture 10

Lect10-11-08-09 1
• Classification:
– Predicts categorical class labels.
– Best suited to predicting or describing data sets with binary or nominal categories.
– Less effective for ordinal categories, because the implicit order among the categories is not taken into account.
– A form of supervised learning.

Examples of Classification
• 1. Predicting tumor cells as “benign” or “malignant”

• 2. Classifying credit card transactions as “legitimate” or “fraudulent”

• 3. Classifying secondary structures of protein as “alpha-helix”, “beta-sheet” or “random coil”

• 4. Categorizing news stories as “finance”, “weather”, “entertainment”, “sports”, etc.

Examples Contd…
• 5. Determining the characteristics that differentiate individuals who have suffered a heart attack from those who have not.

• 6. Developing a profile of a successful person.

• 7. Classifying galaxies based on their shapes.

• 8. Detecting spam emails based on their message header and content.

Classification: Definition
• Given a collection of records (the training set):
– Each record contains a set of attributes; one of the attributes is the class label.

• Find a model for the class attribute as a function of the values of the other attributes.

• Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
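The train/test procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function names `train_test_split` and `accuracy`, the 30% test fraction, and the fixed random seed are all assumptions made for the example. A record is assumed to be an `(attributes, label)` pair and a model is assumed to be any callable that maps attributes to a label.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return correct / len(test_set)
```

The model is built from the training set only; `accuracy` is then called with the held-out test set, so the score reflects records the model has not seen.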

Illustrating Classification Task

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set (class labels unknown)
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?

[Figure: Training Set → Learning algorithm → Learn Model (induction) → Model → Apply Model (deduction) → Test Set]
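The induction and deduction steps in the figure can be made concrete with a small sketch. The slide does not name a learning algorithm, so the one used here is an assumption chosen for brevity: a one-rule learner that picks the single categorical attribute whose "value → majority class" rule makes the fewest mistakes on the training set. The names `learn_one_rule`, `TRAINING` and `TEST` are hypothetical, and the numeric Attrib3 is carried along but ignored by this simple learner.

```python
from collections import Counter, defaultdict

# Training set from the slide: (Attrib1, Attrib2, Attrib3 in thousands, Class)
TRAINING = [
    ("Yes", "Large", 125, "No"),
    ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),
    ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),
    ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"),
    ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),
    ("No", "Small", 90, "Yes"),
]

# Test records 11-14 from the slide: the class is unknown
TEST = [
    ("No", "Small", 55),
    ("Yes", "Medium", 80),
    ("Yes", "Large", 110),
    ("No", "Small", 95),
]


def learn_one_rule(records, categorical_columns=(0, 1)):
    """Induction: build a model from labelled records (one-rule learner)."""
    best = None
    for col in categorical_columns:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for record in records:
            counts[record[col]][record[-1]] += 1
        # Map each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for r in records if rule[r[col]] != r[-1])
        if best is None or errors < best[0]:
            best = (errors, col, rule)
    _, col, rule = best
    # Fall back to the overall majority class for unseen attribute values.
    default = Counter(r[-1] for r in records).most_common(1)[0][0]
    return lambda attributes: rule.get(attributes[col], default)


model = learn_one_rule(TRAINING)            # induction: learn the model
predictions = [model(x) for x in TEST]      # deduction: apply the model
print(predictions)                          # ['Yes', 'No', 'No', 'Yes']
```

On this training set the learner selects Attrib2 (two training errors, against three for Attrib1), giving the rule Large → No, Medium → No, Small → Yes. A real learning algorithm would normally use all attributes, including the numeric one.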
Definition
– Classification is the task of learning a target function f (the classification model) that maps each attribute set x to one of the predefined class labels y.
– Each record is characterized by a tuple (x, y), where x is the attribute set and y is a special attribute (the class label).
– Output attributes are also known as dependent variables.
– Input attributes are termed independent variables.
– Models can be categorized by whether the output variable is discrete (categorical) or not,
– or by whether they describe a current condition or predict future outcomes.

The vertebrate data set
Name           Body temp  Skin cover  Gives birth  Aquatic creature  Legs  Hibernates  Class label
human          warm       hair        yes          no                yes   no          mammal
python         cold       scales      no           no                no    yes         reptile
salmon         cold       scales      no           yes               no    no          fish
whale          warm       hair        yes          yes               no    no          mammal
frog           cold       none        no           semi              yes   yes         amphibian
komodo dragon  cold       scales      no           no                yes   no          reptile
bat            warm       hair        yes          no                yes   yes         mammal
pigeon         warm       feathers    no           no                yes   no          bird
cat            warm       fur         yes          no                yes   no          mammal
leopard shark  cold       scales      yes          yes               no    no          fish
turtle         cold       scales      no           semi              yes   no          reptile
penguin        warm       feathers    no           semi              yes   no          bird
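To show what a target function f over this data set might look like, here is a hand-written rule set in Python. It is only a sketch: the rules were written by inspecting the twelve rows, not produced by a learning algorithm, and the function name `classify_vertebrate` is hypothetical. Only the four attributes the rules use are copied from the table, with the table's y/n values written out as yes/no.

```python
# (name, body_temp, skin_cover, gives_birth, aquatic, class_label), from the table
VERTEBRATES = [
    ("human", "warm", "hair", "yes", "no", "mammal"),
    ("python", "cold", "scales", "no", "no", "reptile"),
    ("salmon", "cold", "scales", "no", "yes", "fish"),
    ("whale", "warm", "hair", "yes", "yes", "mammal"),
    ("frog", "cold", "none", "no", "semi", "amphibian"),
    ("komodo dragon", "cold", "scales", "no", "no", "reptile"),
    ("bat", "warm", "hair", "yes", "no", "mammal"),
    ("pigeon", "warm", "feathers", "no", "no", "bird"),
    ("cat", "warm", "fur", "yes", "no", "mammal"),
    ("leopard shark", "cold", "scales", "yes", "yes", "fish"),
    ("turtle", "cold", "scales", "no", "semi", "reptile"),
    ("penguin", "warm", "feathers", "no", "semi", "bird"),
]


def classify_vertebrate(body_temp, skin_cover, gives_birth, aquatic):
    """Hand-written target function f(x) -> y for the rows above."""
    if body_temp == "warm":
        # Warm-blooded: mammals give birth, birds do not.
        return "mammal" if gives_birth == "yes" else "bird"
    if skin_cover == "none":
        return "amphibian"
    # Cold-blooded with scales: fully aquatic creatures are fish.
    return "fish" if aquatic == "yes" else "reptile"


# Names of rows where the rules disagree with the recorded class label.
mismatches = [name for name, *x, label in VERTEBRATES
              if classify_vertebrate(*x) != label]
print(mismatches)  # [] : the rules reproduce every label in the table
```

Rules that fit these twelve rows are not guaranteed to generalize to other vertebrates; that is exactly what a separate test set is meant to check.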

• A classification model is useful for the following purposes:

– Descriptive Modeling
– Predictive Modeling

• Descriptive Modeling: A classification model can be used as an explanatory tool to differentiate between objects of different classes.
– Example:
– (1) A bank loan officer wants to analyze loan application data to determine which applications are “safe” and which are “risky” for the bank.
– Here the data analysis task is classification: a model, or classifier, is constructed to predict categorical labels such as “safe” or “risky”.

• Predictive Modeling: A classification model can be used to predict the class label of unknown records.
– Examples:
– (1) Suppose a marketing manager wants to estimate the amount that a customer will spend during an ongoing sale.
– This data analysis task is an example of numeric prediction rather than classification.
– Here the model (a predictor) predicts a continuous-valued function instead of a categorical label.

– (2) A medical researcher wants to analyze brain tumour data to predict which particular treatment, say A, B or C, should be given to a patient.
– Predicting “Treatment A”, “Treatment B” or “Treatment C” is a classification task, because the output is a categorical label.
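The difference between the two examples lies in the type of the output, which the sketch below tries to make visible. It applies the same simple mechanism, returning the output stored with the closest training input, to both tasks. Everything in it is an assumption made for illustration: the helper `nearest_neighbor_output`, the single input attribute per task, and all the data values, which are invented and not taken from the lecture.

```python
def nearest_neighbor_output(training_pairs, query):
    """Return the output stored with the training input closest to `query`."""
    _, output = min(training_pairs, key=lambda pair: abs(pair[0] - query))
    return output


# Invented (customer income in thousands, amount spent in the sale) pairs
spend_history = [(30, 40.0), (60, 95.5), (90, 210.0)]

# Invented (tumour size in mm, treatment given) pairs
treatment_history = [(5, "A"), (20, "B"), (45, "C")]

# Numeric prediction: the output is a continuous value.
predicted_spend = nearest_neighbor_output(spend_history, 65)

# Classification: the output is one of a fixed set of categorical labels.
predicted_treatment = nearest_neighbor_output(treatment_history, 8)

print(predicted_spend)      # 95.5
print(predicted_treatment)  # A
```

The code path is identical in both calls; only the kind of value attached to each training record differs, which is what separates a predictor from a classifier in the two examples above.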

