Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
3 No Small 70K No
6 No Medium 60K No
Training Set
Apply
Tid Attrib1 Attrib2 Attrib3 Class Model
11 No Small 55K ?
15 No Large 67K ?
10
Test Set
cal cal us
ri ri uo
ego ego tin ss
t t n a
ca ca co cl
Tid Refund Marital Taxable
Splitting Attributes
Status Income Cheat
cal cal us
i i o
or or nu
t eg
t eg
nti
a ss Single,
ca ca co cl MarSt
Married Divorced
Tid Refund Marital Taxable
Status Income Cheat
NO Refund
1 Yes Single 125K No
Yes No
2 No Married 100K No
3 No Single 70K No NO TaxInc
4 Yes Married 120K No < 80K > 80K
5 No Divorced 95K Yes
NO YES
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No There could be more than one tree that
10 No Single 90K Yes fits the same data!
10
6 No Medium 60K No
Training Set
Apply Decision
Tid Attrib1 Attrib2 Attrib3 Class
Model Tree
11 No Small 55K ?
15 No Large 67K ?
10
Test Set
Test Data
Start from the root of tree. Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married
TaxInc NO
< 80K > 80K
NO YES
Test Data
Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married
TaxInc NO
< 80K > 80K
NO YES
Test Data
Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married
TaxInc NO
< 80K > 80K
NO YES
Test Data
Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married
TaxInc NO
< 80K > 80K
NO YES
Test Data
Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married
TaxInc NO
< 80K > 80K
NO YES
Test Data
Refund Marital Taxable
Status Income Cheat
No Married 80K ?
Refund 10
Yes No
NO MarSt
Single, Divorced Married Assign Cheat to “No”
TaxInc NO
< 80K > 80K
NO YES
6 No Medium 60K No
Training Set
Apply Decision
Tid Attrib1 Attrib2 Attrib3 Class
Model Tree
11 No Small 55K ?
15 No Large 67K ?
10
Test Set
Refund
Don’t
Yes No
Cheat
Don’t Don’t
Cheat Cheat
Refund Refund
Yes No Yes No
Don’t Cheat
Cheat
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 17
Tree Induction
Greedy strategy.
– Split the records based on an attribute test
that optimizes certain criterion.
Issues
– Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Greedy strategy.
– Split the records based on an attribute test
that optimizes certain criterion.
Issues
– Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Size
{Small,
What about this split? Large} {Medium}
Taxable Taxable
Income Income?
> 80K?
< 10K > 80K
Yes No
Greedy strategy.
– Split the records based on an attribute test
that optimizes certain criterion.
Issues
– Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification
techniques for many simple data sets
PREDICTED CLASS
Class=Yes Class=No
a: TP (true positive)
b: FN (false negative)
Class=Yes a b
ACTUAL c: FP (false positive)
CLASS Class=No c d
d: TN (true negative)
PREDICTED CLASS
Class=Yes Class=No
Class=Yes a b
ACTUAL (TP) (FN)
CLASS
Class=No c d
(FP) (TN)
ad TP TN
Accuracy
a b c d TP TN FP FN
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 31