Introduction
Decision Trees

TDIDT: Top-Down Induction of Decision Trees
Attribute selection: Entropy, Information, Information Gain, Gain Ratio
Numeric Values
Missing Values
Pruning
Acknowledgements: Many slides based on Frank & Witten, a few on Tan, Steinbach & Kumar
J. Fürnkranz
ID3
C4.5
Decision Trees
Nodes: test for the value of a certain attribute
Edges: correspond to the outcome of a test; connect to the next node or leaf
Leaves: terminal nodes that predict the outcome
To classify an example:

1. start at the root
2. perform the test
3. follow the edge corresponding to the outcome
4. go to 2. unless leaf
5. predict the outcome associated with the leaf
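The walk above can be sketched in a few lines, assuming a hypothetical nested-tuple representation: inner nodes as (attribute, {value: subtree}) pairs, leaves as plain class labels.

```python
# A minimal sketch of classifying an example with a decision tree.
# Tree representation (an assumption for illustration):
#   inner node: ("attribute", {value: subtree}),  leaf: class label.

def classify(tree, example):
    """Walk from the root to a leaf, following the edge for each test outcome."""
    while isinstance(tree, tuple):           # inner node: perform the test
        attribute, branches = tree
        tree = branches[example[attribute]]  # follow the edge for this value
    return tree                              # leaf: predict its class

# Example tree: test Outlook first; below sunny, test Humidity; below rain, Windy.
tree = ("Outlook", {
    "overcast": "yes",
    "sunny": ("Humidity", {"high": "no", "normal": "yes"}),
    "rain": ("Windy", {"true": "no", "false": "yes"}),
})

print(classify(tree, {"Outlook": "sunny", "Humidity": "normal"}))  # yes
```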
(Figure: a decision tree is learned from training examples; a new example is then classified by passing it through the tree.)
A Sample Task
Day    Temperature  Outlook   Humidity  Windy  Play Golf?
07-05  hot          sunny     high      false  no
07-06  hot          sunny     high      true   no
07-07  hot          overcast  high      false  yes
07-0   cool         rain      normal    false  yes
07-10  cool         overcast  normal    true   yes
07-12  mild         sunny     high      false  no
07-14  cool         sunny     normal    false  yes
07-15  mild         rain      normal    false  yes
07-20  mild         sunny     normal    true   yes
07-21  mild         overcast  high      true   yes
07-22  hot          overcast  normal    false  yes
07-23  mild         rain      high      true   no
07-26  cool         rain      normal    true   no
07-30  mild         rain      high      false  yes
Two new examples to classify:

Day       Temperature  Outlook  Humidity  Windy  Play Golf?
today     cool         sunny    normal    false  ?
tomorrow  mild         sunny    normal    false  ?
Divide-and-Conquer Algorithms
TDIDT: Top-Down Induction of Decision Trees — divide the problem into subproblems, then solve each subproblem recursively.
ID3 Algorithm
Function ID3
Input: Example set S
Output: Decision Tree DT

If all examples in S belong to the same class c:
    return a new leaf and label it with c
Else:
    i.   Select an attribute A according to some heuristic function
    ii.  Generate a new node DT with A as test
    iii. For each value v_i of A:
         (a) Let S_i = all examples in S with A = v_i
         (b) Use ID3 to construct a decision tree DT_i for example set S_i
         (c) Generate an edge that connects DT and DT_i
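The recursion above can be transcribed into a compact Python sketch; examples are assumed to be dicts, and the heuristic used here is minimal average entropy (equivalent to maximal information gain, introduced later).

```python
# A minimal sketch of the ID3 recursion. Assumptions: examples are dicts,
# nodes are ("attribute", {value: subtree}) tuples, leaves are class labels.
from collections import Counter
from math import log2

def entropy(examples, target):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def id3(examples, attributes, target):
    # Base case: pure node (or no attributes left) -> leaf with majority class
    if len({e[target] for e in examples}) == 1 or not attributes:
        return Counter(e[target] for e in examples).most_common(1)[0][0]

    # i. select the attribute with minimal average entropy (= maximal gain)
    def avg_entropy(a):
        return sum(len(s) / len(examples) * entropy(s, target)
                   for v in {e[a] for e in examples}
                   for s in [[e for e in examples if e[a] == v]])
    a = min(attributes, key=avg_entropy)

    # ii./iii. generate a node testing A, one edge and subtree per value
    branches = {}
    for v in {e[a] for e in examples}:
        s_v = [e for e in examples if e[a] == v]                       # (a) S_i
        branches[v] = id3(s_v, [b for b in attributes if b != a], target)  # (b), (c)
    return (a, branches)

data = [
    {"Outlook": "sunny",    "Windy": "true",  "Play": "no"},
    {"Outlook": "sunny",    "Windy": "false", "Play": "no"},
    {"Outlook": "overcast", "Windy": "true",  "Play": "yes"},
    {"Outlook": "overcast", "Windy": "false", "Play": "yes"},
]
print(id3(data, ["Outlook", "Windy"], "Play"))
```

On this toy set, Outlook separates the classes perfectly, so it is selected at the root and both branches become leaves.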
The learned tree also explains all of the training data — but will it generalize well to new data?
We want to grow a simple tree → a good heuristic prefers attributes that split the data so that each successor node is as pure as possible, i.e., the distribution of examples in each node is such that it mostly contains examples of a single class.

We want a measure that prefers attributes that have a high degree of "order":
- maximum order: all examples are of the same class
- minimum order: all classes are equally likely

Another interpretation: entropy is the amount of information that is contained in the sample — all examples of the same class → no information.
In other words:

S is a set of examples
p⊕ is the proportion of examples in class ⊕
p⊖ = 1 − p⊕ is the proportion of examples in class ⊖

Entropy:

    E(S) = −p⊕ · log₂(p⊕) − p⊖ · log₂(p⊖)

Interpretation: the expected number of bits needed to encode the class of a randomly drawn example.
Outlook = sunny:  E(S_sunny) = −(2/5)·log₂(2/5) − (3/5)·log₂(3/5) = 0.971
Outlook = rainy:  E(S_rainy) = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.971
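The definition can be transcribed directly (with 0·log₂0 taken as 0):

```python
# Two-class entropy E(S) as defined above, as a function of the proportion p
# of positive examples; 0 * log2(0) is treated as 0.
from math import log2

def entropy(p_pos):
    probs = [p_pos, 1.0 - p_pos]
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy(0.5))               # all classes equally likely: 1.0 bit
print(entropy(1.0))               # pure sample: 0.0
print(round(entropy(9/14), 3))    # the full weather sample [9 yes, 5 no]: 0.940
```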
Problem:

Entropy only computes the quality of a single (sub-)set of examples — it corresponds to a single attribute value. How can we compute the quality of the entire split, i.e., of an entire attribute?

Solution:

Compute the weighted average over all sets resulting from the split, weighted by their size:

    I(S, A) = Σᵢ (|Sᵢ| / |S|) · E(Sᵢ)

Example:

    I(S, Outlook) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693
Information Gain

When an attribute A splits the set S into subsets Sᵢ, we compute the average entropy and compare it to the entropy of the original set S:

    Gain(S, A) = E(S) − I(S, A) = E(S) − Σᵢ (|Sᵢ| / |S|) · E(Sᵢ)

Note: maximizing information gain is equivalent to minimizing average entropy, because E(S) is constant for all attributes A.
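The computation can be sketched end-to-end on the Outlook attribute; the dataset encoding below is illustrative, keeping only the columns needed.

```python
# Information gain: Gain(S, A) = E(S) - I(S, A), with I(S, A) the
# size-weighted average entropy of the subsets S_i produced by the split.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    avg = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        avg += len(subset) / len(examples) * entropy(subset)   # weighted E(S_i)
    return entropy(labels) - avg

# Outlook splits [9 yes, 5 no] into sunny [2,3], overcast [4,0], rain [3,2]:
weather = (
    [{"Outlook": "sunny", "Play": p} for p in ["no", "no", "no", "yes", "yes"]] +
    [{"Outlook": "overcast", "Play": "yes"} for _ in range(4)] +
    [{"Outlook": "rain", "Play": p} for p in ["yes", "yes", "no", "no", "yes"]]
)
print(round(info_gain(weather, "Outlook", "Play"), 3))  # 0.247
```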
"6a%p!e
Ga in S , Outlook =*.$./
17
"6a%p!e ;(td.<
Outlook is se!ected as t3e root note
"6a%p!e ;(td.<
Humidity is se!ected
J. Frnkranz
"6a%p!e ;(td.<
Humidity is se!ected
4A
J. Frnkranz
Properties of Entropy

Entropy is the only function that satisfies all of the following three properties:

- When the node is pure, the measure should be zero.
- When impurity is maximal (i.e. all classes equally likely), the measure should be maximal.
- The measure should obey the multistage property: if p, q, r are classes in set S, and T is the set of examples of class t = q ∨ r, then

      E_{p,q,r}(S) = E_{p,t}(S) + (|T| / |S|) · E_{q,r}(T)

  → decisions can be made in several stages.
Problematic: attributes with a large number of values — extreme case: each example has its own value (e.g. an example ID, or the Day attribute in the weather data).

Subsets are more likely to be pure if there is a large number of different attribute values → information gain is biased towards choosing attributes with a large number of values. This may cause several problems:
- Overfitting: selection of an attribute that is non-optimal for prediction
- Fragmentation: the data are split into (too) many small sets

Entropy of the split on the Day attribute:

    I(S, Day) = 1/14 · ( E([0,1]) + E([0,1]) + ... + E([0,1]) ) = 0

→ the information gain of Day is maximal.
Intrinsic information of a split: the entropy of the distribution of instances into branches, i.e. how much information we need to tell which branch an instance belongs to:

    IntI(S, A) = −Σᵢ (|Sᵢ| / |S|) · log₂(|Sᵢ| / |S|)

Example: intrinsic information of the Day attribute:

    IntI(S, Day) = 14 · ( −1/14 · log₂(1/14) ) = 3.807

Observation: attributes with higher intrinsic information are less useful.
Gain Ratio

A modification of the information gain that reduces its bias towards multi-valued attributes: it takes the number and size of branches into account when choosing an attribute, correcting the information gain by the intrinsic information of the split:

    GR(S, A) = Gain(S, A) / IntI(S, A)
"6a%p!e:
!"p!rat#r! 0.693 0.247 1.577 0.157 0.7&& 0.152 1.000 0.152 Info: Gain: 0.940-0.911 Split info: info([4,6,4]) Gain ratio: 0.029/1.557 'in() Info: Gain: 0.940-0.&92 Split info: info([&,6]) Gain ratio: 0.04&/0.9&5 0.&92 0.04& 0.9&5 0.049 0.911 0.029 1.557 0.019
Gini Index

Many alternative measures to information gain exist; the most popular alternative is the Gini index:

- used e.g. in CART (Classification And Regression Trees)
- impurity measure (instead of entropy):

      Gini(S) = 1 − Σᵢ pᵢ²

- average Gini index (instead of average entropy / information):

      Gini(S, A) = Σᵢ (|Sᵢ| / |S|) · Gini(Sᵢ)

- a Gini gain could be defined analogously to information gain, but typically the average Gini index is minimized instead of maximizing Gini gain.
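Both measures are short enough to write down directly; the class counts below are the Outlook split from the weather data.

```python
# Gini index of one node and size-weighted average Gini of a split (as in CART).
def gini(counts):
    """Gini(S) = 1 - sum_i p_i^2, for the class counts of one node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def avg_gini(subsets):
    """Gini(S, A): subsets is a list of per-branch class-count lists."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * gini(s) for s in subsets)

print(gini([7, 7]))    # maximal two-class impurity: 0.5
print(gini([14, 0]))   # pure node: 0.0
print(round(avg_gini([[2, 3], [4, 0], [3, 2]]), 3))  # Outlook split: 0.343
```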
Industrial-strength algorithms

To be useful in a wide range of applications, a learning algorithm should:
- permit numeric attributes
- allow missing values
- be robust in the presence of noise
- be able to approximate arbitrary concept descriptions (at least in principle)

Result: C4.5 — the best-known and (probably) most widely-used learning algorithm.
".g. te%p K )*
=n!ike no%ina! attri utes$ e5er# attri ute 3as %an# possi !e sp!it points 2o!ution is straig3tforward e6tension:
"5a!uate info gain ;or ot3er %easure< for e5er# possi !e sp!it point of attri ute (3oose D estL sp!it point Info gain for est sp!it point is info gain for attri ute
"6a%p!e
2ort a!! e6a%p!es according to t3e 5a!ue of t3is attri ute (ou!d !ook !ike t3is:
64 Yes 65 No 6& Yes 69 Yes 70 Yes 71 No 72 No 72 Yes 75 Yes 75 Yes &0 No &1 Yes &3 Yes &5 No
".g.
I Temperature 9 ,".) =
"fficient (o%putation
Linear!# scan t3e sorted 5a!ues$ eac3 ti%e updating t3e count %atri6 and co%puting t3e e5a!uation %easure (3oose t3e sp!it position t3at 3as t3e est 5a!ue
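The scan can be sketched as follows, using the temperature data from the example above; counts are updated incrementally instead of being recomputed for every candidate split.

```python
# Linear scan for the best binary split of a numeric attribute: sort once,
# then move examples one by one from the right to the left class-count
# "matrix" and evaluate each candidate cut point between distinct values.
from collections import Counter
from math import log2

def entropy(counter):
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values() if c)

def best_numeric_split(values, labels):
    """Return (split_point, weighted_avg_entropy) minimizing average entropy."""
    pairs = sorted(zip(values, labels))
    left, right = Counter(), Counter(l for _, l in pairs)
    best = (None, float("inf"))
    for i in range(len(pairs) - 1):
        left[pairs[i][1]] += 1                 # move one example to the left
        right[pairs[i][1]] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                           # split only between distinct values
        n_left = i + 1
        avg = (n_left * entropy(left) +
               (len(pairs) - n_left) * entropy(right)) / len(pairs)
        if avg < best[1]:
            best = ((pairs[i][0] + pairs[i + 1][0]) / 2, avg)
    return best

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes",
        "yes", "yes", "no", "yes", "yes", "no"]
print(best_numeric_split(temps, play))   # best cut at 84.0
```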
(Figure, after Tan, Steinbach & Kumar: sorted Taxable Income values from 60 to 220 with class labels Cheat = No/Yes; at each candidate split position the class counts on both sides of the cut are updated incrementally and the Gini index is computed — the best split, with Gini = 0.300, lies at 97.)
Splitting (multi-way) on a nominal attribute exhausts all the information in that attribute: a nominal attribute is tested (at most) once on any path in the tree, whereas a numeric attribute may be tested several times along a path.

To avoid repeated binary tests of the same numeric attribute:
- pre-discretize numeric attributes (→ discretization), or
- use multi-way splits instead of binary ones; such a split can, e.g., be computed by building a subtree using a single numeric attribute, which can then be flattened into a multi-way split; other methods are possible (dynamic programming, greedy, ...)
Missing Values

If we are lucky, attributes with missing values are not tested by the tree. If a test on an attribute with a missing value must be made, split the instance into fractional instances ("pieces"):
- one piece for each outgoing branch of the node
- a piece going down a branch receives a weight proportional to the popularity of the branch; the weights sum to 1
- during learning, information gain etc. use sums of weights instead of counts
- during classification, merge the probability distributions using the weights of the fractional instances
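The fractional-instance scheme at classification time can be sketched as follows; the tree representation, in which each branch stores the number of training instances it covered, is an assumption for illustration (C4.5's internals differ in detail).

```python
# Classifying an instance with missing values by splitting it into weighted
# pieces and merging the resulting class distributions.
# Assumed representation: inner node ("attribute", {value: (count, subtree)}),
# where count is the number of training instances that went down that branch.

def class_distribution(tree, example, weight=1.0):
    """Return {class: summed weight} for a (possibly fractional) instance."""
    if not isinstance(tree, tuple):                  # leaf: all weight to its class
        return {tree: weight}
    attribute, branches = tree
    value = example.get(attribute)                   # None means "missing"
    total = sum(count for count, _ in branches.values())
    dist = {}
    for v, (count, subtree) in branches.items():
        if value is None:
            w = weight * count / total               # fractional piece per branch
        elif v == value:
            w = weight                               # known value: one branch only
        else:
            continue
        for cls, cw in class_distribution(subtree, example, w).items():
            dist[cls] = dist.get(cls, 0.0) + cw      # merge weighted distributions
    return dist

# Humidity branch covers 7 "high" and 7 "normal" training instances:
tree = ("Humidity", {"high": (7, "no"), "normal": (7, "yes")})
print(class_distribution(tree, {}))   # Humidity missing: half a piece each way
```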
The smaller the complexity of a concept, the less danger that it overfits the data — a polynomial of degree n can always fit n+1 points. Note that a "perfect" fit on the training data can always be found for a decision tree (except when the data are contradictory)!

Pre-pruning: stop growing a branch when the information becomes unreliable.

Post-pruning: grow a decision tree that correctly classifies all training data, then simplify it by replacing some nodes with leaves.
Pre-pruning

Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node. The most popular test is the chi-squared test: ID3 used the chi-squared test in addition to information gain — only statistically significant attributes were allowed to be selected by the information gain procedure.
"ar!# stopping
.re-pruning %a# stop t3e growt3 process pre%ature!#: early stopping (!assic e6a%p!e: O9'G.arit#-pro !e%
a 1 2 3 0 0 1
b 0 1 0
class 0 1 1 0
+o indi5idua! attri ute e63i its an# 4 1 1 significant association to t3e c!ass B In a dataset t3at contains O9' attri utes a and $ and se5era! irre!e5ant ;e.g.$ rando%< attri utes$ ID3 can not distinguis3 etween re!e5ant and irre!e5ant attri utes B .repruning wonPt e6pand t3e root node 2tructure is on!# 5isi !e in fu!!# e6panded tree
Jut:
Post-pruning

Basic idea: first grow a full tree to capture all possible attribute interactions, then remove those parts that are due to chance:

1. learn a complete and consistent decision tree that classifies all examples in the training set correctly
2. as long as the performance increases:
   - try simplification operators on the tree
   - evaluate the resulting trees
   - make the replacement that results in the best estimated performance
Post-pruning: operators and error estimation

Pruning operators:
- subtree replacement
- subtree raising

Error estimation:
- on a separate pruning set ("reduced error pruning")
- with confidence intervals (C4.5's method)
- significance testing
- MDL principle
Subtree replacement

Bottom-up: consider replacing a tree only after considering all its subtrees.
Subtree raising

Delete a node and redistribute its instances among the raised subtree; slower than subtree replacement.
"rror on t3e training data is +9T a usefu! esti%ator ;wou!d resu!t in a!%ost no pruning< =se 3o!d-out set for pruning "ssentia!!# t3e sa%e as in ru!e !earning on!# pruning operators differ ;su tree rep!ace%ent< Deri5e confidence inter5a! fro% training data wit3 a user-pro5ided confidence !e5e! Assu%e t3at t3e true error is on t3e upper ound of t3is confidence inter5a! ;pessi%istic error esti%ate<
().*Ps %et3od
Consider classifying E examples incorrectly out of N examples as observing E events in N trials of a binomial distribution. For a given confidence level CF, the upper limit on the error rate over the whole population is U_CF(E, N), with CF% confidence.

Example: 100 examples in a leaf, 6 examples misclassified. How large is the true error, assuming a pessimistic estimate with a confidence of 25%? → U_0.25(100, 6)

Note: this is only a heuristic — but one that works well!
C4.5's method

    e = ( f + z²/2N + z · √( f/N − f²/N + z²/4N² ) ) / ( 1 + z²/N )

- z is derived from the desired confidence value: if c = 25%, then z = 0.69 (from the normal distribution)
- f is the error on the training data
- N is the number of instances covered by the leaf
"rror esti%ate for su tree is weig3ted su% of error esti%ates for a!! its !ea5es BA node is pruned if error esti%ate of su tree is !ower t3an error esti%ate of t3e node
"6a%p!e
f=0.33 e=0.47
f=0.5 e=0.72
f=0.33 e=0.47
Reduced Error Pruning

Basic idea: optimize the accuracy of a decision tree on a separate pruning set.

1. split the training data into a growing and a pruning set
2. learn a complete and consistent decision tree that classifies all examples in the growing set correctly
3. as long as the error on the pruning set does not increase:
   - try to replace each node by a leaf (predicting the majority class)
   - evaluate the resulting (sub-)tree on the pruning set
   - make the replacement that results in the maximum error reduction
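The procedure can be sketched on the nested-tuple trees used earlier; this is a simplified bottom-up variant (empty branches and tie-breaking are ignored), not C4.5's implementation.

```python
# A sketch of reduced-error pruning: bottom-up, replace a subtree by its
# majority-class leaf whenever that does not increase pruning-set error.
# Trees: ("attribute", {value: subtree}) tuples, leaves are class labels.
from collections import Counter

def majority_class(examples, target):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def errors(tree, examples, target):
    def classify(t, e):
        while isinstance(t, tuple):
            t = t[1][e[t[0]]]
        return t
    return sum(classify(tree, e) != e[target] for e in examples)

def rep_prune(tree, grow_set, prune_set, target):
    if not isinstance(tree, tuple):
        return tree
    attribute, branches = tree
    pruned = {}
    for v, sub in branches.items():            # prune the subtrees first
        g = [e for e in grow_set if e[attribute] == v]
        p = [e for e in prune_set if e[attribute] == v]
        pruned[v] = rep_prune(sub, g, p, target)
    tree = (attribute, pruned)
    leaf = majority_class(grow_set, target)    # candidate replacement leaf
    if errors(leaf, prune_set, target) <= errors(tree, prune_set, target):
        return leaf
    return tree
```

Usage: with a pruning set in which the minority branch's prediction is wrong, the node collapses to a majority-class leaf; otherwise the tree is kept unchanged.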
Complexity: with subtree raising, every instance may have to be redistributed at every node between its leaf and the root; the cost for one redistribution is (on average) O(log n).
From trees to rules

Simple way: one rule for each leaf.

C4.5rules:
- greedily prune conditions from each rule if this reduces its estimated error; this can produce duplicate rules — check for this at the end
- then look at each class in turn, consider the rules for that class, and find a "good" subset (guided by MDL)
- then rank the subsets to avoid conflicts
- finally, remove rules (greedily) if this decreases the error on the training data
Decision Lists

An ordered list of rules:
- the first rule that fires makes the prediction
- can be learned with a covering approach

Similar to decision trees, but nodes may have multiple predecessors — DAGs (directed, acyclic graphs):
- there are a few algorithms that can learn DAGs
- they learn much smaller structures, but are in general not very successful
- a decision list may be viewed as a special case of a DAG
Decision Graphs

(Figure: an example decision graph in which two rules share a common successor node.)
C4.5rules is slow for large and noisy datasets; the commercial version C5.0rules uses a different technique.

Parameters:
- -c: confidence value (default 25%) — lower values incur heavier pruning
- -m: minimum number of instances in the two most popular branches (default 2)
- others, e.g. for having only two-way splits (also on symbolic attributes), etc.
A decision tree can be viewed as a set of non-overlapping rules, typically learned via divide-and-conquer algorithms (recursive partitioning). Many concepts have a shorter description as a rule set:
- low-complexity decision lists are more expressive than low-complexity decision trees (Rivest, 1987)
- exceptions: if one or more attributes are relevant for the classification of all examples (e.g., parity)

Learning strategies: Separate-and-Conquer (rule learning) vs. Divide-and-Conquer (tree learning).
Discussion TDIDT

- The most extensively studied method of machine learning used in data mining
- Different criteria for attribute/test selection rarely make a large difference
- Different pruning methods mainly change the size of the resulting pruned tree
- C4.5 builds univariate decision trees
- Some TDIDT systems can build multivariate trees (e.g. CART) — multivariate: a split is not based on a single attribute but on a function defined over multiple attributes
Regression Task

Two possible approaches:
- discretize the numeric target value and use a classification learning algorithm
- adapt the classification algorithm to regression data → Regression Trees and Model Trees
Regression Trees

Differences to decision trees (classification trees):

- Leaf nodes: predict the average value of all instances in the leaf
- Splitting criterion: minimize the variance of the values in each subset Sᵢ, e.g. via the standard deviation reduction

      SDR(A, S) = SD(S) − Σᵢ (|Sᵢ| / |S|) · SD(Sᵢ)

- Termination criteria: very important (otherwise each leaf ends up containing only single points!)
  - lower bound on the standard deviation in a node
  - lower bound on the number of examples in a node
- Pruning criterion: numeric error measures, e.g. mean squared error
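The standard deviation reduction is a direct transcription of the formula above; the data values are illustrative.

```python
# Standard deviation reduction for a candidate split in a regression tree:
# SDR(A, S) = SD(S) - sum_i |S_i|/|S| * SD(S_i).
from statistics import pstdev  # population standard deviation

def sdr(values, subsets):
    total = len(values)
    return pstdev(values) - sum(len(s) / total * pstdev(s) for s in subsets)

values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
split = [[1.0, 1.2, 0.8], [5.0, 5.5, 4.5]]   # a split separating low from high
print(round(sdr(values, split), 3))          # large reduction: a good split
```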
Model Trees

In a leaf node:
- classification trees predict a class value
- regression trees predict the average value of all instances in the leaf
- model trees use a linear model for making the predictions; growing of the tree proceeds as with regression trees

Linear model:

    r(x) = Σᵢ wᵢ · vᵢ(x)

where vᵢ(x) is the value of attribute Aᵢ for example x and wᵢ is a weight. The attributes that have already been used in the path of the tree can be ignored. The weights can be fitted with standard math packages by minimizing the mean squared error

    MSE = (1/n) · Σⱼ ( rⱼ − r(xⱼ) )²
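For a leaf's model over a single attribute, minimizing the MSE reduces to ordinary least squares; the data below are illustrative.

```python
# Fitting the weights of a one-attribute leaf model by minimizing the MSE:
# for r(x) = w0 + w1*x this is ordinary least squares in closed form.
def fit_line(xs, ys):
    """Return (w0, w1) minimizing sum_j (y_j - (w0 + w1*x_j))^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
          sum((x - mx) ** 2 for x in xs))
    return my - w1 * mx, w1          # intercept, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]            # roughly y = 2x
w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))    # 0.1 1.96
```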
Summary

- Classification problems can be solved using decision tree learning: iteratively select the best attribute and split up the values according to this attribute.
- Regression problems can be solved with regression trees and model trees; the difference lies in the models used at the leaves. Both are grown like decision trees, but with different splitting criteria.
- Simpler, seemingly less accurate trees are often preferable.
- Evaluation has to be done on separate test sets.