Introduction
Decision Trees

TDIDT: Top-Down Induction of Decision Trees
Attribute selection: Entropy, Information, Information Gain, Gain Ratio
Numeric Values
Missing Values
Pruning
Acknowledgements: Many slides based on Frank & Witten, a few on Tan, Steinbach & Kumar
J. Fürnkranz
ID3
C4.5
Decision Trees
Nodes: test for the value of a certain attribute
Edges: correspond to the outcome of a test; connect to the next node or leaf
Leaves: terminal nodes that predict the outcome
To classify an example:

1. start at the root
2. perform the test
3. follow the edge corresponding to the outcome
4. go to 2. unless leaf
5. predict the outcome associated with the leaf
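The walk above can be sketched in a few lines, assuming a hypothetical nested-tuple representation: inner nodes as (attribute, {value: subtree}) pairs, leaves as plain class labels.

```python
# A minimal sketch of classifying an example with a decision tree.
# Tree representation (an assumption for illustration):
#   inner node: ("attribute", {value: subtree}),  leaf: class label.

def classify(tree, example):
    """Walk from the root to a leaf, following the edge for each test outcome."""
    while isinstance(tree, tuple):           # inner node: perform the test
        attribute, branches = tree
        tree = branches[example[attribute]]  # follow the edge for this value
    return tree                              # leaf: predict its class

# Example tree: test Outlook first; below sunny, test Humidity; below rain, Windy.
tree = ("Outlook", {
    "overcast": "yes",
    "sunny": ("Humidity", {"high": "no", "normal": "yes"}),
    "rain": ("Windy", {"true": "no", "false": "yes"}),
})

print(classify(tree, {"Outlook": "sunny", "Humidity": "normal"}))  # yes
```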
(Figure: a decision tree is learned from training examples; a new example is then classified by passing it through the tree.)
A Sample Task
Day    Temperature  Outlook   Humidity  Windy  Play Golf?
07-05  hot          sunny     high      false  no
07-06  hot          sunny     high      true   no
07-07  hot          overcast  high      false  yes
07-0   cool         rain      normal    false  yes
07-10  cool         overcast  normal    true   yes
07-12  mild         sunny     high      false  no
07-14  cool         sunny     normal    false  yes
07-15  mild         rain      normal    false  yes
07-20  mild         sunny     normal    true   yes
07-21  mild         overcast  high      true   yes
07-22  hot          overcast  normal    false  yes
07-23  mild         rain      high      true   no
07-26  cool         rain      normal    true   no
07-30  mild         rain      high      false  yes
Two new examples to classify:

Day       Temperature  Outlook  Humidity  Windy  Play Golf?
today     cool         sunny    normal    false  ?
tomorrow  mild         sunny    normal    false  ?
Divide-and-Conquer Algorithms
TDIDT: Top-Down Induction of Decision Trees — divide the problem into subproblems, then solve each subproblem recursively.
ID3 Algorithm
Function ID3
Input: Example set S
Output: Decision Tree DT

If all examples in S belong to the same class c:
    return a new leaf and label it with c
Else:
    i.   Select an attribute A according to some heuristic function
    ii.  Generate a new node DT with A as test
    iii. For each value v_i of A:
         (a) Let S_i = all examples in S with A = v_i
         (b) Use ID3 to construct a decision tree DT_i for example set S_i
         (c) Generate an edge that connects DT and DT_i
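The recursion above can be transcribed into a compact Python sketch; examples are assumed to be dicts, and the heuristic used here is minimal average entropy (equivalent to maximal information gain, introduced later).

```python
# A minimal sketch of the ID3 recursion. Assumptions: examples are dicts,
# nodes are ("attribute", {value: subtree}) tuples, leaves are class labels.
from collections import Counter
from math import log2

def entropy(examples, target):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def id3(examples, attributes, target):
    # Base case: pure node (or no attributes left) -> leaf with majority class
    if len({e[target] for e in examples}) == 1 or not attributes:
        return Counter(e[target] for e in examples).most_common(1)[0][0]

    # i. select the attribute with minimal average entropy (= maximal gain)
    def avg_entropy(a):
        return sum(len(s) / len(examples) * entropy(s, target)
                   for v in {e[a] for e in examples}
                   for s in [[e for e in examples if e[a] == v]])
    a = min(attributes, key=avg_entropy)

    # ii./iii. generate a node testing A, one edge and subtree per value
    branches = {}
    for v in {e[a] for e in examples}:
        s_v = [e for e in examples if e[a] == v]                       # (a) S_i
        branches[v] = id3(s_v, [b for b in attributes if b != a], target)  # (b), (c)
    return (a, branches)

data = [
    {"Outlook": "sunny",    "Windy": "true",  "Play": "no"},
    {"Outlook": "sunny",    "Windy": "false", "Play": "no"},
    {"Outlook": "overcast", "Windy": "true",  "Play": "yes"},
    {"Outlook": "overcast", "Windy": "false", "Play": "yes"},
]
print(id3(data, ["Outlook", "Windy"], "Play"))
```

On this toy set, Outlook separates the classes perfectly, so it is selected at the root and both branches become leaves.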
The learned tree also explains all of the training data — but will it generalize well to new data?
We want to grow a simple tree → a good heuristic prefers attributes that split the data so that each successor node is as pure as possible, i.e., the distribution of examples in each node is such that it mostly contains examples of a single class.

We want a measure that prefers attributes that have a high degree of "order":
- maximum order: all examples are of the same class
- minimum order: all classes are equally likely

Another interpretation: entropy is the amount of information that is contained in the sample — all examples of the same class → no information.
In other words:

S is a set of examples
p⊕ is the proportion of examples in class ⊕
p⊖ = 1 − p⊕ is the proportion of examples in class ⊖

Entropy:

    E(S) = −p⊕ · log₂(p⊕) − p⊖ · log₂(p⊖)

Interpretation: the expected number of bits needed to encode the class of a randomly drawn example.
Outlook = sunny:  E(S_sunny) = −(2/5)·log₂(2/5) − (3/5)·log₂(3/5) = 0.971
Outlook = rainy:  E(S_rainy) = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.971
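The definition can be transcribed directly (with 0·log₂0 taken as 0):

```python
# Two-class entropy E(S) as defined above, as a function of the proportion p
# of positive examples; 0 * log2(0) is treated as 0.
from math import log2

def entropy(p_pos):
    probs = [p_pos, 1.0 - p_pos]
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy(0.5))               # all classes equally likely: 1.0 bit
print(entropy(1.0))               # pure sample: 0.0
print(round(entropy(9/14), 3))    # the full weather sample [9 yes, 5 no]: 0.940
```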
Problem:

Entropy only computes the quality of a single (sub-)set of examples — it corresponds to a single attribute value. How can we compute the quality of the entire split, i.e., of an entire attribute?

Solution:

Compute the weighted average over all sets resulting from the split, weighted by their size:

    I(S, A) = Σᵢ (|Sᵢ| / |S|) · E(Sᵢ)

Example:

    I(S, Outlook) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693
Information Gain

When an attribute A splits the set S into subsets Sᵢ, we compute the average entropy and compare it to the entropy of the original set S:

    Gain(S, A) = E(S) − I(S, A) = E(S) − Σᵢ (|Sᵢ| / |S|) · E(Sᵢ)

Note: maximizing information gain is equivalent to minimizing average entropy, because E(S) is constant for all attributes A.
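The computation can be sketched end-to-end on the Outlook attribute; the dataset encoding below is illustrative, keeping only the columns needed.

```python
# Information gain: Gain(S, A) = E(S) - I(S, A), with I(S, A) the
# size-weighted average entropy of the subsets S_i produced by the split.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    avg = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        avg += len(subset) / len(examples) * entropy(subset)   # weighted E(S_i)
    return entropy(labels) - avg

# Outlook splits [9 yes, 5 no] into sunny [2,3], overcast [4,0], rain [3,2]:
weather = (
    [{"Outlook": "sunny", "Play": p} for p in ["no", "no", "no", "yes", "yes"]] +
    [{"Outlook": "overcast", "Play": "yes"} for _ in range(4)] +
    [{"Outlook": "rain", "Play": p} for p in ["yes", "yes", "no", "no", "yes"]]
)
print(round(info_gain(weather, "Outlook", "Play"), 3))  # 0.247
```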
"6a%p!e
Ga in S , Outlook =*.$./
17
"6a%p!e ;(td.<
Outlook is se!ected as t3e root note
"6a%p!e ;(td.<
Humidity is se!ected
J. Frnkranz
"6a%p!e ;(td.<
Humidity is se!ected
4A
J. Frnkranz
Properties of Entropy

Entropy is the only function that satisfies all of the following three properties:

- When the node is pure, the measure should be zero.
- When impurity is maximal (i.e. all classes equally likely), the measure should be maximal.
- The measure should obey the multistage property: if p, q, r are classes in set S, and T is the set of examples of class t = q ∨ r, then

      E_{p,q,r}(S) = E_{p,t}(S) + (|T| / |S|) · E_{q,r}(T)

  → decisions can be made in several stages.
Problematic: attributes with a large number of values — extreme case: each example has its own value (e.g. an example ID, or the Day attribute in the weather data).

Subsets are more likely to be pure if there is a large number of different attribute values → information gain is biased towards choosing attributes with a large number of values. This may cause several problems:
- Overfitting: selection of an attribute that is non-optimal for prediction
- Fragmentation: the data are split into (too) many small sets

Entropy of the split on the Day attribute:

    I(S, Day) = 1/14 · ( E([0,1]) + E([0,1]) + ... + E([0,1]) ) = 0

→ the information gain of Day is maximal.
Intrinsic information of a split: the entropy of the distribution of instances into branches, i.e. how much information we need to tell which branch an instance belongs to:

    IntI(S, A) = −Σᵢ (|Sᵢ| / |S|) · log₂(|Sᵢ| / |S|)

Example: intrinsic information of the Day attribute:

    IntI(S, Day) = 14 · ( −1/14 · log₂(1/14) ) = 3.807

Observation: attributes with higher intrinsic information are less useful.
Gain Ratio

A modification of the information gain that reduces its bias towards multi-valued attributes: it takes the number and size of branches into account when choosing an attribute, correcting the information gain by the intrinsic information of the split:

    GR(S, A) = Gain(S, A) / IntI(S, A)
"6a%p!e:
!"p!rat#r! 0.693 0.247 1.577 0.157 0.7&& 0.152 1.000 0.152 Info: Gain: 0.940-0.911 Split info: info([4,6,4]) Gain ratio: 0.029/1.557 'in() Info: Gain: 0.940-0.&92 Split info: info([&,6]) Gain ratio: 0.04&/0.9&5 0.&92 0.04& 0.9&5 0.049 0.911 0.029 1.557 0.019
Gini Index

Many alternative measures to information gain exist; the most popular alternative is the Gini index:

- used e.g. in CART (Classification And Regression Trees)
- impurity measure (instead of entropy):

      Gini(S) = 1 − Σᵢ pᵢ²

- average Gini index (instead of average entropy / information):

      Gini(S, A) = Σᵢ (|Sᵢ| / |S|) · Gini(Sᵢ)

- a Gini gain could be defined analogously to information gain, but typically the average Gini index is minimized instead of maximizing Gini gain.
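Both measures are short enough to write down directly; the class counts below are the Outlook split from the weather data.

```python
# Gini index of one node and size-weighted average Gini of a split (as in CART).
def gini(counts):
    """Gini(S) = 1 - sum_i p_i^2, for the class counts of one node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def avg_gini(subsets):
    """Gini(S, A): subsets is a list of per-branch class-count lists."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * gini(s) for s in subsets)

print(gini([7, 7]))    # maximal two-class impurity: 0.5
print(gini([14, 0]))   # pure node: 0.0
print(round(avg_gini([[2, 3], [4, 0], [3, 2]]), 3))  # Outlook split: 0.343
```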
Industrial-strength algorithms

To be useful in a wide range of applications, a learning algorithm should:
- permit numeric attributes
- allow missing values
- be robust in the presence of noise
- be able to approximate arbitrary concept descriptions (at least in principle)

Result: C4.5 — the best-known and (probably) most widely-used learning algorithm.
".g. te%p K )*
=n!ike no%ina! attri utes$ e5er# attri ute 3as %an# possi !e sp!it points 2o!ution is straig3tforward e6tension:
"5a!uate info gain ;or ot3er %easure< for e5er# possi !e sp!it point of attri ute (3oose D estL sp!it point Info gain for est sp!it point is info gain for attri ute
"6a%p!e
2ort a!! e6a%p!es according to t3e 5a!ue of t3is attri ute (ou!d !ook !ike t3is:
64 Yes 65 No 6& Yes 69 Yes 70 Yes 71 No 72 No 72 Yes 75 Yes 75 Yes &0 No &1 Yes &3 Yes &5 No
".g.
I Temperature 9 ,".) =
"fficient (o%putation
Linear!# scan t3e sorted 5a!ues$ eac3 ti%e updating t3e count %atri6 and co%puting t3e e5a!uation %easure (3oose t3e sp!it position t3at 3as t3e est 5a!ue
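The scan can be sketched as follows, using the temperature data from the example above; counts are updated incrementally instead of being recomputed for every candidate split.

```python
# Linear scan for the best binary split of a numeric attribute: sort once,
# then move examples one by one from the right to the left class-count
# "matrix" and evaluate each candidate cut point between distinct values.
from collections import Counter
from math import log2

def entropy(counter):
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values() if c)

def best_numeric_split(values, labels):
    """Return (split_point, weighted_avg_entropy) minimizing average entropy."""
    pairs = sorted(zip(values, labels))
    left, right = Counter(), Counter(l for _, l in pairs)
    best = (None, float("inf"))
    for i in range(len(pairs) - 1):
        left[pairs[i][1]] += 1                 # move one example to the left
        right[pairs[i][1]] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                           # split only between distinct values
        n_left = i + 1
        avg = (n_left * entropy(left) +
               (len(pairs) - n_left) * entropy(right)) / len(pairs)
        if avg < best[1]:
            best = ((pairs[i][0] + pairs[i + 1][0]) / 2, avg)
    return best

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes",
        "yes", "yes", "no", "yes", "yes", "no"]
print(best_numeric_split(temps, play))   # best cut at 84.0
```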
(Figure, after Tan, Steinbach & Kumar: sorted Taxable Income values from 60 to 220 with class labels Cheat = No/Yes; at each candidate split position the class counts on both sides of the cut are updated incrementally and the Gini index is computed — the best split, with Gini = 0.300, lies at 97.)
Splitting (multi-way) on a nominal attribute exhausts all the information in that attribute: a nominal attribute is tested (at most) once on any path in the tree, whereas a numeric attribute may be tested several times along a path.

To avoid repeated binary tests of the same numeric attribute:
- pre-discretize numeric attributes (→ discretization), or
- use multi-way splits instead of binary ones; such a split can, e.g., be computed by building a subtree using a single numeric attribute, which can then be flattened into a multi-way split; other methods are possible (dynamic programming, greedy, ...)
Missing Values

If we are lucky, attributes with missing values are not tested by the tree. If a test on an attribute with a missing value must be made, split the instance into fractional instances ("pieces"):
- one piece for each outgoing branch of the node
- a piece going down a branch receives a weight proportional to the popularity of the branch; the weights sum to 1
- during learning, information gain etc. use sums of weights instead of counts
- during classification, merge the probability distributions using the weights of the fractional instances
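The fractional-instance scheme at classification time can be sketched as follows; the tree representation, in which each branch stores the number of training instances it covered, is an assumption for illustration (C4.5's internals differ in detail).

```python
# Classifying an instance with missing values by splitting it into weighted
# pieces and merging the resulting class distributions.
# Assumed representation: inner node ("attribute", {value: (count, subtree)}),
# where count is the number of training instances that went down that branch.

def class_distribution(tree, example, weight=1.0):
    """Return {class: summed weight} for a (possibly fractional) instance."""
    if not isinstance(tree, tuple):                  # leaf: all weight to its class
        return {tree: weight}
    attribute, branches = tree
    value = example.get(attribute)                   # None means "missing"
    total = sum(count for count, _ in branches.values())
    dist = {}
    for v, (count, subtree) in branches.items():
        if value is None:
            w = weight * count / total               # fractional piece per branch
        elif v == value:
            w = weight                               # known value: one branch only
        else:
            continue
        for cls, cw in class_distribution(subtree, example, w).items():
            dist[cls] = dist.get(cls, 0.0) + cw      # merge weighted distributions
    return dist

# Humidity branch covers 7 "high" and 7 "normal" training instances:
tree = ("Humidity", {"high": (7, "no"), "normal": (7, "yes")})
print(class_distribution(tree, {}))   # Humidity missing: half a piece each way
```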
The smaller the complexity of a concept, the less danger that it overfits the data — a polynomial of degree n can always fit n+1 points. Note that a "perfect" fit on the training data can always be found for a decision tree (except when the data are contradictory)!

Pre-pruning: stop growing a branch when the information becomes unreliable.

Post-pruning: grow a decision tree that correctly classifies all training data, then simplify it by replacing some nodes with leaves.
Pre-pruning

Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node. The most popular test is the chi-squared test: ID3 used the chi-squared test in addition to information gain — only statistically significant attributes were allowed to be selected by the information gain procedure.
"ar!# stopping
.re-pruning %a# stop t3e growt3 process pre%ature!#: early stopping (!assic e6a%p!e: O9'G.arit#-pro !e%
a 1 2 3 0 0 1
b 0 1 0
class 0 1 1 0
+o indi5idua! attri ute e63i its an# 4 1 1 significant association to t3e c!ass B In a dataset t3at contains O9' attri utes a and $ and se5era! irre!e5ant ;e.g.$ rando%< attri utes$ ID3 can not distinguis3 etween re!e5ant and irre!e5ant attri utes B .repruning wonPt e6pand t3e root node 2tructure is on!# 5isi !e in fu!!# e6panded tree
Jut:
Post-pruning

Basic idea: first grow a full tree to capture all possible attribute interactions, then remove those parts that are due to chance:

1. learn a complete and consistent decision tree that classifies all examples in the training set correctly
2. as long as the performance increases:
   - try simplification operators on the tree
   - evaluate the resulting trees
   - make the replacement that results in the best estimated performance
Post-pruning: operators and error estimation

Pruning operators:
- subtree replacement
- subtree raising

Error estimation:
- on a separate pruning set ("reduced error pruning")
- with confidence intervals (C4.5's method)
- significance testing
- MDL principle
Subtree replacement

Bottom-up: consider replacing a tree only after considering all its subtrees.
Subtree raising

Delete a node and redistribute its instances among the raised subtree; slower than subtree replacement.
"rror on t3e training data is +9T a usefu! esti%ator ;wou!d resu!t in a!%ost no pruning< =se 3o!d-out set for pruning "ssentia!!# t3e sa%e as in ru!e !earning on!# pruning operators differ ;su tree rep!ace%ent< Deri5e confidence inter5a! fro% training data wit3 a user-pro5ided confidence !e5e! Assu%e t3at t3e true error is on t3e upper ound of t3is confidence inter5a! ;pessi%istic error esti%ate<
().*Ps %et3od
Consider classifying E examples incorrectly out of N examples as observing E events in N trials of a binomial distribution. For a given confidence level CF, the upper limit on the error rate over the whole population is U_CF(E, N), with CF% confidence.

Example: 100 examples in a leaf, 6 examples misclassified. How large is the true error, assuming a pessimistic estimate with a confidence of 25%? → U_0.25(100, 6)

Note: this is only a heuristic — but one that works well!
C4.5's method

    e = ( f + z²/2N + z · √( f/N − f²/N + z²/4N² ) ) / ( 1 + z²/N )

- z is derived from the desired confidence value: if c = 25%, then z = 0.69 (from the normal distribution)
- f is the error on the training data
- N is the number of instances covered by the leaf
"rror esti%ate for su tree is weig3ted su% of error esti%ates for a!! its !ea5es BA node is pruned if error esti%ate of su tree is !ower t3an error esti%ate of t3e node
"6a%p!e
f=0.33 e=0.47
f=0.5 e=0.72
f=0.33 e=0.47
Reduced Error Pruning

Basic idea: optimize the accuracy of a decision tree on a separate pruning set.

1. split the training data into a growing and a pruning set
2. learn a complete and consistent decision tree that classifies all examples in the growing set correctly
3. as long as the error on the pruning set does not increase:
   - try to replace each node by a leaf (predicting the majority class)
   - evaluate the resulting (sub-)tree on the pruning set
   - make the replacement that results in the maximum error reduction
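The procedure can be sketched on the nested-tuple trees used earlier; this is a simplified bottom-up variant (empty branches and tie-breaking are ignored), not C4.5's implementation.

```python
# A sketch of reduced-error pruning: bottom-up, replace a subtree by its
# majority-class leaf whenever that does not increase pruning-set error.
# Trees: ("attribute", {value: subtree}) tuples, leaves are class labels.
from collections import Counter

def majority_class(examples, target):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def errors(tree, examples, target):
    def classify(t, e):
        while isinstance(t, tuple):
            t = t[1][e[t[0]]]
        return t
    return sum(classify(tree, e) != e[target] for e in examples)

def rep_prune(tree, grow_set, prune_set, target):
    if not isinstance(tree, tuple):
        return tree
    attribute, branches = tree
    pruned = {}
    for v, sub in branches.items():            # prune the subtrees first
        g = [e for e in grow_set if e[attribute] == v]
        p = [e for e in prune_set if e[attribute] == v]
        pruned[v] = rep_prune(sub, g, p, target)
    tree = (attribute, pruned)
    leaf = majority_class(grow_set, target)    # candidate replacement leaf
    if errors(leaf, prune_set, target) <= errors(tree, prune_set, target):
        return leaf
    return tree
```

Usage: with a pruning set in which the minority branch's prediction is wrong, the node collapses to a majority-class leaf; otherwise the tree is kept unchanged.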
Complexity: with subtree raising, every instance may have to be redistributed at every node between its leaf and the root; the cost for one redistribution is (on average) O(log n).
From trees to rules

Simple way: one rule for each leaf.

C4.5rules:
- greedily prune conditions from each rule if this reduces its estimated error; this can produce duplicate rules — check for this at the end
- then look at each class in turn, consider the rules for that class, and find a "good" subset (guided by MDL)
- then rank the subsets to avoid conflicts
- finally, remove rules (greedily) if this decreases the error on the training data
Decision Lists

An ordered list of rules:
- the first rule that fires makes the prediction
- can be learned with a covering approach

Similar to decision trees, but nodes may have multiple predecessors — DAGs (directed, acyclic graphs):
- there are a few algorithms that can learn DAGs
- they learn much smaller structures, but are in general not very successful
- a decision list may be viewed as a special case of a DAG
Decision Graphs

(Figure: an example decision graph in which two rules share a common successor node.)
C4.5rules is slow for large and noisy datasets; the commercial version C5.0rules uses a different technique.

Parameters:
- -c: confidence value (default 25%) — lower values incur heavier pruning
- -m: minimum number of instances in the two most popular branches (default 2)
- others, e.g. for having only two-way splits (also on symbolic attributes), etc.
A decision tree can be viewed as a set of non-overlapping rules, typically learned via divide-and-conquer algorithms (recursive partitioning). Many concepts have a shorter description as a rule set:
- low-complexity decision lists are more expressive than low-complexity decision trees (Rivest, 1987)
- exceptions: if one or more attributes are relevant for the classification of all examples (e.g., parity)

Learning strategies: Separate-and-Conquer (rule learning) vs. Divide-and-Conquer (tree learning).
Discussion TDIDT

- The most extensively studied method of machine learning used in data mining
- Different criteria for attribute/test selection rarely make a large difference
- Different pruning methods mainly change the size of the resulting pruned tree
- C4.5 builds univariate decision trees
- Some TDIDT systems can build multivariate trees (e.g. CART) — multivariate: a split is not based on a single attribute but on a function defined over multiple attributes
Regression Task

Two possible approaches:
- discretize the numeric target value and use a classification learning algorithm
- adapt the classification algorithm to regression data → Regression Trees and Model Trees
Regression Trees

Differences to decision trees (classification trees):

- Leaf nodes: predict the average value of all instances in the leaf
- Splitting criterion: minimize the variance of the values in each subset Sᵢ, e.g. via the standard deviation reduction

      SDR(A, S) = SD(S) − Σᵢ (|Sᵢ| / |S|) · SD(Sᵢ)

- Termination criteria: very important (otherwise each leaf ends up containing only single points!)
  - lower bound on the standard deviation in a node
  - lower bound on the number of examples in a node
- Pruning criterion: numeric error measures, e.g. mean squared error
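The standard deviation reduction is a direct transcription of the formula above; the data values are illustrative.

```python
# Standard deviation reduction for a candidate split in a regression tree:
# SDR(A, S) = SD(S) - sum_i |S_i|/|S| * SD(S_i).
from statistics import pstdev  # population standard deviation

def sdr(values, subsets):
    total = len(values)
    return pstdev(values) - sum(len(s) / total * pstdev(s) for s in subsets)

values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
split = [[1.0, 1.2, 0.8], [5.0, 5.5, 4.5]]   # a split separating low from high
print(round(sdr(values, split), 3))          # large reduction: a good split
```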
Model Trees

In a leaf node:
- classification trees predict a class value
- regression trees predict the average value of all instances in the leaf
- model trees use a linear model for making the predictions; growing of the tree proceeds as with regression trees

Linear model:

    r(x) = Σᵢ wᵢ · vᵢ(x)

where vᵢ(x) is the value of attribute Aᵢ for example x and wᵢ is a weight. The attributes that have already been used in the path of the tree can be ignored. The weights can be fitted with standard math packages by minimizing the mean squared error

    MSE = (1/n) · Σⱼ ( rⱼ − r(xⱼ) )²
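For a leaf's model over a single attribute, minimizing the MSE reduces to ordinary least squares; the data below are illustrative.

```python
# Fitting the weights of a one-attribute leaf model by minimizing the MSE:
# for r(x) = w0 + w1*x this is ordinary least squares in closed form.
def fit_line(xs, ys):
    """Return (w0, w1) minimizing sum_j (y_j - (w0 + w1*x_j))^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
          sum((x - mx) ** 2 for x in xs))
    return my - w1 * mx, w1          # intercept, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]            # roughly y = 2x
w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))    # 0.1 1.96
```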
Summary

- Classification problems can be solved using decision tree learning: iteratively select the best attribute and split up the values according to this attribute.
- Regression problems can be solved with regression trees and model trees; the difference lies in the models used at the leaves. Both are grown like decision trees, but with different splitting criteria.
- Simpler, seemingly less accurate trees are often preferable.
- Evaluation has to be done on separate test sets.