
Chapter Three

Decision Tree

Copyright 2012 Pearson Education, Inc.

Overview


Decision tree induction is a simple but powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while, at the same time, an associated decision tree gets incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.
The decision tree can be thought of as a set of sentences (in Disjunctive Normal Form) written in propositional logic.

At a basic level, machine learning is about predicting the future based on the past. For instance, you might wish to predict how much a user, Alice, will like a movie that she hasn't seen, based on her ratings of movies that she has seen. This means making informed guesses about some unobserved property of an object, based on observed properties of that object.


Imagine you only ever do four things at the weekend: go shopping, watch a movie, play tennis or just stay in. What you do depends on three things: the weather (windy, rainy or sunny); how much money you have (rich or poor); and whether your parents are visiting. You say to yourself: if my parents are visiting, we'll go to the cinema. If they're not visiting and it's sunny, then I'll play tennis, but if it's windy and I'm rich, then I'll go shopping. If they're not visiting, it's windy and I'm poor, then I will go to the cinema. If they're not visiting and it's rainy, then I'll stay in.
To remember all this, you draw a flowchart which will enable you to read off your decision. We call such diagrams decision trees. A suitable decision tree for the weekend decision choices would be as follows:

[Figure: decision tree for the weekend example, with the "parents visiting?" question at the root]

We can see why such diagrams are called trees: while they are admittedly upside down, they start from a root and have branches leading to leaves (the tips of the graph at the bottom). Note that the leaves are always decisions, and a particular decision might be at the end of multiple branches (for example, we could choose to go to the cinema for two different reasons).
According to our decision tree diagram, on Saturday morning, when we wake up, all we need to do is check (a) the weather, (b) how much money we have and (c) whether our parents' car is parked in the drive. The decision tree will then enable us to make our decision. Suppose, for example, that the parents haven't turned up and the sun is shining. Then this path through our decision tree will tell us what to do:


[Figure: the path through the decision tree when the parents are not visiting and it is sunny, ending at "play tennis"]

Hence we run off to play tennis because our decision tree told us to. Note that the decision tree covers all eventualities. That is, there are no values that the weather, the parents turning up or the money situation could take which aren't catered for in the decision tree. Note that, in this lecture, we will be looking at how to automatically generate decision trees from examples, not at how to turn thought processes into decision trees.


The basic idea

In the decision tree above, it is significant that the "parents visiting" node came at the top of the tree. We don't know exactly the reason for this, as we didn't see the example weekends from which the tree was produced.


However, it is likely that the number of weekends the parents visited was relatively high, and every weekend they did visit, there was a trip to the cinema. Suppose, for example, the parents have visited every fortnight for a year, and on each occasion the family visited the cinema. This means that there is no evidence in favour of doing anything other than watching a film when the parents visit. Given that we are learning rules from examples, this means that if the parents visit, the decision is already made.


Hence we can put this at the top of the decision tree, and disregard all the examples where the parents visited when constructing the rest of the tree. Not having to worry about a set of examples will make the construction job easier.


This kind of thinking underlies the ID3 algorithm for learning decision trees, which we will describe more formally below.


The Basic DTL Algorithm

Top-down, greedy search through the space of possible decision trees (ID3 and C4.5).
Root: the best attribute for classification.
Which attribute is the best classifier? The answer is based on information gain.


Entropy
Putting together a decision tree is all a matter of
choosing which attribute to test at each node in
the tree.
We shall define a measure called information
gain which will be used to decide which attribute
to test at each node.
Information gain is itself calculated using a
measure called entropy.


Given a binary categorisation, C, and a set of examples, S, for which the proportion of examples categorised as positive by C is p+ and the proportion of examples categorised as negative by C is p-, then the entropy of S is:


Entropy(S) = - p+ log2(p+) - p- log2(p-)
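To make this concrete, here is a minimal Python sketch of the binary entropy measure; it is not part of the slides, and the function name entropy and the example proportions are ours.

from math import log2

def entropy(p_pos, p_neg):
    """Entropy of a binary categorisation with class proportions p_pos and p_neg."""
    result = 0.0
    for p in (p_pos, p_neg):
        if p > 0:            # by convention, 0 * log2(0) is treated as 0
            result -= p * log2(p)
    return result

print(entropy(0.5, 0.5))   # 1.0  (a perfectly mixed set is maximally uncertain)
print(entropy(1.0, 0.0))   # 0.0  (a pure set has zero entropy)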

Information Gain

We now return to the problem of trying to determine the best attribute to choose for a particular node in a tree. The following measure calculates a numerical value for a given attribute, A, with respect to a set of examples, S. Note that the values of attribute A will range over a set of possibilities which we call Values(A), and that, for a particular value v from that set, we write Sv for the set of examples which have value v for attribute A.
The information gain of attribute A, relative to a collection of examples, S, is calculated as:

Gain(S, A) = Entropy(S) - Σ (v in Values(A)) (|Sv| / |S|) * Entropy(Sv)
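As an illustrative sketch of applying this formula (the helper names entropy_counts and information_gain are ours, not from the slides), the Python snippet below computes the gain from the positive/negative counts of each subset Sv.

from math import log2

def entropy_counts(pos, neg):
    """Entropy of a set containing pos positive and neg negative examples."""
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def information_gain(parent_counts, subset_counts):
    """Gain(S, A): parent_counts is (pos, neg) for S; subset_counts has one (pos, neg) pair per value v in Values(A)."""
    total = sum(parent_counts)
    remainder = sum(((p + n) / total) * entropy_counts(p, n) for p, n in subset_counts)
    return entropy_counts(*parent_counts) - remainder

# A split that separates the classes perfectly recovers all of the entropy:
print(information_gain((2, 2), [(2, 0), (0, 2)]))   # 1.0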

Decision Tree Learning


Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

[See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]



Decision Tree Learning

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
[See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]
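Read as code, this disjunction of conjunctions (the conditions under which PlayTennis = Yes) is simply a boolean test. Below is a small Python sketch; the function name play_tennis is ours, not from the slides.

def play_tennis(outlook, humidity, wind):
    """Return True exactly when the learned tree predicts PlayTennis = Yes."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "Normal", "Strong"))  # True  (cf. D11 in the table)
print(play_tennis("Rain", "High", "Strong"))     # False (cf. D14 in the table)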

Decision Tree Learning

ID3: Building a Decision Tree

1. First test all attributes and select the one that would function as the best root;
2. Break up the training set into subsets based on the branches of the root node;
3. Test the remaining attributes to see which ones fit best underneath the branches of the root node;
4. Continue this process for all other branches until
   a. all examples of a subset are of one type,
   b. there are no examples left (return the majority classification of the parent), or
   c. there are no more attributes left (the default value should be the majority classification).
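Below is a compact Python sketch of the procedure above. It is an illustrative ID3-style implementation, not the exact code behind the slides; the names id3, gain and most_common are ours, and the best attribute is chosen with the information-gain measure defined on the following slides.

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [e for e in examples if e[attr] == v]
        remainder += len(subset) / len(examples) * entropy([e[target] for e in subset])
    return base - remainder

def most_common(labels):
    return Counter(labels).most_common(1)[0][0]

def id3(examples, attributes, target, parent_majority=None):
    if not examples:                                   # 4b: no examples left
        return parent_majority
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                          # 4a: the subset is pure
        return labels[0]
    if not attributes:                                 # 4c: no attributes left
        return most_common(labels)
    best = max(attributes, key=lambda a: gain(examples, a, target))   # steps 1 and 3
    tree = {best: {}}
    for v in set(e[best] for e in examples):           # step 2: split on the chosen attribute
        subset = [e for e in examples if e[best] == v]
        remaining = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, remaining, target, most_common(labels))
    return tree

# Tiny illustrative run (a three-example toy set, not the full PlayTennis table):
toy = [{"Outlook": "Sunny", "PlayTennis": "No"},
       {"Outlook": "Overcast", "PlayTennis": "Yes"},
       {"Outlook": "Rain", "PlayTennis": "Yes"}]
print(id3(toy, ["Outlook"], "PlayTennis"))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'Yes'}} (branch order may vary)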


Decision Tree Learning


Determining which attribute is best (Entropy & Gain)

Entropy (E) is the minimum number of bits needed, on average, to encode the classification (yes or no) of an arbitrary example:

E(S) = - Σ (i = 1 to c) pi log2(pi)

where S is a set of training examples, c is the number of classes, and pi is the proportion of the training set that is of class i. For our entropy equation we take 0 log2 0 = 0.

The information gain G(S,A), where A is an attribute, is:

G(S,A) = E(S) - Σ (v in Values(A)) (|Sv| / |S|) * E(Sv)


Decision Tree Learning


Let's Try an Example!

PlayTennis = {no, no, yes, yes, yes, no, yes, no, yes, yes, yes, yes, yes, no}

The target function, named PlayTennis, contains two classes:
C1 = yes
C2 = no

The notation E([C1+, C2-]) means that there are C1 positive training examples and C2 negative examples.
Therefore the entropy of the training data, E(S), can be written as E([9+,5-]), because, of the 14 training examples, 9 are yes and 5 are no.


Decision Tree Learning: A Simple Example

Let's start off by calculating the entropy of the training set.
E(S) = E([9+,5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Gain(S, Outlook) = ?
Gain(S, Temperature) = ?
Gain(S, Humidity) = ?
Gain(S, Wind) = ?
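As a quick numerical check of the entropy value above (a sketch, not part of the slides), the overall class counts alone reproduce the 0.940 figure:

from math import log2

def entropy(pos, neg):
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

print(round(entropy(9, 5), 3))   # 0.94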

Decision Tree Learning: A Simple Example

Next we will need to calculate the information gain G(S,A)
for each attribute A where A is taken from the set
{Outlook, Temperature, Humidity, Wind}.


Decision Tree Learning: A Simple Example

The information gain for Outlook is calculated from the class counts for each Outlook value:

Outlook:  Sunny: 2 yes, 3 no   Overcast: 4 yes, 0 no   Rain: 3 yes, 2 no

G(S, Outlook) = E(S) - [5/14 * E(Outlook=sunny) + 4/14 * E(Outlook=overcast) + 5/14 * E(Outlook=rain)]
G(S, Outlook) = E([9+,5-]) - [5/14 * E([2+,3-]) + 4/14 * E([4+,0-]) + 5/14 * E([3+,2-])]
G(S, Outlook) = 0.94 - [5/14 * 0.971 + 4/14 * 0.0 + 5/14 * 0.971]
G(S, Outlook) = 0.246


Decision Tree Learning: A Simple Example

G(S, Temperature) = 0.94 - [4/14 * E(Temperature=hot) + 6/14 * E(Temperature=mild) + 4/14 * E(Temperature=cool)]
G(S, Temperature) = 0.94 - [4/14 * E([2+,2-]) + 6/14 * E([4+,2-]) + 4/14 * E([3+,1-])]
G(S, Temperature) = 0.94 - [4/14 * 1.0 + 6/14 * 0.918 + 4/14 * 0.811]
G(S, Temperature) = 0.029


Decision Tree Learning: A Simple Example

G(S, Humidity) = 0.94 - [7/14 * E(Humidity=high) + 7/14 * E(Humidity=normal)]
G(S, Humidity) = 0.94 - [7/14 * E([3+,4-]) + 7/14 * E([6+,1-])]
G(S, Humidity) = 0.94 - [7/14 * 0.985 + 7/14 * 0.592]
G(S, Humidity) = 0.1515


Decision Tree Learning: A Simple Example

G(S, Wind) = 0.94 - [8/14 * E(Wind=weak) + 6/14 * E(Wind=strong)]
G(S, Wind) = 0.94 - [8/14 * E([6+,2-]) + 6/14 * E([3+,3-])]
G(S, Wind) = 0.94 - [8/14 * 0.811 + 6/14 * 1.00]
G(S, Wind) = 0.048
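The four gains can be reproduced from the per-attribute class counts in the table; the sketch below (the helper names H and gain are ours) prints values matching the slides up to rounding of the intermediate entropies (e.g. 0.247 rather than 0.246 for Outlook).

from math import log2

def H(pos, neg):
    """Entropy of a subset containing pos 'Yes' and neg 'No' examples."""
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, subsets):
    total = sum(parent)
    return H(*parent) - sum((p + n) / total * H(p, n) for p, n in subsets)

S = (9, 5)                                    # 9 Yes, 5 No in the whole training set
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],  # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],          # High, Normal
    "Wind":        [(6, 2), (3, 3)],          # Weak, Strong
}
for attr, subs in splits.items():
    print(attr, round(gain(S, subs), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048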


Decision Tree Learning: A Simple Example

Outlook is our winner! It has the largest information gain (0.246, versus 0.029, 0.1515 and 0.048), so it becomes the root of the tree.


Decision Tree Learning: A Simple Example

Now that we have discovered the root of our decision tree, we must recursively find the nodes that should go below Sunny, Overcast, and Rain. (Every Overcast example is Yes, so that branch immediately becomes a Yes leaf.)


Decision Tree Learning: A Simple Example

For the Rain branch (five examples, [3+,2-], entropy 0.971):

G(Outlook=Rain, Humidity) = 0.971 - [2/5 * E(Outlook=Rain ^ Humidity=high) + 3/5 * E(Outlook=Rain ^ Humidity=normal)]
G(Outlook=Rain, Humidity) = 0.02
G(Outlook=Rain, Wind) = 0.971 - [3/5 * 0 + 2/5 * 0]
G(Outlook=Rain, Wind) = 0.971

So Wind is chosen below the Rain branch.
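These two values can be checked in the same way (a sketch, not part of the slides), using only the five Rain rows (D4, D5, D6, D10, D14); the counts below are read directly from the table, and the helper names are ours.

from math import log2

def H(pos, neg):
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, subsets):
    total = sum(parent)
    return H(*parent) - sum((p + n) / total * H(p, n) for p, n in subsets)

rain = (3, 2)                    # Outlook = Rain: 3 Yes, 2 No, entropy about 0.971
humidity = [(1, 1), (2, 1)]      # Humidity: High [1+,1-], Normal [2+,1-]
wind = [(3, 0), (0, 2)]          # Wind: Weak [3+,0-], Strong [0+,2-]

print(round(gain(rain, humidity), 3))   # 0.02
print(round(gain(rain, wind), 3))       # 0.971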


Decision Tree Learning: A Simple Example

Now our decision tree looks like:

[Figure: the decision tree grown so far, with Outlook at the root]
