
Chapter Three

Decision Tree

Copyright 2012 Pearson Education, Inc.

Overview


Decision tree induction is a simple but powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while, at the same time, an associated decision tree gets incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.
The decision tree can be thought of as a set of sentences (in Disjunctive Normal Form) written in propositional logic.

At a basic level, machine learning is about predicting the future based on the past. For instance, you might wish to predict how much a user, Alice, will like a movie that she hasn't seen, based on her ratings of movies that she has seen. This means making informed guesses about some unobserved property of an object, based on observed properties of that object.


Imagine you only ever do four things at the weekend: go shopping, watch a movie, play tennis or just stay in. What you do depends on three things: the weather (windy, rainy or sunny); how much money you have (rich or poor); and whether your parents are visiting. You say to yourself: if my parents are visiting, we'll go to the cinema. If they're not visiting and it's sunny, then I'll play tennis, but if it's windy and I'm rich, then I'll go shopping. If they're not visiting, it's windy and I'm poor, then I will go to the cinema. If they're not visiting and it's rainy, then I'll stay in.
To remember all this, you draw a flowchart which will enable you to read off your decision. We call such diagrams decision trees. A suitable decision tree for the weekend decision choices would be as follows:

[Figure: decision tree for the weekend example, with the "parents visiting?" question at the root]

We can see why such diagrams are called trees: while they are admittedly upside down, they start from a root and have branches leading to leaves (the tips of the graph at the bottom). Note that the leaves are always decisions, and a particular decision might be at the end of multiple branches (for example, we could choose to go to the cinema for two different reasons).
According to our decision tree diagram, on Saturday morning, when we wake up, all we need to do is check (a) the weather, (b) how much money we have and (c) whether our parents' car is parked in the drive. The decision tree will then enable us to make our decision. Suppose, for example, that the parents haven't turned up and the sun is shining. Then this path through our decision tree will tell us what to do:


[Figure: the path through the decision tree when the parents are not visiting and it is sunny, ending at "play tennis"]

Hence we run off to play tennis because our decision tree told us to. Note that the decision tree covers all eventualities. That is, there are no values that the weather, the parents turning up or the money situation could take which aren't catered for in the decision tree. Note that, in this lecture, we will be looking at how to automatically generate decision trees from examples, not at how to turn thought processes into decision trees.


The basic idea

In the decision tree above, it is significant that the "parents visiting" node came at the top of the tree. We don't know exactly the reason for this, as we didn't see the example weekends from which the tree was produced.


However, it is likely that the number of weekends the parents visited was relatively high, and every weekend they did visit, there was a trip to the cinema. Suppose, for example, the parents have visited every fortnight for a year, and on each occasion the family visited the cinema. This means that there is no evidence in favour of doing anything other than watching a film when the parents visit. Given that we are learning rules from examples, this means that if the parents visit, the decision is already made.


Hence we can put this at the top of the decision tree, and disregard all the examples where the parents visited when constructing the rest of the tree. Not having to worry about a set of examples will make the construction job easier.


This kind of thinking underlies the ID3 algorithm for learning decision trees, which we will describe more formally below.


The Basic DTL Algorithm

Top-down, greedy search through the space of possible decision trees (ID3 and C4.5).
Root: the best attribute for classification.
Which attribute is the best classifier? The answer is based on information gain.


Entropy
Putting together a decision tree is all a matter of
choosing which attribute to test at each node in
the tree.
We shall define a measure called information
gain which will be used to decide which attribute
to test at each node.
Information gain is itself calculated using a
measure called entropy.


Given a binary categorisation, C, and a set of examples, S, for which the proportion of examples categorised as positive by C is p+ and the proportion of examples categorised as negative by C is p-, then the entropy of S is:


Entropy(S) = - p+ log2(p+) - p- log2(p-)
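To make this concrete, here is a minimal Python sketch of the binary entropy measure; it is not part of the slides, and the function name entropy and the example proportions are ours.

from math import log2

def entropy(p_pos, p_neg):
    """Entropy of a binary categorisation with class proportions p_pos and p_neg."""
    result = 0.0
    for p in (p_pos, p_neg):
        if p > 0:            # by convention, 0 * log2(0) is treated as 0
            result -= p * log2(p)
    return result

print(entropy(0.5, 0.5))   # 1.0  (a perfectly mixed set is maximally uncertain)
print(entropy(1.0, 0.0))   # 0.0  (a pure set has zero entropy)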

Information Gain

We now return to the problem of trying to determine the best attribute to choose for a particular node in a tree. The following measure calculates a numerical value for a given attribute, A, with respect to a set of examples, S. Note that the values of attribute A will range over a set of possibilities which we call Values(A), and that, for a particular value v from that set, we write Sv for the set of examples which have value v for attribute A.
The information gain of attribute A, relative to a collection of examples, S, is calculated as:

Gain(S, A) = Entropy(S) - Σ (v in Values(A)) (|Sv| / |S|) * Entropy(Sv)
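As an illustrative sketch of applying this formula (the helper names entropy_counts and information_gain are ours, not from the slides), the Python snippet below computes the gain from the positive/negative counts of each subset Sv.

from math import log2

def entropy_counts(pos, neg):
    """Entropy of a set containing pos positive and neg negative examples."""
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def information_gain(parent_counts, subset_counts):
    """Gain(S, A): parent_counts is (pos, neg) for S; subset_counts has one (pos, neg) pair per value v in Values(A)."""
    total = sum(parent_counts)
    remainder = sum(((p + n) / total) * entropy_counts(p, n) for p, n in subset_counts)
    return entropy_counts(*parent_counts) - remainder

# A split that separates the classes perfectly recovers all of the entropy:
print(information_gain((2, 2), [(2, 0), (0, 2)]))   # 1.0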

Decision Tree Learning


Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

[See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]



Decision Tree Learning

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
[See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]
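Read as code, this disjunction of conjunctions (the conditions under which PlayTennis = Yes) is simply a boolean test. Below is a small Python sketch; the function name play_tennis is ours, not from the slides.

def play_tennis(outlook, humidity, wind):
    """Return True exactly when the learned tree predicts PlayTennis = Yes."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "Normal", "Strong"))  # True  (cf. D11 in the table)
print(play_tennis("Rain", "High", "Strong"))     # False (cf. D14 in the table)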

Decision Tree Learning

ID3: Building a Decision Tree

1. First test all attributes and select the one that would function as the best root;
2. Break up the training set into subsets based on the branches of the root node;
3. Test the remaining attributes to see which ones fit best underneath the branches of the root node;
4. Continue this process for all other branches until
   a. all examples of a subset are of one type,
   b. there are no examples left (return the majority classification of the parent), or
   c. there are no more attributes left (the default value should be the majority classification).
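Below is a compact Python sketch of the procedure above. It is an illustrative ID3-style implementation, not the exact code behind the slides; the names id3, gain and most_common are ours, and the best attribute is chosen with the information-gain measure defined on the following slides.

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [e for e in examples if e[attr] == v]
        remainder += len(subset) / len(examples) * entropy([e[target] for e in subset])
    return base - remainder

def most_common(labels):
    return Counter(labels).most_common(1)[0][0]

def id3(examples, attributes, target, parent_majority=None):
    if not examples:                                   # 4b: no examples left
        return parent_majority
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                          # 4a: the subset is pure
        return labels[0]
    if not attributes:                                 # 4c: no attributes left
        return most_common(labels)
    best = max(attributes, key=lambda a: gain(examples, a, target))   # steps 1 and 3
    tree = {best: {}}
    for v in set(e[best] for e in examples):           # step 2: split on the chosen attribute
        subset = [e for e in examples if e[best] == v]
        remaining = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, remaining, target, most_common(labels))
    return tree

# Tiny illustrative run (a three-example toy set, not the full PlayTennis table):
toy = [{"Outlook": "Sunny", "PlayTennis": "No"},
       {"Outlook": "Overcast", "PlayTennis": "Yes"},
       {"Outlook": "Rain", "PlayTennis": "Yes"}]
print(id3(toy, ["Outlook"], "PlayTennis"))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'Yes'}} (branch order may vary)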


Decision Tree Learning


Determining which attribute is best (Entropy & Gain)

Entropy (E) is the minimum number of bits needed, on average, to encode the classification (yes or no) of an arbitrary example:

E(S) = - Σ (i = 1 to c) pi log2(pi)

where S is a set of training examples, c is the number of classes, and pi is the proportion of the training set that is of class i. For our entropy equation we take 0 log2 0 = 0.

The information gain G(S,A), where A is an attribute, is:

G(S,A) = E(S) - Σ (v in Values(A)) (|Sv| / |S|) * E(Sv)


Decision Tree Learning


Let's Try an Example!

PlayTennis = {no, no, yes, yes, yes, no, yes, no, yes, yes, yes, yes, yes, no}

The target function, named PlayTennis, contains two classes:
C1 = yes
C2 = no

The notation E([C1+, C2-]) means that there are C1 positive training examples and C2 negative examples.
Therefore the entropy of the training data, E(S), can be written as E([9+,5-]), because, of the 14 training examples, 9 are yes and 5 are no.


Decision Tree Learning: A Simple Example

Let's start off by calculating the entropy of the training set.
E(S) = E([9+,5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Gain(S, Outlook) = ?
Gain(S, Temperature) = ?
Gain(S, Humidity) = ?
Gain(S, Wind) = ?
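As a quick numerical check of the entropy value above (a sketch, not part of the slides), the overall class counts alone reproduce the 0.940 figure:

from math import log2

def entropy(pos, neg):
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

print(round(entropy(9, 5), 3))   # 0.94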

Decision Tree Learning: A Simple Example

Next we will need to calculate the information gain G(S,A)
for each attribute A where A is taken from the set
{Outlook, Temperature, Humidity, Wind}.


Decision Tree Learning: A Simple Example

The information gain for Outlook is calculated from the class counts for each Outlook value:

Outlook:  Sunny: 2 yes, 3 no   Overcast: 4 yes, 0 no   Rain: 3 yes, 2 no

G(S, Outlook) = E(S) - [5/14 * E(Outlook=sunny) + 4/14 * E(Outlook=overcast) + 5/14 * E(Outlook=rain)]
G(S, Outlook) = E([9+,5-]) - [5/14 * E([2+,3-]) + 4/14 * E([4+,0-]) + 5/14 * E([3+,2-])]
G(S, Outlook) = 0.94 - [5/14 * 0.971 + 4/14 * 0.0 + 5/14 * 0.971]
G(S, Outlook) = 0.246


Decision Tree Learning: A Simple Example

G(S, Temperature) = 0.94 - [4/14 * E(Temperature=hot) + 6/14 * E(Temperature=mild) + 4/14 * E(Temperature=cool)]
G(S, Temperature) = 0.94 - [4/14 * E([2+,2-]) + 6/14 * E([4+,2-]) + 4/14 * E([3+,1-])]
G(S, Temperature) = 0.94 - [4/14 * 1.0 + 6/14 * 0.918 + 4/14 * 0.811]
G(S, Temperature) = 0.029


Decision Tree Learning: A Simple Example

G(S, Humidity) = 0.94 - [7/14 * E(Humidity=high) + 7/14 * E(Humidity=normal)]
G(S, Humidity) = 0.94 - [7/14 * E([3+,4-]) + 7/14 * E([6+,1-])]
G(S, Humidity) = 0.94 - [7/14 * 0.985 + 7/14 * 0.592]
G(S, Humidity) = 0.1515


Decision Tree Learning: A Simple Example

G(S, Wind) = 0.94 - [8/14 * E(Wind=weak) + 6/14 * E(Wind=strong)]
G(S, Wind) = 0.94 - [8/14 * E([6+,2-]) + 6/14 * E([3+,3-])]
G(S, Wind) = 0.94 - [8/14 * 0.811 + 6/14 * 1.00]
G(S, Wind) = 0.048
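The four gains can be reproduced from the per-attribute class counts in the table; the sketch below (the helper names H and gain are ours) prints values matching the slides up to rounding of the intermediate entropies (e.g. 0.247 rather than 0.246 for Outlook).

from math import log2

def H(pos, neg):
    """Entropy of a subset containing pos 'Yes' and neg 'No' examples."""
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, subsets):
    total = sum(parent)
    return H(*parent) - sum((p + n) / total * H(p, n) for p, n in subsets)

S = (9, 5)                                    # 9 Yes, 5 No in the whole training set
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],  # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],          # High, Normal
    "Wind":        [(6, 2), (3, 3)],          # Weak, Strong
}
for attr, subs in splits.items():
    print(attr, round(gain(S, subs), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048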


Decision Tree Learning: A Simple Example

Outlook is our winner! It has the largest information gain (0.246, versus 0.029, 0.1515 and 0.048), so it becomes the root of the tree.


Decision Tree Learning: A Simple Example

Now that we have discovered the root of our decision tree, we must recursively find the nodes that should go below Sunny, Overcast, and Rain. (Every Overcast example is Yes, so that branch immediately becomes a Yes leaf.)


Decision Tree Learning: A Simple Example

For the Rain branch (five examples, [3+,2-], entropy 0.971):

G(Outlook=Rain, Humidity) = 0.971 - [2/5 * E(Outlook=Rain ^ Humidity=high) + 3/5 * E(Outlook=Rain ^ Humidity=normal)]
G(Outlook=Rain, Humidity) = 0.02
G(Outlook=Rain, Wind) = 0.971 - [3/5 * 0 + 2/5 * 0]
G(Outlook=Rain, Wind) = 0.971

So Wind is chosen below the Rain branch.
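These two values can be checked in the same way (a sketch, not part of the slides), using only the five Rain rows (D4, D5, D6, D10, D14); the counts below are read directly from the table, and the helper names are ours.

from math import log2

def H(pos, neg):
    total = pos + neg
    return sum(-(c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(parent, subsets):
    total = sum(parent)
    return H(*parent) - sum((p + n) / total * H(p, n) for p, n in subsets)

rain = (3, 2)                    # Outlook = Rain: 3 Yes, 2 No, entropy about 0.971
humidity = [(1, 1), (2, 1)]      # Humidity: High [1+,1-], Normal [2+,1-]
wind = [(3, 0), (0, 2)]          # Wind: Weak [3+,0-], Strong [0+,2-]

print(round(gain(rain, humidity), 3))   # 0.02
print(round(gain(rain, wind), 3))       # 0.971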


Decision Tree Learning: A Simple Example

Now our decision tree looks like:

[Figure: the decision tree grown so far, with Outlook at the root]
