[Figure: a decision tree for the weather data. The root tests windy (true/false); lower nodes test outlook (sunny/overcast/rain), temp (hot/mild/cool), and humidity (high/normal); leaves are labeled good or bad, with some left as ???.]
Measuring Information
Information gain is a popular way to select an attribute.
Let I(p, n) be the information in p positive examples and
n negative examples.
I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}
Suppose there are p_i positive and n_i negative examples
for the i-th value of an attribute. Then, for an attribute
with two values, the information gain G(p_1, n_1, p_2, n_2)
can be defined as:
G(p_1, n_1, p_2, n_2) = I(p_1 + p_2, n_1 + n_2) - \frac{p_1 + n_1}{p_1 + n_1 + p_2 + n_2} I(p_1, n_1) - \frac{p_2 + n_2}{p_1 + n_1 + p_2 + n_2} I(p_2, n_2)
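As a concrete reading of these two formulas, here is a minimal Python sketch (the names info and gain2 are my own, not from the notes):

from math import log2

def info(p, n):
    # I(p, n): information (in bits) of p positive and n negative examples.
    # The term 0 * log2(0) is treated as 0.
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c > 0)

def gain2(p1, n1, p2, n2):
    # G(p1, n1, p2, n2): information gain of a two-valued attribute split.
    total = p1 + n1 + p2 + n2
    return (info(p1 + p2, n1 + n2)
            - (p1 + n1) / total * info(p1, n1)
            - (p2 + n2) / total * info(p2, n2))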
This graph shows I(p, n) assuming p + n = 100.
[Plot: I(p, n = 100 - p) as a function of p; the curve is 0 at p = 0 and p = 100 and peaks at 1 when p = 50.]
[Plot: the corresponding surface over p1 and p2, each ranging from 0 to 50.]
Example of Attribute Selection
Both candidate splits start from 9 good, 5 bad examples.

Outlook:  Sunny: 2 good, 3 bad   Overcast: 4 good, 0 bad   Rain: 3 good, 2 bad
Temp:     Cool:  3 good, 1 bad   Mild:     4 good, 2 bad   Hot:  2 good, 2 bad
G(Outlook) ≈ 0.246 G(Temp) ≈ 0.029
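These gains can be checked numerically. The following self-contained snippet (my own verification, not part of the notes) reproduces them up to rounding:

from math import log2

def info(p, n):
    # I(p, n) in bits; 0 * log2(0) is treated as 0.
    return sum(-c / (p + n) * log2(c / (p + n)) for c in (p, n) if c > 0)

# Outlook: Sunny 2/3, Overcast 4/0, Rain 3/2 out of 9 good, 5 bad.
g_outlook = info(9, 5) - (5/14*info(2, 3) + 4/14*info(4, 0) + 5/14*info(3, 2))
# Temp: Cool 3/1, Mild 4/2, Hot 2/2.
g_temp = info(9, 5) - (4/14*info(3, 1) + 6/14*info(4, 2) + 4/14*info(2, 2))

print(round(g_outlook, 3), round(g_temp, 3))   # ~0.247 and ~0.029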
• Chi-Squared Statistic
\chi^2 = \sum_{j=1}^{v} \left[ \frac{(p_j - p\,s_j)^2}{p\,s_j} + \frac{(n_j - n\,s_j)^2}{n\,s_j} \right]
where s_j = (p_j + n_j)/(p + n)
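A small Python sketch of this statistic (the name chi_squared and the representation of branches as (p_j, n_j) pairs are my own choices):

def chi_squared(branches):
    # branches: one (p_j, n_j) pair per attribute value, each assumed non-empty.
    p = sum(pj for pj, nj in branches)
    n = sum(nj for pj, nj in branches)
    total = 0.0
    for pj, nj in branches:
        sj = (pj + nj) / (p + n)       # fraction of all examples in branch j
        total += (pj - p * sj) ** 2 / (p * sj) + (nj - n * sj) ** 2 / (n * sj)
    return total

# The Outlook split from the example above:
print(chi_squared([(2, 3), (4, 0), (3, 2)]))   # ~3.55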
Overfitting
A hypothesis h overfits the training data if there is
another hypothesis h′ that is worse than h on the training
data, but better over the whole distribution of examples.
Reasons for overfitting (a small demonstration follows this list):
• Noise
• Coincidence
• Lack of Data
• Boundary Approximation
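To make the noise point concrete, here is a small scikit-learn sketch (my own illustration, not from the slides). With 20% of the labels flipped, an unrestricted tree typically memorizes the training set yet scores worse on held-out data than a depth-limited tree:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data: flip_y mislabels about 20% of the examples.
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

deep = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)          # fits the noise
shallow = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

print("deep   :", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))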
[Plot: Glass2 examples, Aluminum (0 to 1.5) vs. Refractive Index (1.51 to 1.535), with the tree's decision regions drawn as boxes.]
In this region, note the boxes with only one example.
A Decision Tree for Glass2
[Plot: detail of the Glass2 tree's regions, Aluminum (1.0 to 1.8) vs. Refractive Index (1.5155 to 1.5195).]
[Plot: Glass2 examples, Aluminum vs. Refractive Index, over the full range 1.51 to 1.535.]
[Plot: Iris2 examples, Petal Width (0 to 2) vs. Petal Length (1 to 7), with decision-tree regions.]
Pruned Example (c4.5 -m 1)
Pruned Decision Tree for Iris2
[Plot: Iris2 examples, Petal Width (0 to 2.5) vs. Petal Length (1 to 7), with the pruned tree's regions.]
Postpruning Algorithm
Prune-DT(N : node, examples)
1. leaferr ← number of examples whose class ≠ N.class
2. revise leaferr upward if examples were training set
3. if N is a leaf then return leaferr
4. treeerr ← 0
5. for each value vj of N.test
6. examplesj ← examples with N.test = vj
7. suberr ← Prune-DT(N.branchj , examplesj )
8. treeerr ← treeerr + suberr
9. if leaferr ≤ treeerr
10. then make N a leaf and return leaferr
11. else return treeerr
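A fairly direct Python transcription of Prune-DT, assuming examples are dicts keyed by attribute name with a "class" entry, and a Node whose fields cls, test, and branches stand in for N.class, N.test, and N.branch_j (these names are mine):

from dataclasses import dataclass, field

@dataclass
class Node:
    cls: str                                      # class predicted if this node is a leaf
    test: str | None = None                       # attribute tested here (None = leaf)
    branches: dict = field(default_factory=dict)  # attribute value -> child Node

def prune_dt(node, examples):
    # Steps 1-2: error if node were a leaf predicting node.cls
    # (revising leaferr upward for training data is not done in this sketch).
    leaf_err = sum(1 for ex in examples if ex["class"] != node.cls)
    # Step 3: a leaf just reports its own error.
    if node.test is None:
        return leaf_err
    # Steps 4-8: sum the errors of the recursively pruned subtrees.
    tree_err = 0
    for value, child in node.branches.items():
        tree_err += prune_dt(child, [ex for ex in examples if ex[node.test] == value])
    # Steps 9-11: collapse to a leaf when that is no worse than keeping the subtree.
    if leaf_err <= tree_err:
        node.test, node.branches = None, {}
        return leaf_err
    return tree_err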
Comments on Pruning
The training and validation set approach (sketched in code below) is:
Remove “validation” exs. from training exs.
Grow decision tree using training exs.
Prune decision tree using validation set.
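A rough scikit-learn analogue of these three steps (scikit-learn does not implement reduced-error pruning directly, so here the held-out set is used to pick a cost-complexity pruning level instead; treat this as an analogue, not the algorithm above):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# 1. Remove "validation" examples from the training examples.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. Grow a full decision tree on the training examples.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 3. "Prune" by picking the cost-complexity level that does best on the validation set.
alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train) for a in alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))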
Subtree raising is replacing a tree with one of
its subtrees.
Rule post-pruning as described in the book is
performed by the C4.5rules program, not C4.5.