Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Rob Schapire
non-spam
spam
..
.
from non-spam
Machine Learning
studies how to automatically learn to make accurate
labeled
training
examples
machine learning
algorithm
classification
rule
predicted
classification
Back to Spam
main observation:
Key Details
Boosting
Early History
[Valiant 84]:
[Schapire 89]:
[Freund 90]:
(x1 , y1 ), . . . , (xm , ym )
AdaBoost
[with Freund]
constructing Dt :
D1 (i ) = 1/m
given Dt and ht :
Dt+1 (i ) =
=
Dt (i )
e t if yi = ht (xi )
if yi 6= ht (xi )
e t
Zt
Dt (i )
exp(t yi ht (xi ))
Zt
where Zt = normalization
factor
1 t
1
>0
t = 2 ln
t
final classifier:
!
X
Hfinal (x) = sign
t ht (x)
t
Toy Example
D1
Round 1
h1
1111
0000
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1 =0.30
1=0.42
D2
Round 2
1111
0000
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
h2
11111111111111111
0000000000000
0000
1111111111111
1111
0000000000000
0000
0000000000000
1111111111111
0000
11111111111111111
1111
0000000000000
0000
0000000000000
1111111111111
0000
1111
1111111111111
1111
0000000000000
0000
0000000000000
1111111111111
0000
00000000000001111
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
1111111111111
1111
0000000000000
11111111111110000
1111
0000000000000
0000
0000000000000
1111111111111
0000
1111
1111111111111
1111
0000000000000
0000
1111111111111
1111
0000000000000
00000000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
1111111111111
1111
0000000000000
11111111111110000
1111
0000000000000
0000
0000000000000
1111111111111
0000
1111
1111111111111
1111
0000000000000
11111111111110000
1111
0000000000000
0000
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111111111111
1111
00000000000001111
0000
1111111111111
1111
0000000000000
0000
0000000000000
1111111111111
0000
00000000000001111
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
2 =0.21
2=0.65
D3
Round 3
1111
0000
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
1111
1111111111111
0000
0000000000000
1111
1111111111111
0000
0000000000000
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
0000
1111
0000000000000
1111111111111
11111111111111111
0000000000000
0000
000000000000000
111111111111111
1111111111111
1111
0000000000000
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
11111111111111111
1111
000000000000000
0000000000000
0000
111111111111111
0000000000000
1111111111111
0000
1111
000000000000000
111111111111111
1111111111111
1111
000000000000000
111111111111111
0000000000000
0000
0000000000000
1111111111111
0000
1111
000000000000000
h
000000000000000
00000000000001111
1111111111111
0000 3 111111111111111
1111
111111111111111
000000000000000
111111111111111
0000000000000
1111111111111
0000
1111111111111
1111
000000000000000
111111111111111
0000000000000
0000
11111111111111111
1111
0000000000000
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
1111111111111
1111
0000000000000
0000
000000000000000
111111111111111
1111111111111
1111
0000000000000
0000
000000000000000
111111111111111
00000000000001111
1111111111111
0000
1111
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
1111
000000000000000
111111111111111
1111111111111
1111
0000000000000
0000
000000000000000
111111111111111
1111111111111
1111
00000000000001111
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
1111111111111
1111
0000000000000
0000
000000000000000
111111111111111
1111111111111
1111
00000000000001111
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
1111
000000000000000
111111111111111
0000000000000
1111111111111
0000
1111
000000000000000
111111111111111
1111111111111
1111
00000000000001111
0000
000000000000000
111111111111111
1111111111111
0000000000000
0000
000000000000000
111111111111111
0000000000000
1111111111111
0000
1111
000000000000000
111111111111111
00000000000001111
1111111111111
0000
1111
000000000000000
111111111111111
0000000000000
1111111111111
0000
000000000000000
111111111111111
3 =0.14
3=0.92
Final Classifier
H
= sign
final
000
000000000
111
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
111
000000000
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
111
000000000
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
000000000 +
111
111111111
0.42111
000
000000000
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
111
000000000
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000
000000000
111
111111111
000000000
00
111111111
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
111111111
00
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
111111111
00
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
00
111111111
11
0.65
000000000
111111111
00
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
111111111
00
11
000000000
00
111111111
11
000000000
00
111111111
11
000000000
00
111111111
11
0000000000000
1111111111111
00000
11111
000000000000
111111111111
1111111111111
0000000000000
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
00000
11111
000000000000
111111111111
0000000000000
1111111111111
11111
111111111111
00000
000000000000
11111
111111111111
00000
000000000000
00000
11111
000000000000
111111111111
11111
111111111111
00000
000000000000
11111
111111111111
00000
000000000000
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
11111
111111111111
00000
000000000000
11111
111111111111
00000
000000000000
00000
11111
000000000000
111111111111
11111
111111111111
00000
000000000000
11111
111111111111
00000
000000000000
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
11111
111111111111
00000
000000000000
11111
111111111111
00000
000000000000
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
00000
11111
000000000000
111111111111
11111
111111111111
00000
000000000000
+ 0.92
0000000000
1111111111
1111111111
0000000000
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
0000000000
1111111111
AdaBoost (recap)
given training set (x1 , y1 ), . . . , (xm , ym )
Dt+1 (i ) =
Dt (i )
exp (t yi ht (xi ))
Zt
write t as
then
1
2
[ t = edge ]
training error(Hfinal )
i
Yh p
2 t (1 t )
t
Yq
1 4t2
exp 2
X
t
so: if t : t > 0
t2
error
0.8
0.6
0.4
test
0.2
train
20
40
60
80
100
# of rounds (T)
expect:
training error to continue to drop (or reach zero)
test error to increase when Hfinal becomes too complex
Occams razor
overfitting
hard to know when to stop training
Technically...
bound depends on
m = # training examples
d = complexity of weak classifiers
T = # rounds
dT
m
test
error
20
15
(boosting stumps on
heart-disease dataset)
train
10
5
0
1
10
100
# rounds
but often doesnt...
1000
error
15
(boosting C4.5 on
letter dataset)
10
5
0
test
train
10
100
1000
# of rounds (T)
test error does not increase, even after 1000 rounds
train error
test error
# rounds
5 100 1000
0.0 0.0
0.0
8.4 3.3
3.1
key idea:
Hfinal
incorrect
high conf.
correct
low conf.
0
Hfinal
correct
+1
20
error
15
10
5
0
test
train
10
100
1000
# of rounds (T)
train error
test error
% margins 0.5
minimum margin
cumulative distribution
1000
100
0.5
5
-1
-0.5
margin
# rounds
5
100 1000
0.0
0.0
0.0
8.4
3.3
3.1
7.7
0.0
0.0
0.14 0.52 0.55
0.5
More Theory
many other ways of understanding AdaBoost:
continuous time
of thumb
shift in mind set goal now is merely to find classifiers
barely better than random guessing
versatile
Caveats
noise
UCI Experiments
[with Freund]
tested AdaBoost on UCI benchmarks
used:
no
predict
-1
no
predict
+1
30
30
25
25
20
20
C4.5
C4.5
UCI Results
15
15
10
10
0
0
10
15
20
25
30
boosting Stumps
10
15
20
25
boosting C4.5
30
How It Works
Human
computer
speech
texttospeech
raw
utterance
automatic
speech
recognizer
text response
text
dialogue
manager
predicted
category
natural language
understanding