Rob Schapire
Princeton University
Machine Learning
[Diagram: a learning algorithm, given labeled examples, produces a classification rule that outputs a predicted classification for new examples.]
Examples of Classification Problems
advantages:
often much more accurate than human-crafted rules
(since data-driven)
humans are often incapable of expressing what they know
(e.g., rules of English, or how to recognize letters),
but can easily classify examples
no need for a human expert or programmer
automatic method to search for hypotheses explaining the
data
cheap and flexible; can apply to any learning task
disadvantages:
need a lot of labeled data
error-prone: usually impossible to get perfect accuracy
This Talk
[Figure: two example decision trees over the superhero toy data (batman, robin, alfred, penguin, catwoman, joker), splitting on attributes such as "tie", "cape", and "smokes" with yes/no branches.]
How to Build Decision Trees
choose rule to split on
divide data using splitting rule into disjoint subsets
repeat recursively for each subset
stop when leaves are (almost) pure
[Figure: the toy data (batman, robin, alfred, penguin, catwoman, joker) split step by step, e.g. on "tie", until the leaves are (almost) pure.]
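The recipe above can be sketched as a short recursive procedure. This is a toy illustration, not the slides' code: the attribute values and the "lowest weighted impurity" splitting rule are my own choices for the example, and the labels are one reading of the superhero toy data.

```python
# Toy recursive decision-tree builder: choose a splitting rule, divide the
# data into disjoint subsets, recurse, and stop when leaves are pure.
from collections import Counter

def impurity(labels):
    """Fraction of minority-class examples in a subset (0 = pure)."""
    if not labels:
        return 0.0
    most_common = Counter(labels).most_common(1)[0][1]
    return 1.0 - most_common / len(labels)

def build_tree(examples, attributes):
    labels = [label for _, label in examples]
    # stop when the leaf is pure (or there is nothing left to split on)
    if len(set(labels)) <= 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # choose the rule to split on: lowest weighted impurity of the subsets
    def score(attr):
        yes = [lab for feats, lab in examples if feats[attr]]
        no = [lab for feats, lab in examples if not feats[attr]]
        return (len(yes) * impurity(yes) + len(no) * impurity(no)) / len(examples)
    attr = min(attributes, key=score)
    rest = [a for a in attributes if a != attr]
    yes_part = [(f, lab) for f, lab in examples if f[attr]]
    no_part = [(f, lab) for f, lab in examples if not f[attr]]
    if not yes_part or not no_part:   # split made no progress
        return Counter(labels).most_common(1)[0][0]
    return {attr: {True: build_tree(yes_part, rest),
                   False: build_tree(no_part, rest)}}

def classify(tree, feats):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][feats[attr]]
    return tree

# hypothetical attribute values for the slides' toy characters
data = [({"tie": False, "cape": True,  "smokes": False}, "good"),  # batman
        ({"tie": False, "cape": True,  "smokes": False}, "good"),  # robin
        ({"tie": True,  "cape": False, "smokes": False}, "good"),  # alfred
        ({"tie": True,  "cape": False, "smokes": True},  "bad"),   # penguin
        ({"tie": False, "cape": False, "smokes": False}, "bad"),   # catwoman
        ({"tie": False, "cape": False, "smokes": False}, "bad")]   # joker
tree = build_tree(data, ["tie", "cape", "smokes"])
print(tree)
```

With this data the tree splits on "cape" first, then distinguishes alfred from penguin via "tie" and "smokes", and classifies every training example correctly.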
How to Choose the Splitting Rule
[Figure: candidate splitting rules compared ("tie" versus "cape", each with yes/no branches). Impurity as a function of p, the fraction of one class in a node: 0 at p = 0 and p = 1, maximal at p = 1/2.]
[Figure: two candidate trees rooted at "mask" (one splitting further on "smokes" and "cape"), labeled bad and good; the bad one is overly simple and doesn't even fit the available data.]
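The impurity curve sketched on the slide (zero at p = 0 and p = 1, peaking at p = 1/2) can be any of several standard functions; the slide does not say which, so all three common choices are shown here for comparison.

```python
# Common impurity functions of p = fraction of positive examples in a node.
# All are 0 when the node is pure (p = 0 or p = 1) and maximal at p = 1/2.
import math

def misclassification(p):
    return min(p, 1 - p)

def gini(p):
    return 2 * p * (1 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, misclassification(p), gini(p), round(entropy(p), 3))
```

A split is scored by the weighted impurity of the subsets it creates; the lower the score, the better the rule.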
Tree Size versus Accuracy
[Plot: accuracy and error (%) on train and test data versus tree size (0 to 100). Training accuracy rises steadily with tree size, while test accuracy peaks and then declines as larger trees overfit.]
Training data:
Good and Bad Classifiers
Good: sufficient data, simple classifier
Bad:
can prove:
(generalization error) ≤ (training error) + Õ(√(d/m))
where d measures the complexity of the classifier and m is the number of training examples
pruning strategies:
grow on just part of the training data, then find the pruning with
minimum error on the held-out part
find pruning that minimizes
best known:
C4.5 (Quinlan)
CART (Breiman, Friedman, Olshen & Stone)
very fast to train and evaluate
relatively easy to interpret
but: accuracy often not state-of-the-art
Boosting
Example: Spam Filtering
for t = 1, . . . , T :
train weak classifier ("rule of thumb") ht on distribution Dt
AdaBoost
[Figure: AdaBoost on a toy 2-D dataset.
Round 1: weak classifier h1 trained on distribution D1; ε1 = 0.30, α1 = 0.42.
Round 2: h2 trained on reweighted distribution D2; ε2 = 0.21, α2 = 0.65.
Round 3: h3 trained on D3; ε3 = 0.14, α3 = 0.92.
After each round, the distribution puts more weight on the examples the latest weak classifier got wrong.]
Final Classifier
[Figure: Hfinal = sign(0.42 h1 + 0.65 h2 + 0.92 h3); the weighted vote of the three weak classifiers classifies every training example correctly.]
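The loop above can be sketched end to end. This is a minimal AdaBoost with decision stumps on a made-up 1-D dataset (the data and helper names are illustrative, not the slides' toy example); it uses the standard update rule εt = weighted error, αt = ½ ln((1 − εt)/εt).

```python
# Minimal AdaBoost with 1-D decision stumps.
import math

def stump_predict(x, threshold, sign):
    """A 'rule of thumb': predict `sign` if x >= threshold, else -sign."""
    return sign if x >= threshold else -sign

def train_stump(xs, ys, weights):
    """Pick the (threshold, sign) pair minimizing weighted training error."""
    best = None
    for threshold in xs:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    m = len(xs)
    weights = [1.0 / m] * m           # D_1: uniform distribution
    ensemble = []                     # list of (alpha_t, threshold, sign)
    for _ in range(rounds):
        eps, threshold, sign = train_stump(xs, ys, weights)
        eps = max(eps, 1e-10)         # guard against a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, threshold, sign))
        # D_{t+1}: upweight misclassified examples, then renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, x):
    """H_final: sign of the alpha-weighted vote of the weak classifiers."""
    vote = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if vote >= 0 else -1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, -1, 1, 1, 1, 1]      # not separable by any single stump
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # → [-1, -1, 1, -1, 1, 1, 1, 1]
```

No single stump fits this labeling (x = 4 breaks every threshold), but after three rounds the weighted vote gets every training example right, mirroring the three-round toy example above.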
Theory: Training Error
[Plot: expected train and test error curves as a function of the number of rounds T (up to 100): training error falls while test error eventually turns upward.]
expect:
training error to continue to drop (or reach zero)
test error to increase when Hfinal becomes too complex
Occam's razor
overfitting
hard to know when to stop training
Actual Typical Run
[Plot: actual typical run — error (%) versus number of rounds T, 10 to 1000 on a log scale, boosting C4.5 on the "letter" dataset. Training error drops to zero, yet test error continues to decrease.]
key idea:
training error only measures whether classifications are
right or wrong
should also consider confidence of classifications
The Margins Explanation
recall: Hfinal is weighted majority vote of weak classifiers
measure confidence by margin = strength of the vote
empirical evidence and mathematical proof that:
large margins ⇒ better generalization error
(regardless of number of rounds)
boosting tends to increase margins of training examples
(given weak learning assumption)
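The margin of an example can be written down directly: for a weighted vote f(x) = Σt αt ht(x), the normalized margin is y·f(x)/Σt αt, which lies in [−1, +1], is positive exactly when the example is classified correctly, and approaches +1 when the vote is confident. A small sketch (the example values are illustrative):

```python
# Normalized margin of one example under a weighted majority vote.
def margin(y, votes, alphas):
    """y in {-1,+1}; votes: list of h_t(x) in {-1,+1}; alphas: weights >= 0."""
    f = sum(a * h for a, h in zip(alphas, votes))
    return y * f / sum(alphas)

# correctly classified, but only by a narrow (low-confidence) vote
print(margin(+1, [+1, +1, -1], [0.42, 0.65, 0.92]))
# unanimously and confidently correct
print(margin(+1, [+1, +1, +1], [1.0, 1.0, 1.0]))  # → 1.0
```

Boosting keeps increasing these margins even after the training error hits zero, which is the proposed explanation for the test-error curve above.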
Application: Human-computer Spoken Dialogue
[with Rahim, Di Fabbrizio, Dutton, Gupta, Hollister & Riccardi]
[Diagram: spoken-dialogue system pipeline — the caller's speech passes through an automatic speech recognizer to text; natural language understanding maps the text to a predicted category for the dialogue manager, whose text response is rendered back to the caller by text-to-speech.]
VC-dim = (# parameters) = n + 1
Finding the Maximum Margin Hyperplane
maximize: δ
subject to: yi (v · xi) ≥ δ and ‖v‖ = 1
set w = v/δ, so that ‖w‖ = 1/δ; equivalently:
minimize: (1/2) ‖w‖²
subject to: yi (w · xi) ≥ 1
Convex Dual
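The dual derivation did not survive extraction; for reference, the textbook Lagrangian dual of the minimization problem above (standard form, not necessarily the slide's exact notation) is:

```latex
\max_{\alpha_i \ge 0} \;\; \sum_{i=1}^{m} \alpha_i
  \;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
        \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
```

with the weight vector recovered as w = Σi αi yi xi; only the examples with αi > 0 (the support vectors) contribute, and the data enters solely through inner products xi · xj.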
more is more:
want training data to be like test data
use your knowledge of the problem to know where to get training
data, and what to expect the test data to be like
Choosing Features
R and Matlab are great for easy coding, but for speed, you may
need C or Java
debugging machine learning algorithms is very tricky!
hard to tell if it's working, since you don't know what to expect
run on small cases where you can figure out the answer by hand
test each module/subroutine separately
compare to other implementations
(written by others, or written in a different language)
compare to theory or published results
Summary