AdaBoost algorithm
Why is it of interest?
How does it work?
Why does it work?
AdaBoost variants
What is AdaBoost?

AdaBoost is an algorithm for constructing a strong classifier as a linear combination

$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

of simple weak classifiers $h_t(x)$.

Terminology
$h_t(x)$ ... weak or basis classifier, hypothesis, feature
$H(x) = \mathrm{sign}(f(x))$ ... strong or final classifier

The $h_t(x)$'s can be thought of as features.
(Discrete) AdaBoost Algorithm (Schapire & Singer, 1997)

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$
Initialize weights $D_1(i) = 1/m$
For $t = 1, \ldots, T$:
1. Call WeakLearn, which returns the weak classifier $h_t : X \to \{-1, +1\}$ with minimum error w.r.t. distribution $D_t$;
2. Choose $\alpha_t \in \mathbb{R}$;
3. Update
$$D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$
where $Z_t$ is a normalization factor chosen so that $D_{t+1}$ is a distribution.
Output the strong classifier:
$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
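Read as code, the loop is compact. Below is a minimal sketch in Python/NumPy, assuming a `weak_learn(X, y, D)` callback that returns the minimum-error weak classifier for the current distribution (the function and variable names are illustrative, not part of the original algorithm statement):

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Discrete AdaBoost. X: (m, d) inputs, y: (m,) labels in {-1, +1}.

    weak_learn(X, y, D) must return a function h with h(X) in {-1, +1}^m
    and weighted error below 1/2 w.r.t. the distribution D."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)               # D_1(i) = 1/m
    hs, alphas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)           # minimum-error weak classifier for D_t
        pred = h(X)
        eps = D[pred != y].sum()          # weighted error eps_t
        if eps >= 0.5:                    # prerequisite eps_t < 1/2, otherwise stop
            break
        r = np.sum(D * y * pred)          # r_t = 1 - 2 * eps_t
        r = np.clip(r, -1 + 1e-12, 1 - 1e-12)   # guard eps_t = 0 (r_t = 1)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                      # divide by Z_t so D_{t+1} is a distribution
        hs.append(h)
        alphas.append(alpha)

    def H(Xq):                            # strong classifier: sign(sum_t alpha_t h_t(x))
        f = sum(a * h(Xq) for a, h in zip(alphas, hs))
        return np.sign(f)

    return H, hs, np.array(alphas)
```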
WeakLearn

Loop step: Call WeakLearn, given distribution $D_t$;
returns weak classifier $h_t : X \to \{-1, +1\}$ from $H = \{h(x)\}$,
the one with the smallest weighted error
$$\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$$
Prerequisite: $\epsilon_t < 1/2$ (otherwise stop)
WeakLearn examples:
Decision tree builder, perceptron learning rule ($H$ infinite)
Selecting the best one from a given finite set $H$
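For the finite-set case, one concrete WeakLearn is an exhaustive decision-stump search: enumerate thresholds on each coordinate and return the stump minimizing the weighted error $\epsilon_j$. A sketch (stumps are an assumed choice here; the slide's perceptron rule would serve equally):

```python
import numpy as np

def stump_weak_learn(X, y, D):
    """Return the decision stump h(x) = s * sign(x[f] - theta) with the
    minimal weighted error sum_i D(i) [y_i != h(x_i)]."""
    m, d = X.shape
    best_eps, best_params = np.inf, None
    for f in range(d):
        for theta in np.unique(X[:, f]):          # candidate thresholds
            pred = np.where(X[:, f] > theta, 1.0, -1.0)
            for s in (1.0, -1.0):                 # both polarities
                eps = D[s * pred != y].sum()
                if eps < best_eps:
                    best_eps, best_params = eps, (f, theta, s)
    f, theta, s = best_params
    return lambda Xq: s * np.where(Xq[:, f] > theta, 1.0, -1.0)
```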
Demonstration example

Training set: two classes in the plane; the first is drawn from $N(0, 1)$, the second is ring-shaped with density
$$\frac{1}{r\sqrt{8\pi^3}}\, e^{-\frac{1}{2}(r-4)^2}$$
(i.e., radius $\sim N(4, 1)$, angle uniform).
Weak classifier = perceptron
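A generator for a toy set of this shape, assuming the density above (the ring class is exactly a radius drawn from $N(4,1)$ with a uniform angle; the sample size of 40 is illustrative, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ring(n):
    """Points with 2D density exp(-(r-4)^2 / 2) / (r * sqrt(8 * pi^3)):
    radius ~ N(4, 1) (negative radii are negligibly rare), angle uniform."""
    r = rng.normal(4.0, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi)])

m = 40                                  # illustrative training-set size
X_pos = rng.normal(size=(m // 2, 2))    # class +1 ~ N(0, I) in the plane
X_neg = sample_ring(m // 2)             # class -1 on a ring around radius 4
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
```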
AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

The training error of the strong classifier is bounded by the product of the normalization factors,
$$\frac{1}{m}\left|\{i : H(x_i) \neq y_i\}\right| \;\le\; \prod_{t=1}^{T} Z_t,$$
so minimizing each $Z_t$ greedily minimizes the bound.
How to set $\alpha_t$?

Select $\alpha_t$ to greedily minimize $Z_t(\alpha)$ in each step. $Z_t(\alpha)$ is a convex differentiable function with one extremum.

For $h_t(x) \in \{-1, +1\}$ the optimal value is
$$\alpha_t = \frac{1}{2}\log\left(\frac{1+r_t}{1-r_t}\right), \qquad \text{where } r_t = \sum_{i=1}^{m} D_t(i)\, h_t(x_i)\, y_i$$

$Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} \le 1$ for the optimal $\alpha_t$
$\Rightarrow$ justification of the selection of $h_t$ according to $\epsilon_t$
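The closed form can be checked numerically: on a grid, $Z_t(\alpha) = \sum_i D_t(i)\exp(-\alpha y_i h_t(x_i))$ attains its minimum at the stated $\alpha_t$, and its value there is $2\sqrt{\epsilon_t(1-\epsilon_t)}$. A quick sanity check with made-up weights and predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
yh = np.array([1.0] * 7 + [-1.0] * 3)        # y_i * h_t(x_i): 7 correct, 3 wrong
D = rng.random(10); D /= D.sum()             # a distribution D_t

r = np.sum(D * yh)                           # r_t = sum_i D_t(i) h_t(x_i) y_i
alpha_star = 0.5 * np.log((1 + r) / (1 - r)) # claimed minimizer of Z_t

def Z(a):
    """Z_t(alpha) = sum_i D_t(i) exp(-alpha * y_i * h_t(x_i))."""
    return np.exp(-np.outer(np.atleast_1d(a), yh)) @ D

grid = np.linspace(alpha_star - 2.0, alpha_star + 2.0, 100001)
print(alpha_star, grid[np.argmin(Z(grid))])  # closed form and grid minimum agree

eps = D[yh < 0].sum()                        # eps_t; note r_t = 1 - 2 * eps_t
print(Z(alpha_star)[0], 2.0 * np.sqrt(eps * (1.0 - eps)))  # both equal Z_t
```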
Reweighting

Effect on the training set
Reweighting formula:
$$D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t} = \frac{\exp\left(-y_i \sum_{q=1}^{t} \alpha_q h_q(x_i)\right)}{m \prod_{q=1}^{t} Z_q}$$
$$\exp(-\alpha_t y_i h_t(x_i)) \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{cases}$$
$\Rightarrow$ Increase (decrease) the weight of wrongly (correctly) classified examples. The weight is the upper bound on the error of a given example!

[Figure: err as a function of $yf(x)$; the exponential $e^{-yf(x)}$ upper-bounds the 0/1 error]
Effect on $h_t$:
$\alpha_t$ minimizes $Z_t$ $\Rightarrow$
$$\sum_{i:\,h_t(x_i) = y_i} D_{t+1}(i) = \sum_{i:\,h_t(x_i) \neq y_i} D_{t+1}(i)$$
$\Rightarrow$ the error of $h_t$ on $D_{t+1}$ is $1/2$.
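This fixed point is easy to confirm numerically: applying the optimal $\alpha_t$ update to any weights and predictions leaves $h_t$ with weighted error exactly $1/2$ (again with made-up values):

```python
import numpy as np

rng = np.random.default_rng(2)
yh = np.array([1.0] * 7 + [-1.0] * 3)   # y_i * h_t(x_i): 7 correct, 3 wrong
D = rng.random(10); D /= D.sum()        # D_t

r = np.sum(D * yh)
alpha = 0.5 * np.log((1 + r) / (1 - r)) # optimal alpha_t
D_next = D * np.exp(-alpha * yh)
D_next /= D_next.sum()                  # D_{t+1}

print(D_next[yh < 0].sum())             # weighted error of h_t on D_{t+1} -> 0.5
```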
Summary of the Algorithm

Initialization: $D_1(i) = 1/m$
For $t = 1, \ldots, T$:
Find $h_t = \arg\min_{h_j \in H}\, \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
If $\epsilon_t \ge 1/2$ then stop
Set $\alpha_t = \frac{1}{2}\log\left(\frac{1+r_t}{1-r_t}\right)$
Update $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
Output the final classifier:
$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$

[Figures: training error of the strong classifier (0 to 0.35) vs. number of rounds $t$ (0 to 40) on the demonstration example, shown after $t = 1, \ldots, 7$ and $t = 40$]
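Putting the sketches above together (`adaboost`, `stump_weak_learn`, and the toy data `X`, `y` are the illustrative helpers defined earlier), the demonstration run reduces to a few lines; the printed training error typically falls with $t$, as the bound predicts:

```python
import numpy as np

# assumes adaboost, stump_weak_learn and the toy data X, y from the sketches above
H, hs, alphas = adaboost(X, y, stump_weak_learn, T=40)

f = np.zeros(len(y))
for t, (a, h) in enumerate(zip(alphas, hs), start=1):
    f += a * h(X)                       # partial strong classifier after t rounds
    err = np.mean(np.sign(f) != y)      # its training error
    print(f"t = {t:2d}  training error = {err:.3f}")
```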
Does AdaBoost generalize?

Margins in SVM:
$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_2}$$
Margins in AdaBoost:
$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_1}$$
Maximizing margins in AdaBoost:
$$P_S[yf(x) \le \theta] \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{1-\theta}(1-\epsilon_t)^{1+\theta}}, \qquad \text{where } f(x) = \frac{\vec{\alpha} \cdot \vec{h}(x)}{\|\vec{\alpha}\|_1}$$
Upper bounds based on margin:
$$P_D[yf(x) \le 0] \le P_S[yf(x) \le \theta] + O\!\left(\frac{1}{\sqrt{m}}\left(\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)\right)^{1/2}\right)$$
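The quantity $P_S[yf(x) \le \theta]$ is just the empirical CDF of the normalized margins, so it is directly computable for a trained model (using the illustrative variables from the earlier sketches):

```python
import numpy as np

# normalized margins y_i * f(x_i) with f(x) = sum_t alpha_t h_t(x) / ||alpha||_1
f = sum(a * h(X) for a, h in zip(alphas, hs)) / np.sum(np.abs(alphas))
margins = y * f                     # lie in [-1, 1]

theta = 0.1
print(np.mean(margins <= theta))    # empirical P_S[y f(x) <= theta]
```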
AdaBoost variants

Freund & Schapire (1995)
It can easily be shown that the totally corrective step reduces the upper bound on the empirical error without increasing classifier complexity.
The TCA was first proposed by Kivinen and Warmuth, but their $\alpha_t$ is set as in standard AdaBoost.
Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA were evaluated.
Initial tests show that the TCA outperforms AdaBoost on some standard data sets.
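One possible reading of the totally corrective step, sketched as coordinate descent on the exponential upper bound: after each round, cycle over all selected weak classifiers and re-optimize every coefficient until each $h_q$ has weighted error $1/2$ under the implied distribution. This is an illustrative realization, not the exact procedure of the cited papers:

```python
import numpy as np

def totally_corrective_step(preds, y, alphas, n_sweeps=50):
    """Coordinate descent on sum_i exp(-y_i * sum_q alpha_q h_q(x_i)).

    preds: (t, m) array with preds[q, i] = h_q(x_i) in {-1, +1}; alphas: (t,).
    Each sweep re-optimizes every coefficient; at the fixed point every
    selected h_q has weighted error 1/2 on the implied distribution.
    Assumes no h_q becomes perfect on the weights (|r| < 1)."""
    for _ in range(n_sweeps):
        for q in range(len(alphas)):
            D = np.exp(-y * (alphas @ preds))     # unnormalized example weights
            D /= D.sum()
            r = np.sum(D * y * preds[q])          # edge of h_q under current D
            alphas[q] += 0.5 * np.log((1 + r) / (1 - r))  # AdaBoost step on h_q
    return alphas
```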
[Figures: error vs. length of the strong classifier ($10^0$ to $10^3$ weak classifiers, log scale) on the benchmark data sets IMAGE, FLARE, GERMAN, RINGNORM, SPLICE, THYROID, TITANIC, BANANA, BREAST, DIABETIS, and HEART]