AdaBoost algorithm
Why is it of interest?
How does it work?
Why does it work?
AdaBoost variants
What is AdaBoost?

AdaBoost is an algorithm for constructing a strong classifier as a linear combination

$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

of simple weak classifiers $h_t(x)$.

Terminology
$h_t(x)$ ... weak or basis classifier, hypothesis, feature
$H(x) = \mathrm{sign}(f(x))$ ... strong or final classifier

The $h_t(x)$'s can be thought of as features.
(Discrete) AdaBoost Algorithm (Schapire & Singer, 1997)

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$
Initialize weights $D_1(i) = 1/m$
For $t = 1, \ldots, T$:
1. Call WeakLearn, which returns the weak classifier $h_t : X \to \{-1, +1\}$ with minimum error w.r.t. distribution $D_t$;
2. Choose $\alpha_t \in \mathbb{R}$;
3. Update
$$D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$
where $Z_t$ is a normalization factor chosen so that $D_{t+1}$ is a distribution.
Output the strong classifier:
$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
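Read as code, the loop is compact. Below is a minimal sketch in Python/NumPy, assuming a `weak_learn(X, y, D)` callback that returns the minimum-error weak classifier for the current distribution (the function and variable names are illustrative, not part of the original algorithm statement):

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Discrete AdaBoost. X: (m, d) inputs, y: (m,) labels in {-1, +1}.

    weak_learn(X, y, D) must return a function h with h(X) in {-1, +1}^m
    and weighted error below 1/2 w.r.t. the distribution D."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)               # D_1(i) = 1/m
    hs, alphas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)           # minimum-error weak classifier for D_t
        pred = h(X)
        eps = D[pred != y].sum()          # weighted error eps_t
        if eps >= 0.5:                    # prerequisite eps_t < 1/2, otherwise stop
            break
        r = np.sum(D * y * pred)          # r_t = 1 - 2 * eps_t
        r = np.clip(r, -1 + 1e-12, 1 - 1e-12)   # guard eps_t = 0 (r_t = 1)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                      # divide by Z_t so D_{t+1} is a distribution
        hs.append(h)
        alphas.append(alpha)

    def H(Xq):                            # strong classifier: sign(sum_t alpha_t h_t(x))
        f = sum(a * h(Xq) for a, h in zip(alphas, hs))
        return np.sign(f)

    return H, hs, np.array(alphas)
```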
WeakLearn

Loop step: Call WeakLearn, given distribution $D_t$;
returns weak classifier $h_t : X \to \{-1, +1\}$ from $H = \{h(x)\}$,
the one with the smallest weighted error
$$\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$$
Prerequisite: $\epsilon_t < 1/2$ (otherwise stop)
WeakLearn examples:
Decision tree builder, perceptron learning rule ($H$ infinite)
Selecting the best one from a given finite set $H$
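For the finite-set case, one concrete WeakLearn is an exhaustive decision-stump search: enumerate thresholds on each coordinate and return the stump minimizing the weighted error $\epsilon_j$. A sketch (stumps are an assumed choice here; the slide's perceptron rule would serve equally):

```python
import numpy as np

def stump_weak_learn(X, y, D):
    """Return the decision stump h(x) = s * sign(x[f] - theta) with the
    minimal weighted error sum_i D(i) [y_i != h(x_i)]."""
    m, d = X.shape
    best_eps, best_params = np.inf, None
    for f in range(d):
        for theta in np.unique(X[:, f]):          # candidate thresholds
            pred = np.where(X[:, f] > theta, 1.0, -1.0)
            for s in (1.0, -1.0):                 # both polarities
                eps = D[s * pred != y].sum()
                if eps < best_eps:
                    best_eps, best_params = eps, (f, theta, s)
    f, theta, s = best_params
    return lambda Xq: s * np.where(Xq[:, f] > theta, 1.0, -1.0)
```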
Demonstration example

Training set: two classes in the plane; the first is drawn from $N(0, 1)$, the second is ring-shaped with density
$$\frac{1}{r\sqrt{8\pi^3}}\, e^{-\frac{1}{2}(r-4)^2}$$
(i.e., radius $\sim N(4, 1)$, angle uniform).
Weak classifier = perceptron
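A generator for a toy set of this shape, assuming the density above (the ring class is exactly a radius drawn from $N(4,1)$ with a uniform angle; the sample size of 40 is illustrative, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ring(n):
    """Points with 2D density exp(-(r-4)^2 / 2) / (r * sqrt(8 * pi^3)):
    radius ~ N(4, 1) (negative radii are negligibly rare), angle uniform."""
    r = rng.normal(4.0, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi)])

m = 40                                  # illustrative training-set size
X_pos = rng.normal(size=(m // 2, 2))    # class +1 ~ N(0, I) in the plane
X_neg = sample_ring(m // 2)             # class -1 on a ring around radius 4
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
```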
AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

The training error of the strong classifier is bounded by the product of the normalization factors,
$$\frac{1}{m}\left|\{i : H(x_i) \neq y_i\}\right| \;\le\; \prod_{t=1}^{T} Z_t,$$
so minimizing each $Z_t$ greedily minimizes the bound.
How to set $\alpha_t$?

Select $\alpha_t$ to greedily minimize $Z_t(\alpha)$ in each step. $Z_t(\alpha)$ is a convex differentiable function with one extremum.

For $h_t(x) \in \{-1, +1\}$ the optimal value is
$$\alpha_t = \frac{1}{2}\log\left(\frac{1+r_t}{1-r_t}\right), \qquad \text{where } r_t = \sum_{i=1}^{m} D_t(i)\, h_t(x_i)\, y_i$$

$Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} \le 1$ for the optimal $\alpha_t$
$\Rightarrow$ justification of the selection of $h_t$ according to $\epsilon_t$
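The closed form can be checked numerically: on a grid, $Z_t(\alpha) = \sum_i D_t(i)\exp(-\alpha y_i h_t(x_i))$ attains its minimum at the stated $\alpha_t$, and its value there is $2\sqrt{\epsilon_t(1-\epsilon_t)}$. A quick sanity check with made-up weights and predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
yh = np.array([1.0] * 7 + [-1.0] * 3)        # y_i * h_t(x_i): 7 correct, 3 wrong
D = rng.random(10); D /= D.sum()             # a distribution D_t

r = np.sum(D * yh)                           # r_t = sum_i D_t(i) h_t(x_i) y_i
alpha_star = 0.5 * np.log((1 + r) / (1 - r)) # claimed minimizer of Z_t

def Z(a):
    """Z_t(alpha) = sum_i D_t(i) exp(-alpha * y_i * h_t(x_i))."""
    return np.exp(-np.outer(np.atleast_1d(a), yh)) @ D

grid = np.linspace(alpha_star - 2.0, alpha_star + 2.0, 100001)
print(alpha_star, grid[np.argmin(Z(grid))])  # closed form and grid minimum agree

eps = D[yh < 0].sum()                        # eps_t; note r_t = 1 - 2 * eps_t
print(Z(alpha_star)[0], 2.0 * np.sqrt(eps * (1.0 - eps)))  # both equal Z_t
```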
Reweighting

Effect on the training set
Reweighting formula:
$$D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t} = \frac{\exp\left(-y_i \sum_{q=1}^{t} \alpha_q h_q(x_i)\right)}{m \prod_{q=1}^{t} Z_q}$$
$$\exp(-\alpha_t y_i h_t(x_i)) \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{cases}$$
$\Rightarrow$ Increase (decrease) the weight of wrongly (correctly) classified examples. The weight is the upper bound on the error of a given example!

[Figure: err as a function of $yf(x)$; the exponential $e^{-yf(x)}$ upper-bounds the 0/1 error]
Effect on $h_t$:
$\alpha_t$ minimizes $Z_t$ $\Rightarrow$
$$\sum_{i:\,h_t(x_i) = y_i} D_{t+1}(i) = \sum_{i:\,h_t(x_i) \neq y_i} D_{t+1}(i)$$
$\Rightarrow$ the error of $h_t$ on $D_{t+1}$ is $1/2$.
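This fixed point is easy to confirm numerically: applying the optimal $\alpha_t$ update to any weights and predictions leaves $h_t$ with weighted error exactly $1/2$ (again with made-up values):

```python
import numpy as np

rng = np.random.default_rng(2)
yh = np.array([1.0] * 7 + [-1.0] * 3)   # y_i * h_t(x_i): 7 correct, 3 wrong
D = rng.random(10); D /= D.sum()        # D_t

r = np.sum(D * yh)
alpha = 0.5 * np.log((1 + r) / (1 - r)) # optimal alpha_t
D_next = D * np.exp(-alpha * yh)
D_next /= D_next.sum()                  # D_{t+1}

print(D_next[yh < 0].sum())             # weighted error of h_t on D_{t+1} -> 0.5
```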
Summary of the Algorithm

Initialization: $D_1(i) = 1/m$
For $t = 1, \ldots, T$:
Find $h_t = \arg\min_{h_j \in H}\, \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
If $\epsilon_t \ge 1/2$ then stop
Set $\alpha_t = \frac{1}{2}\log\left(\frac{1+r_t}{1-r_t}\right)$
Update $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
Output the final classifier:
$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$

[Figures: training error of the strong classifier (0 to 0.35) vs. number of rounds $t$ (0 to 40) on the demonstration example, shown after $t = 1, \ldots, 7$ and $t = 40$]
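Putting the sketches above together (`adaboost`, `stump_weak_learn`, and the toy data `X`, `y` are the illustrative helpers defined earlier), the demonstration run reduces to a few lines; the printed training error typically falls with $t$, as the bound predicts:

```python
import numpy as np

# assumes adaboost, stump_weak_learn and the toy data X, y from the sketches above
H, hs, alphas = adaboost(X, y, stump_weak_learn, T=40)

f = np.zeros(len(y))
for t, (a, h) in enumerate(zip(alphas, hs), start=1):
    f += a * h(X)                       # partial strong classifier after t rounds
    err = np.mean(np.sign(f) != y)      # its training error
    print(f"t = {t:2d}  training error = {err:.3f}")
```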
Does AdaBoost generalize?

Margins in SVM:
$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_2}$$
Margins in AdaBoost:
$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_1}$$
Maximizing margins in AdaBoost:
$$P_S[yf(x) \le \theta] \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{1-\theta}(1-\epsilon_t)^{1+\theta}}, \qquad \text{where } f(x) = \frac{\vec{\alpha} \cdot \vec{h}(x)}{\|\vec{\alpha}\|_1}$$
Upper bounds based on margin:
$$P_D[yf(x) \le 0] \le P_S[yf(x) \le \theta] + O\!\left(\frac{1}{\sqrt{m}}\left(\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)\right)^{1/2}\right)$$
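The quantity $P_S[yf(x) \le \theta]$ is just the empirical CDF of the normalized margins, so it is directly computable for a trained model (using the illustrative variables from the earlier sketches):

```python
import numpy as np

# normalized margins y_i * f(x_i) with f(x) = sum_t alpha_t h_t(x) / ||alpha||_1
f = sum(a * h(X) for a, h in zip(alphas, hs)) / np.sum(np.abs(alphas))
margins = y * f                     # lie in [-1, 1]

theta = 0.1
print(np.mean(margins <= theta))    # empirical P_S[y f(x) <= theta]
```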
AdaBoost variants

Freund & Schapire (1995)
It can easily be shown that the totally corrective step reduces the upper bound on the empirical error without increasing classifier complexity.
The TCA was first proposed by Kivinen and Warmuth, but their $\alpha_t$ is set as in standard AdaBoost.
Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA were evaluated.
Initial tests show that the TCA outperforms AdaBoost on some standard data sets.
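One possible reading of the totally corrective step, sketched as coordinate descent on the exponential upper bound: after each round, cycle over all selected weak classifiers and re-optimize every coefficient until each $h_q$ has weighted error $1/2$ under the implied distribution. This is an illustrative realization, not the exact procedure of the cited papers:

```python
import numpy as np

def totally_corrective_step(preds, y, alphas, n_sweeps=50):
    """Coordinate descent on sum_i exp(-y_i * sum_q alpha_q h_q(x_i)).

    preds: (t, m) array with preds[q, i] = h_q(x_i) in {-1, +1}; alphas: (t,).
    Each sweep re-optimizes every coefficient; at the fixed point every
    selected h_q has weighted error 1/2 on the implied distribution.
    Assumes no h_q becomes perfect on the weights (|r| < 1)."""
    for _ in range(n_sweeps):
        for q in range(len(alphas)):
            D = np.exp(-y * (alphas @ preds))     # unnormalized example weights
            D /= D.sum()
            r = np.sum(D * y * preds[q])          # edge of h_q under current D
            alphas[q] += 0.5 * np.log((1 + r) / (1 - r))  # AdaBoost step on h_q
    return alphas
```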
[Figures: error vs. length of the strong classifier ($10^0$ to $10^3$ weak classifiers, log scale) on the benchmark data sets IMAGE, FLARE, GERMAN, RINGNORM, SPLICE, THYROID, TITANIC, BANANA, BREAST, DIABETIS, and HEART]