
AdaBoost

Jiri Matas and Jan Sochman
Centre for Machine Perception
Czech Technical University, Prague
http://cmp.felk.cvut.cz
Presentation Outline:

AdaBoost algorithm
Why is it of interest?
How does it work?
Why does it work?

AdaBoost variants

AdaBoost with a Totally Corrective Step (TCS)

Experiments with a Totally Corrective Step


Introduction

1990 Boost-by-majority algorithm (Freund)

1995 AdaBoost (Freund & Schapire)

1997 Generalized version of AdaBoost (Schapire & Singer)

2001 AdaBoost in Face Detection (Viola & Jones)


Interesting properties:

AB is a linear classifier with all its desirable properties.

AB output converges to the logarithm of the likelihood ratio.

AB has good generalization properties.

AB is a feature selector with a principled strategy (minimisation of an upper bound on the empirical error).

AB is close to sequential decision making (it produces a sequence of gradually more complex classifiers).
What is AdaBoost?

AdaBoost is an algorithm for constructing a strong classifier as a linear combination

f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)

of simple weak classifiers h_t(x).

Terminology

h_t(x) ... weak or basis classifier, hypothesis, feature

H(x) = sign(f(x)) ... strong or final classifier/hypothesis

Comments

The h_t(x)'s can be thought of as features.

Often (typically) the set H = {h(x)} is infinite.


(Discrete) AdaBoost Algorithm (Schapire & Singer, 1997)

Given: (x_1, y_1), ..., (x_m, y_m); x_i \in X, y_i \in \{-1, +1\}

Initialize weights D_1(i) = 1/m

For t = 1, ..., T:

1. (Call WeakLearn), which returns the weak classifier h_t : X \to \{-1, +1\} with minimum error w.r.t. distribution D_t;

2. Choose \alpha_t \in R,

3. Update

   D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}

   where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

Output the strong classifier:

H(x) = sign\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)

Comments

The computational complexity of selecting h_t is independent of t.

All information about previously selected features is captured in D_t!
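To make the loop above concrete, here is a minimal Python sketch (not from the slides), assuming a finite pool of weak classifiers supplied as functions that return ±1 predictions; the names adaboost and weak_pool are illustrative.

```python
import numpy as np

def adaboost(X, y, weak_pool, T):
    """Sketch of discrete AdaBoost: y in {-1,+1}, weak_pool is a finite set H
    given as a list of functions h(X) -> vector of {-1,+1} predictions."""
    m = len(y)
    D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    alphas, hypotheses = [], []
    for t in range(T):
        # WeakLearn: weak classifier with minimum weighted error w.r.t. D_t
        errors = [np.sum(D * (h(X) != y)) for h in weak_pool]
        j = int(np.argmin(errors))
        eps = errors[j]
        if eps >= 0.5:                       # prerequisite eps_t < 1/2
            break
        eps = min(max(eps, 1e-12), 1 - 1e-12)
        h = weak_pool[j]
        # optimal alpha_t; equals (1/2) log((1+r_t)/(1-r_t)) since r_t = 1 - 2*eps_t
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        D = D * np.exp(-alpha * y * h(X))    # reweighting step
        D /= D.sum()                         # normalize by Z_t
        alphas.append(alpha)
        hypotheses.append(h)

    def H(X_query):
        """Strong classifier H(x) = sign(sum_t alpha_t h_t(x))."""
        f = sum(a * h(X_query) for a, h in zip(alphas, hypotheses))
        return np.sign(f)

    return H, alphas, hypotheses
```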
WeakLearn

Loop step: Call WeakLearn, given distribution D_t;
returns weak classifier h_t : X \to \{-1, +1\} from H = \{h(x)\}

Select a weak classifier with the smallest weighted error

h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i) [y_i \ne h_j(x_i)]

Prerequisite: \epsilon_t < 1/2 (otherwise stop)

WeakLearn examples:
Decision tree builder, perceptron learning rule (H infinite)
Selecting the best one from a given finite set H

Demonstration example

Weak classifier = perceptron

[Figure: training set, with densities N(0, 1) and \frac{1}{r\sqrt{8\pi^3}}\, e^{-\frac{1}{2}(r-4)^2}]
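As an illustration of "selecting the best one from a given finite set H", the sketch below (mine, not from the slides) implements WeakLearn over axis-aligned decision stumps, the weak learner also used in the TCA experiments later; the parametrization h(x) = s * sign(x_f - theta) is an assumption.

```python
import numpy as np

def weaklearn_stumps(X, y, D):
    """Return the decision stump h(x) = s * sign(x[f] - theta), s in {-1,+1},
    with the smallest weighted error sum_i D(i) [y_i != h(x_i)]."""
    m, n_features = X.shape
    best_stump, best_eps = None, np.inf
    for f in range(n_features):
        for theta in np.unique(X[:, f]):             # candidate thresholds
            pred = np.where(X[:, f] > theta, 1, -1)
            for s in (1, -1):
                eps = np.sum(D * (s * pred != y))
                if eps < best_eps:
                    best_stump, best_eps = (f, theta, s), eps
    f, theta, s = best_stump

    def h(X_query):
        return s * np.where(X_query[:, f] > theta, 1, -1)

    return h, best_eps
```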
AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

The main objective is to minimize

\epsilon_{tr} = \frac{1}{m} |\{i : H(x_i) \ne y_i\}|

It can be upper bounded by

\epsilon_{tr}(H) \le \prod_{t=1}^{T} Z_t

How to set \alpha_t?

Select \alpha_t to greedily minimize Z_t(\alpha) in each step.

Z_t(\alpha) is a convex differentiable function with one extremum.

If h_t(x) \in \{-1, +1\}, then the optimal \alpha_t = \frac{1}{2} \log\left(\frac{1+r_t}{1-r_t}\right), where r_t = \sum_{i=1}^{m} D_t(i) h_t(x_i) y_i

Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} \le 1 for optimal \alpha_t

Justification of the selection of h_t according to \epsilon_t

Comments

The process of selecting \alpha_t and h_t(x) can be interpreted as a single optimization step minimising the upper bound on the empirical error. Improvement of the bound is guaranteed, provided that \epsilon_t < 1/2.

The process can be interpreted as a component-wise local optimization (Gauss-Southwell iteration) in the (possibly infinite dimensional!) space of \bar{\alpha} = (\alpha_1, \alpha_2, ...), starting from \bar{\alpha}^{(0)} = (0, 0, ...).
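For completeness, a short derivation (not spelled out on the slide) of the stated \alpha_t, under the assumption h_t(x) \in \{-1, +1\}, so that \epsilon_t and r_t = 1 - 2\epsilon_t determine Z_t:

```latex
Z_t(\alpha) = \sum_{i=1}^{m} D_t(i)\, e^{-\alpha y_i h_t(x_i)}
            = (1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}

\frac{dZ_t}{d\alpha} = -(1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha} = 0
\quad\Longrightarrow\quad
\alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}
         = \frac{1}{2}\log\frac{1+r_t}{1-r_t}
\qquad (r_t = 1 - 2\epsilon_t)

Z_t(\alpha_t) = 2\sqrt{\epsilon_t(1-\epsilon_t)} \le 1,
\text{ with equality iff } \epsilon_t = 1/2
```

This also justifies WeakLearn's choice of h_t by \epsilon_t: the smaller \epsilon_t, the smaller Z_t, i.e. the larger the decrease of the bound.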
Reweighting

Effect on the training set

Reweighting formula:

D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}
           = \frac{\exp\left(-y_i \sum_{q=1}^{t} \alpha_q h_q(x_i)\right)}{m \prod_{q=1}^{t} Z_q}

\exp(-\alpha_t y_i h_t(x_i)) \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \ne h_t(x_i) \end{cases}

Increase (decrease) the weight of wrongly (correctly) classified examples. The weight is the upper bound on the error of a given example!

[Figure: err as a function of y f(x)]

Effect on h_t

\alpha_t minimizes Z_t:

\sum_{i: h_t(x_i) = y_i} D_{t+1}(i) = \sum_{i: h_t(x_i) \ne y_i} D_{t+1}(i)

Error of h_t on D_{t+1} is 1/2

The next weak classifier is the most "independent" one

[Figure: error e of the selected weak classifiers (t = 1, ..., 4), shown against the 0.5 level]
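A quick numerical check of the "error of h_t on D_{t+1} is 1/2" claim; the toy labels and predictions below are made up purely for illustration.

```python
import numpy as np

# toy labels y_i and weak-classifier outputs h_t(x_i), both in {-1,+1}
y = np.array([+1, +1, -1, -1, +1, -1])
h = np.array([+1, -1, -1, +1, +1, -1])      # misclassifies examples 2 and 4
D = np.full(len(y), 1.0 / len(y))           # D_t uniform

eps = np.sum(D * (h != y))                  # weighted error eps_t
alpha = 0.5 * np.log((1 - eps) / eps)       # optimal alpha_t

D_next = D * np.exp(-alpha * y * h)         # reweighting formula
D_next /= D_next.sum()                      # divide by Z_t

print(np.sum(D_next * (h != y)))            # prints 0.5
```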
Summary of the Algorithm

Initialization...

For t = 1, ..., T:

Find h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i) [y_i \ne h_j(x_i)]

If \epsilon_t \ge 1/2 then stop

Set \alpha_t = \frac{1}{2} \log\left(\frac{1+r_t}{1-r_t}\right)

Update

D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}

Output the final classifier:

H(x) = sign\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)

[Figure: t = 40; weights D_t(i) over the training examples]
Does AdaBoost generalize?

Margins in SVM

\max_{\alpha} \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_2}

Margins in AdaBoost

\max_{\alpha} \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_1}

Maximizing margins in AdaBoost

P_S[y f(x) \le \theta] \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{\,1-\theta} (1-\epsilon_t)^{\,1+\theta}}
\quad \text{where } f(x) = \frac{\vec{\alpha} \cdot \vec{h}(x)}{\|\vec{\alpha}\|_1}

Upper bounds based on margin

P_D[y f(x) \le 0] \le P_S[y f(x) \le \theta] + O\!\left( \frac{1}{\sqrt{m}} \left( \frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta) \right)^{1/2} \right)
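To make the margin quantity concrete, a small sketch (mine, with illustrative array names) computing the normalized margins y_i f(x_i) with f(x) = (\alpha \cdot h(x)) / \|\alpha\|_1 from the stored weak-classifier outputs:

```python
import numpy as np

def normalized_margins(H_outputs, alphas, y):
    """H_outputs: (m, T) array with H_outputs[i, t] = h_t(x_i) in {-1,+1};
    alphas: the T coefficients; y: labels in {-1,+1}.
    Returns y_i * f(x_i) where f(x) = (alpha . h(x)) / ||alpha||_1."""
    alphas = np.asarray(alphas, dtype=float)
    f = H_outputs @ alphas / np.sum(np.abs(alphas))
    return y * f

# the empirical quantity appearing in the bound above:
# P_S = np.mean(normalized_margins(H_outputs, alphas, y) <= theta)
```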
AdaBoost variants

Freund & Schapire 1995

Discrete (h : X \to \{0, 1\})

Multiclass AdaBoost.M1 (h : X \to \{0, 1, ..., k\})

Multiclass AdaBoost.M2 (h : X \to [0, 1]^k)

Real-valued AdaBoost.R (Y = [0, 1], h : X \to [0, 1])

Schapire & Singer 1997

Confidence-rated prediction (h : X \to R, two-class)

Multilabel AdaBoost.MR, AdaBoost.MH (different formulation of the minimized loss)

... Many other modifications since then (Totally Corrective AB, Cascaded AB)
Pros and cons of AdaBoost

Advantages

Very simple to implement

Feature selection on very large sets of features

Fairly good generalization

Disadvantages

Suboptimal solution for \vec{\alpha}

Can overfit in the presence of noise


AdaBoost with a Totally Corrective Step (TCA)

Given: (x_1, y_1), ..., (x_m, y_m); x_i \in X, y_i \in \{-1, +1\}

Initialize weights D_1(i) = 1/m

For t = 1, ..., T:

1. (Call WeakLearn), which returns the weak classifier h_t : X \to \{-1, +1\} with minimum error w.r.t. distribution D_t;

2. Choose \alpha_t \in R,

3. Update D_{t+1}

4. (Call WeakLearn) on the set of previously selected h's with non-zero \alpha's. Update \vec{\alpha}. Update D_{t+1}. Repeat till |\epsilon_t - 1/2| < \delta, \forall t.

Comments

All weak classifiers have \epsilon_t \approx 1/2, therefore the classifier selected at t + 1 is independent of all classifiers selected so far.

It can be easily shown that the totally corrective step reduces the upper bound on the empirical error without increasing classifier complexity.

The TCA was first proposed by Kivinen and Warmuth, but their \alpha_t is set as in standard AdaBoost.

Generalization of TCA is an open question.
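A hedged sketch of what step 4 might look like in code, assuming the corrective sweep repeatedly applies the standard alpha/reweighting update to whichever already-selected weak classifier is currently furthest from error 1/2; the slide leaves these implementation details open, so this is only one possible reading.

```python
import numpy as np

def totally_corrective_step(H_outputs, y, alphas, delta=1e-3, max_sweeps=1000):
    """H_outputs: (m, t) array of outputs h_q(x_i) in {-1,+1} of the weak
    classifiers selected so far; alphas: their current coefficients.
    Adjusts the alphas until every selected h_q has weighted error within
    delta of 1/2 on the induced distribution."""
    alphas = np.asarray(alphas, dtype=float).copy()
    for _ in range(max_sweeps):
        D = np.exp(-y * (H_outputs @ alphas))        # unnormalized D_{t+1}
        D /= D.sum()
        eps = np.array([np.sum(D * (H_outputs[:, q] != y))
                        for q in range(len(alphas))])
        if np.all(np.abs(eps - 0.5) < delta):        # |eps_q - 1/2| < delta for all q
            break
        q = int(np.argmax(np.abs(eps - 0.5)))        # most "correctable" coordinate
        eps_q = float(np.clip(eps[q], 1e-12, 1 - 1e-12))
        alphas[q] += 0.5 * np.log((1 - eps_q) / eps_q)
    return alphas
```

After one coordinate update the corresponding h_q has error exactly 1/2 on the new distribution, so cycling over coordinates (a Gauss-Southwell-style sweep) drives all errors towards 1/2, matching the stopping condition in step 4.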


Experiments with TCA on the IDA Database

Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA evaluated

Weak learner: stumps.

Data from the IDA repository (Ratsch:2000):

Dataset          Input dim.   Training patterns   Testing patterns   Realizations
Banana                    2                 400               4900            100
Breast cancer             9                 200                 77            100
Diabetes                  8                 468                300            100
German                   20                 700                300            100
Heart                    13                 170                100            100
Image segment            18                1300               1010             20
Ringnorm                 20                 400               7000            100
Flare solar               9                 666                400            100
Splice                   60                1000               2175             20
Thyroid                   5                 140                 75            100
Titanic                   3                 150               2051            100
Twonorm                  20                 400               7000            100
Waveform                 21                 400               4600            100

Note that the training sets are fairly small.


Results with TCA on the IDA Database

Training error (dashed line), test error (solid line).
Discrete AdaBoost (blue), Real AdaBoost (green),
Discrete AdaBoost with TCA (red), Real AdaBoost with TCA (cyan).
The black horizontal line: the error of AdaBoost with RBF-network weak classifiers from (Ratsch-ML:2000).

[Figures: error as a function of the length of the strong classifier (log scale, 1 to 1000) for the datasets IMAGE, FLARE, GERMAN, RINGNORM, SPLICE, THYROID, TITANIC, BANANA, BREAST, DIABETIS, HEART]
Conclusions

The AdaBoost algorithm was presented and analysed.

A modification of the Totally Corrective AdaBoost was introduced.

Initial tests show that the TCA outperforms AB on some standard data sets.