
Stat 928: Statistical Learning Theory Lecture: 19

Perceptron Lower Bound & The Winnow Algorithm


Instructor: Sham Kakade
1 Lower Bound
Theorem 1.1. Suppose $A = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ and $\frac{1}{\gamma^2} \le d$. Then for any deterministic algorithm, there exists a
data set which is separable by a margin of $\gamma$ on which the algorithm makes at least $\lfloor \frac{1}{\gamma^2} \rfloor$ mistakes.
Proof. Let $n = \lfloor \frac{1}{\gamma^2} \rfloor$. Note that $n \le d$ and $\gamma^2 n \le 1$. Let $e_i$ be the unit vector with a 1 in the $i$th coordinate and zeroes
in the others. Consider $e_1, \ldots, e_n$. We now claim that, for any $b \in \{-1, +1\}^n$, there is a $w$ with $\|w\| \le 1$ such that
\[
\forall i \in [n], \quad b_i (w \cdot e_i) = \gamma .
\]
To see this, simply choose $w_i = \gamma b_i$. Then the above equality is true. Moreover, $\|w\|^2 = \gamma^2 \sum_{i=1}^n b_i^2 = \gamma^2 n \le 1$.
Now given an algorithm $\mathcal{A}$, define the data set $(x_i, y_i)_{i=1}^n$ as follows. Let $x_i = e_i$ for all $i$ and $y_1 = -\mathcal{A}(x_1)$, i.e., the label opposite to the algorithm's first prediction. Define $y_i$ for $i > 1$ recursively as
\[
y_i = -\mathcal{A}(x_1, y_1, \ldots, x_{i-1}, y_{i-1}, x_i) .
\]
It is clear that the algorithm makes $n$ mistakes when run on this data set. By the above claim, no matter what the $y_i$'s turn
out to be, the data set is separable by a margin of $\gamma$.
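The construction is easy to simulate. Below is a minimal sketch (not part of the original notes) of the adversarial data set, assuming the deterministic algorithm is exposed as a callable `algorithm(history, x)` returning a prediction in $\{-1,+1\}$; the function name and interface are illustrative only.

import numpy as np

def adversarial_dataset(algorithm, gamma, d):
    # Build the lower-bound data set against a deterministic online algorithm.
    # `algorithm(history, x)` is assumed to return a prediction in {-1, +1}
    # given the labelled history [(x_1, y_1), ...] and the new point x.
    n = int(np.floor(1.0 / gamma**2))
    assert n <= d, "the construction needs 1/gamma^2 <= d"
    history, xs, ys = [], [], []
    for i in range(n):
        x = np.zeros(d)
        x[i] = 1.0                        # x_i = e_i
        y = -algorithm(history, x)        # label is the opposite of the prediction
        history.append((x, y))
        xs.append(x)
        ys.append(y)
    # Separating vector from the claim in the proof: w_i = gamma * y_i.
    w = np.zeros(d)
    w[:n] = gamma * np.array(ys)
    # Sanity checks: every example has margin exactly gamma, and ||w|| <= 1.
    assert all(np.isclose(y * np.dot(w, x), gamma) for x, y in zip(xs, ys))
    assert np.linalg.norm(w) <= 1.0 + 1e-12
    return xs, ys, w

Running it against any deterministic online learner, e.g. a perceptron's $\mathrm{sgn}(w \cdot x)$ rule, forces a mistake on each of the $n = \lfloor 1/\gamma^2 \rfloor$ rounds while the returned $w$ separates the data with margin $\gamma$.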
2 The Winnow Algorithm
Algorithm 1 WINNOW
Input parameter: $\eta > 0$ (learning rate)
$w_1 \leftarrow \frac{1}{d}\mathbf{1}$
for $t = 1$ to $T$ do
  Receive $x_t \in \mathbb{R}^d$
  Predict $\mathrm{sgn}(w_t \cdot x_t)$
  Receive $y_t \in \{-1, +1\}$
  if $\mathrm{sgn}(w_t \cdot x_t) \ne y_t$ then
    $\forall i \in [d],\; w_{t+1,i} \leftarrow \frac{w_{t,i} \exp(\eta y_t x_{t,i})}{Z_t}$ where $Z_t = \sum_{i=1}^d w_{t,i} \exp(\eta y_t x_{t,i})$
  else
    $w_{t+1} \leftarrow w_t$
  end if
end for
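For concreteness, here is a short Python transcription of Algorithm 1 (a sketch, not part of the notes); the stream interface and the convention $\mathrm{sgn}(0) = +1$ are assumptions made for illustration.

import numpy as np

def winnow(stream, d, eta):
    # Run WINNOW on an iterable of (x, y) pairs with x in R^d and y in {-1, +1}.
    # Returns the final weight vector and the number of mistakes made.
    w = np.full(d, 1.0 / d)              # w_1 <- (1/d) * 1
    mistakes = 0
    for x, y in stream:
        prediction = 1 if np.dot(w, x) >= 0 else -1   # sgn(w_t . x_t), taking sgn(0) = +1
        if prediction != y:
            mistakes += 1
            w = w * np.exp(eta * y * x)  # multiplicative update ...
            w = w / w.sum()              # ... normalised by Z_t, so w stays a distribution
        # else: w_{t+1} = w_t
    return w, mistakes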
Theorem 2.1. Suppose Assumption M holds. Further assume that $w^\star \ge 0$. Let
\[
M_T := \sum_{t=1}^T \mathbf{1}[\mathrm{sgn}(w_t \cdot x_t) \ne y_t]
\]
denote the number of mistakes the WINNOW algorithm makes. Then, for a suitable choice of $\eta$, we have
\[
M_T \le \frac{2 \|x_{1:T}\|_\infty^2 \|w^\star\|_1^2}{\gamma^2} \ln d .
\]
Proof. Let $u^\star = w^\star / \|w^\star\|_1$. Since we assume $w^\star \ge 0$, $u^\star$ is a probability distribution. At all times, the weight
vector $w_t$ maintained by WINNOW is also a probability distribution. Let us measure the progress of the algorithm by
analyzing the relative entropy between these two distributions at time $t$. Accordingly, define
\[
\Psi_t := \sum_{i=1}^d u^\star_i \ln \frac{u^\star_i}{w_{t,i}} .
\]
When there is no mistake, $\Psi_{t+1} = \Psi_t$. On a round when a mistake occurs, we have
\[
\Psi_{t+1} - \Psi_t = \sum_{i=1}^d u^\star_i \ln \frac{w_{t,i}}{w_{t+1,i}}
= \sum_{i=1}^d u^\star_i \ln \frac{Z_t}{\exp(\eta y_t x_{t,i})}
= \ln(Z_t) - \eta y_t \sum_{i=1}^d u^\star_i x_{t,i}
= \ln(Z_t) - \eta y_t (u^\star \cdot x_t)
\le \ln(Z_t) - \eta \gamma / \|w^\star\|_1 , \qquad (1)
\]
where the last inequality follows from the definition of $u^\star$ and Assumption M (which gives $y_t (w^\star \cdot x_t) \ge \gamma$, hence $y_t (u^\star \cdot x_t) \ge \gamma / \|w^\star\|_1$). Let $L = \|x_{1:T}\|_\infty$. Then $y_t x_{t,i} \in [-L, L]$ for all $t, i$. Then we can bound
\[
Z_t = \sum_{i=1}^d w_{t,i} e^{\eta y_t x_{t,i}}
\]
using the convexity of the function $z \mapsto e^{\eta z}$ on the interval $[-L, L]$ as follows.
\[
Z_t \le \sum_{i=1}^d w_{t,i} \left[ \frac{1 + y_t x_{t,i}/L}{2} e^{\eta L} + \frac{1 - y_t x_{t,i}/L}{2} e^{-\eta L} \right]
= \frac{e^{\eta L} + e^{-\eta L}}{2} \sum_{i=1}^d w_{t,i} + \frac{e^{\eta L} - e^{-\eta L}}{2L} \left[ y_t \sum_{i=1}^d w_{t,i} x_{t,i} \right]
= \frac{e^{\eta L} + e^{-\eta L}}{2} + \frac{e^{\eta L} - e^{-\eta L}}{2L} \, y_t (w_t \cdot x_t)
\le \frac{e^{\eta L} + e^{-\eta L}}{2}
\]
because having a mistake implies $y_t (w_t \cdot x_t) \le 0$ and $e^{\eta L} - e^{-\eta L} > 0$. So we have proved
\[
\ln(Z_t) \le \ln\left( \frac{e^{\eta L} + e^{-\eta L}}{2} \right) . \qquad (2)
\]
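The first inequality above is just the chord bound for the convex function $z \mapsto e^{\eta z}$ on $[-L, L]$, applied to each summand with weight $w_{t,i}$. A quick numeric spot-check of that pointwise bound (a sketch with arbitrary illustrative values of $\eta$ and $L$):

import numpy as np

# Spot-check  e^{eta z} <= (1 + z/L)/2 * e^{eta L} + (1 - z/L)/2 * e^{-eta L}  on [-L, L].
eta, L = 0.7, 2.5                         # arbitrary illustrative values
for z in np.linspace(-L, L, 101):
    lhs = np.exp(eta * z)
    rhs = (1 + z / L) / 2 * np.exp(eta * L) + (1 - z / L) / 2 * np.exp(-eta * L)
    assert lhs <= rhs + 1e-12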
Define
\[
C(\eta) := \eta \gamma / \|w^\star\|_1 - \ln\left( \frac{e^{\eta L} + e^{-\eta L}}{2} \right) .
\]
Combining (1) and (2) then gives us
\[
\Psi_{t+1} - \Psi_t \le -C(\eta)\, \mathbf{1}[y_t \ne \mathrm{sgn}(w_t \cdot x_t)] .
\]
Unwinding the recursion gives
\[
\Psi_{T+1} - \Psi_1 \le -C(\eta)\, M_T .
\]
Since relative entropy is always non-negative, $\Psi_{T+1} \ge 0$. Further,
\[
\Psi_1 = \sum_{i=1}^d u^\star_i \ln(d u^\star_i) \le \sum_{i=1}^d u^\star_i \ln d = \ln d ,
\]
which gives us
\[
0 - \ln d \le -C(\eta)\, M_T
\]
and therefore $M_T \le \frac{\ln d}{C(\eta)}$.
Setting
\[
\eta = \frac{1}{2L} \ln\left( \frac{L + \gamma/\|w^\star\|_1}{L - \gamma/\|w^\star\|_1} \right)
\]
to maximize the denominator $C(\eta)$ gives
\[
M_T \le \frac{\ln d}{g\left( \frac{\gamma}{L \|w^\star\|_1} \right)}
\]
where $g(\epsilon) := \frac{1+\epsilon}{2} \ln(1+\epsilon) + \frac{1-\epsilon}{2} \ln(1-\epsilon)$. Finally, noting that $g(\epsilon) \ge \epsilon^2/2$ proves the theorem.
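As a sanity check on the last two steps (again a sketch with arbitrary illustrative values, not part of the notes): the stated $\eta$ should attain $C(\eta) = g(\gamma / (L \|w^\star\|_1))$, and $g(\epsilon) \ge \epsilon^2/2$.

import numpy as np

gamma, L, w_star_l1 = 0.3, 1.0, 2.0       # arbitrary values with gamma < L * ||w*||_1
eps = gamma / (L * w_star_l1)

def C(eta):
    # C(eta) = eta * gamma / ||w*||_1 - ln((e^{eta L} + e^{-eta L}) / 2)
    return eta * gamma / w_star_l1 - np.log((np.exp(eta * L) + np.exp(-eta * L)) / 2)

def g(e):
    return (1 + e) / 2 * np.log(1 + e) + (1 - e) / 2 * np.log(1 - e)

eta_star = 1 / (2 * L) * np.log((L + gamma / w_star_l1) / (L - gamma / w_star_l1))
assert np.isclose(C(eta_star), g(eps))                                # optimal value equals g(eps)
assert all(C(eta_star) >= C(e) for e in np.linspace(0.01, 2.0, 200))  # eta_star maximises C
assert g(eps) >= eps**2 / 2                                           # so M_T <= 2 ln d / eps^2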