$\|x\| \le 1$ and $\frac{1}{\gamma^2} \le d$. Then for any deterministic algorithm, there exists a
data set which is separable by a margin of $\gamma$ on which the algorithm makes at least
$\lfloor \frac{1}{\gamma^2} \rfloor$ mistakes.
Proof. Let $n = \lfloor \frac{1}{\gamma^2} \rfloor$. Note that $n \le d$ and $\gamma^2 n \le 1$. Let $e_i$ be the unit vector with a 1 in the $i$th coordinate and zeroes
in the others. Consider $e_1, \ldots, e_n$. We now claim that, for any $b \in \{-1, +1\}^n$, there is a $w$ with $\|w\| \le 1$ such that
$$\forall i \in [n], \quad b_i (w \cdot e_i) = \gamma.$$
To see this, simply choose $w_i = \gamma b_i$. Then the above equality is true. Moreover,
$$\|w\|^2 = \gamma^2 \sum_{i=1}^n b_i^2 = \gamma^2 n \le 1.$$
Now given an algorithm $\mathcal{A}$, define the data set $(x_i, y_i)_{i=1}^n$ as follows. Let $x_i = e_i$ for all $i$ and $y_1 = -\mathcal{A}(x_1)$.
Define $y_i$ for $i > 1$ recursively as
$$y_i = -\mathcal{A}(x_1, y_1, \ldots, x_{i-1}, y_{i-1}, x_i).$$
It is clear that the algorithm makes $n$ mistakes when run on this data set, since each label $y_i$ is chosen to be the opposite of the algorithm's prediction. By the above claim, no matter what the $y_i$'s turn
out to be, the data set is separable by a margin of $\gamma$.
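The adversarial construction is easy to simulate. Below is a minimal sketch (the function names and the choice of learner are ours, not from the notes): the adversary feeds the unit vectors $e_1, \ldots, e_n$ to one concrete deterministic algorithm, a perceptron-style learner, labels each point opposite to its prediction, and then checks that the resulting data set is still separable with margin $\gamma$ by $w_i = \gamma b_i$.

```python
import math

def make_perceptron(d):
    """A deterministic online learner: predicts sgn(w . x), updates on every round."""
    w = [0.0] * d
    def predict(x):
        s = sum(wi * xi for wi, xi in zip(w, x))
        return 1 if s >= 0 else -1          # deterministic tie-breaking at 0
    def update(x, y):
        for i in range(d):
            w[i] += y * x[i]
    return predict, update

gamma = 0.25
d = 32
n = int(1.0 / gamma**2)                     # n = floor(1/gamma^2)
assert n <= d

predict, update = make_perceptron(d)

labels = []
mistakes = 0
for i in range(n):
    x = [0.0] * d
    x[i] = 1.0                              # x_i = e_i
    y = -predict(x)                         # adversary: label is the opposite of the prediction
    labels.append(y)
    mistakes += 1                           # by construction, every round is a mistake
    update(x, y)

# The data set is separable by margin gamma: take w*_i = gamma * b_i.
w_star = [gamma * b for b in labels] + [0.0] * (d - n)
norm = math.sqrt(sum(v * v for v in w_star))
margins = [labels[i] * w_star[i] for i in range(n)]   # b_i (w* . e_i)

print(mistakes)                             # n mistakes, no matter what the learner does
print(norm <= 1.0)                          # ||w*|| = gamma * sqrt(n) <= 1
print(all(abs(mg - gamma) < 1e-12 for mg in margins))
```

Any other deterministic learner could be substituted for the perceptron here; the mistake count is $n$ regardless, which is the point of the proof.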
2 The Winnow Algorithm
Algorithm 1 WINNOW
Input parameter: $\eta > 0$ (learning rate)
$w_1 \leftarrow \frac{1}{d} \mathbf{1}$
for $t = 1$ to $T$ do
    Receive $x_t \in \mathbb{R}^d$
    Predict $\mathrm{sgn}(w_t \cdot x_t)$
    Receive $y_t \in \{-1, +1\}$
    if $\mathrm{sgn}(w_t \cdot x_t) \ne y_t$ then
        $\forall i \in [d], \; w_{t+1,i} \leftarrow \frac{w_{t,i} \exp(\eta y_t x_{t,i})}{Z_t}$ where $Z_t = \sum_{i=1}^d w_{t,i} \exp(\eta y_t x_{t,i})$
    else
        $w_{t+1} \leftarrow w_t$
    end if
end for
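The pseudocode translates directly into a short implementation. This is a minimal sketch with our own function names, assuming inputs arrive as plain Python lists:

```python
import math

def winnow(stream, d, eta):
    """Run WINNOW on a sequence of (x_t, y_t) pairs; return the mistake count."""
    w = [1.0 / d] * d                         # w_1 <- (1/d) * 1
    mistakes = 0
    for x, y in stream:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if pred != y:                         # mistake: multiplicative update
            mistakes += 1
            w = [wi * math.exp(eta * y * xi) for wi, xi in zip(w, x)]
            Z = sum(w)                        # Z_t renormalizes w to a distribution
            w = [wi / Z for wi in w]
        # else: w_{t+1} = w_t, nothing to do
    return mistakes

# Toy usage: the label is determined by the sign of coordinate 0.
data = [([+1.0, -1.0, 0.5], +1), ([-1.0, +1.0, 0.5], -1), ([+1.0, +1.0, -0.5], +1)]
print(winnow(data, d=3, eta=0.5))
```

Note that the weights stay non-negative and sum to 1 after every update, a fact the proof below relies on.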
Theorem 2.1. Suppose Assumption M holds. Further assume that $w^\star \ge 0$. Let
$$M_T := \sum_{t=1}^T \mathbf{1}[\mathrm{sgn}(w_t \cdot x_t) \ne y_t]$$
denote the number of mistakes the WINNOW algorithm makes. Then, for a suitable choice of $\eta$, we have
$$M_T \le \frac{2 \|x_{1:T}\|_\infty^2 \|w^\star\|_1^2}{\gamma^2} \ln d.$$
Proof. Let $u^\star = w^\star / \|w^\star\|_1$. Since we assume $w^\star \ge 0$, $u^\star$ is a probability vector. Define the potential
$$\Phi_t := \sum_{i=1}^d u_i^\star \ln \frac{u_i^\star}{w_{t,i}}.$$
When there is no mistake, $\Phi_{t+1} = \Phi_t$. On a round when a mistake occurs, we have
$$\Phi_{t+1} - \Phi_t = \sum_{i=1}^d u_i^\star \ln \frac{w_{t,i}}{w_{t+1,i}} = \sum_{i=1}^d u_i^\star \ln \frac{Z_t}{\exp(\eta y_t x_{t,i})} = \ln(Z_t) \sum_{i=1}^d u_i^\star - \eta y_t \sum_{i=1}^d u_i^\star x_{t,i} = \ln(Z_t) - \eta y_t (u^\star \cdot x_t) \le \ln(Z_t) - \eta \gamma / \|w^\star\|_1, \quad (1)$$
where the last inequality follows from the definition of $u^\star$ together with the margin assumption $y_t (w^\star \cdot x_t) \ge \gamma$. Note that $y_t x_{t,i} \in [-L, L]$ for all $t, i$, where $L = \|x_{1:T}\|_\infty$. Then we can bound
$Z_t = \sum_{i=1}^d w_{t,i} e^{\eta y_t x_{t,i}}$ using the convexity of the function $a \mapsto e^{\eta a}$ on the interval $[-L, L]$ as follows.
$$Z_t \le \sum_{i=1}^d w_{t,i} \left[ \frac{1 + y_t x_{t,i}/L}{2} e^{\eta L} + \frac{1 - y_t x_{t,i}/L}{2} e^{-\eta L} \right] = \frac{e^{\eta L} + e^{-\eta L}}{2} \sum_{i=1}^d w_{t,i} + \frac{e^{\eta L} - e^{-\eta L}}{2L} \left( y_t \sum_{i=1}^d w_{t,i} x_{t,i} \right)$$
$$= \frac{e^{\eta L} + e^{-\eta L}}{2} + \frac{e^{\eta L} - e^{-\eta L}}{2L} \, y_t (w_t \cdot x_t) \le \frac{e^{\eta L} + e^{-\eta L}}{2},$$
where the second equality uses $\sum_{i=1}^d w_{t,i} = 1$ (the weights are normalized at every step), and the last inequality holds because having a mistake implies $y_t (w_t \cdot x_t) \le 0$ and $e^{\eta L} - e^{-\eta L} > 0$. So we have proved
$$\ln(Z_t) \le \ln\left(\frac{e^{\eta L} + e^{-\eta L}}{2}\right). \quad (2)$$
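The convexity inequality behind this step, $e^{\eta a} \le \frac{1 + a/L}{2} e^{\eta L} + \frac{1 - a/L}{2} e^{-\eta L}$ for $a \in [-L, L]$, can be spot-checked numerically; the constants below are arbitrary test values, not quantities from the notes:

```python
import math

# A convex function lies below the chord joining its endpoint values:
# check exp(eta*a) <= (1 + a/L)/2 * exp(eta*L) + (1 - a/L)/2 * exp(-eta*L)
# on a fine grid over [-L, L].
eta, L = 0.7, 2.0
steps = 1000
ok = True
for k in range(steps + 1):
    a = -L + 2 * L * k / steps
    lhs = math.exp(eta * a)
    rhs = (1 + a / L) / 2 * math.exp(eta * L) + (1 - a / L) / 2 * math.exp(-eta * L)
    ok = ok and lhs <= rhs + 1e-12           # small tolerance for float rounding
print(ok)
```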
Define
$$C(\eta) := \frac{\eta \gamma}{\|w^\star\|_1} - \ln\left(\frac{e^{\eta L} + e^{-\eta L}}{2}\right).$$
Combining (1) and (2) then gives us
$$\Phi_{t+1} - \Phi_t \le -C(\eta) \, \mathbf{1}[y_t \ne \mathrm{sgn}(w_t \cdot x_t)].$$
Unwinding the recursion gives
$$\Phi_{T+1} \le \Phi_1 - C(\eta) M_T.$$
Since relative entropy is always non-negative, $\Phi_{T+1} \ge 0$. Further,
$$\Phi_1 = \sum_{i=1}^d u_i^\star \ln(d u_i^\star) \le \sum_{i=1}^d u_i^\star \ln d = \ln d,$$
which gives us $0 \le \ln d - C(\eta) M_T$ and therefore $M_T \le \frac{\ln d}{C(\eta)}$. Setting
$$\eta = \frac{1}{2L} \ln\left(\frac{L + \gamma/\|w^\star\|_1}{L - \gamma/\|w^\star\|_1}\right)$$
to maximize the denominator $C(\eta)$ gives
$$M_T \le \frac{\ln d}{g\left(\frac{\gamma}{L \|w^\star\|_1}\right)},$$
where $g(\epsilon) := \frac{1+\epsilon}{2} \ln(1+\epsilon) + \frac{1-\epsilon}{2} \ln(1-\epsilon)$. Finally, noting that $g(\epsilon) \ge \epsilon^2/2$ proves the theorem.
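As a numerical sanity check of the theorem, one can run WINNOW with the tuned $\eta$ on random data separable with margin $\gamma$ and confirm the mistake count respects the bound, and also verify $g(\epsilon) \ge \epsilon^2/2$ on a grid. The data-generating setup below is our own construction, not from the notes: $w^\star = e_1$, which is non-negative with $\|w^\star\|_1 = 1$, so the margin condition $y_t (w^\star \cdot x_t) \ge \gamma$ just constrains the first coordinate.

```python
import math
import random

random.seed(0)
d, T, gamma, L = 16, 2000, 0.3, 1.0         # ||w*||_1 = 1 in this construction

def run_winnow(eta):
    w = [1.0 / d] * d                       # w_1 <- (1/d) * 1
    mistakes = 0
    for _ in range(T):
        y = random.choice([-1, 1])
        x = [random.uniform(-L, L) for _ in range(d)]
        x[0] = y * random.uniform(gamma, L) # enforce y_t * (w* . x_t) >= gamma
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if pred != y:                       # multiplicative update on mistakes
            mistakes += 1
            w = [wi * math.exp(eta * y * xi) for wi, xi in zip(w, x)]
            Z = sum(w)
            w = [wi / Z for wi in w]
    return mistakes

# Tuned eta from the proof (with ||w*||_1 = 1), and the theorem's bound.
eta = (1 / (2 * L)) * math.log((L + gamma) / (L - gamma))
bound = 2 * L**2 * 1**2 * math.log(d) / gamma**2

m = run_winnow(eta)
print(m <= bound)                           # mistake count respects the bound

# The elementary inequality g(eps) >= eps^2 / 2 used in the last step.
g = lambda e: (1 + e) / 2 * math.log(1 + e) + (1 - e) / 2 * math.log(1 - e)
print(all(g(e) >= e * e / 2 for e in (i / 100 for i in range(1, 100))))
```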