Perceptron
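Given
Data $D = \{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^d$ & $\|x_i\|_2 = 1$; $y_i = \pm 1$
$x = \begin{bmatrix} x \\ 1 \end{bmatrix}$, $w = \begin{bmatrix} w \\ b \end{bmatrix}$ (the bias $b$ is absorbed into $w$)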
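The augmentation above can be checked numerically. This is a minimal sketch in plain Python; the helper names `augment_point` and `augment_weights` and the sample numbers are illustrative assumptions, not part of the slides.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def augment_point(x):
    """Append a constant 1 to the input vector (hypothetical helper)."""
    return list(x) + [1.0]

def augment_weights(w, b):
    """Append the bias b to the weight vector (hypothetical helper)."""
    return list(w) + [b]

# Illustrative values only.
w, b = [2.0, -1.0], 0.5
x = [1.0, 3.0]

plain_score = dot(w, x) + b
augmented_score = dot(augment_weights(w, b), augment_point(x))
```

Both scores agree, which is why the rest of the derivation can work with $\langle w, x \rangle$ alone.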
• The algorithm proceeds in multiple iterations.
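• If I scale a point, $x_i \leftarrow \frac{x_i}{\|x_i\|_2}$, then $f(x_i)$ does not change:
$f(x_i) = \mathrm{sgn}\left(\left\langle w, \frac{x_i}{\|x_i\|_2} \right\rangle\right)$
$= \mathrm{sgn}(\langle w, x_i \rangle)$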
• Assume the $x_i$ are scaled, i.e. $\|x_i\|_2 = 1$
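A small sketch of the scaling argument: dividing a point by its (positive) norm leaves the sign of $\langle w, x_i \rangle$ unchanged. The vectors below are made-up values for illustration.

```python
import math

def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def normalize(x):
    """Scale x to unit Euclidean norm."""
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in x]

def sgn(z):
    """Sign function; returns 0 exactly on the decision boundary."""
    return 1 if z > 0 else (-1 if z < 0 else 0)

# Illustrative values only.
w = [0.5, -2.0, 1.0]
x = [3.0, 1.0, 4.0]
x_unit = normalize(x)

same_sign = sgn(dot(w, x)) == sgn(dot(w, x_unit))
```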
ALGO:
1. $w = 0$ // Initialize
2. for $t = 1$ to $n$ (repeat till convergence) // $w$ is the current classifier
   if $y_t \langle w, x_t \rangle \le 0$
      $w = w + y_t x_t$ // Do an update
   end
end
3. Output $w$
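The pseudocode above can be sketched in plain Python as follows. The `max_epochs` cap, the mistake counter, and the four-point data set are assumptions added for illustration; the slides only specify the update rule.

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def perceptron(points, labels, max_epochs=100):
    """Repeat passes over the data until one full pass makes no update.

    Returns the weight vector and the total number of updates (mistakes).
    max_epochs is a safety cap (an assumption, not in the slides).
    """
    w = [0.0] * len(points[0])          # Step 1: initialize w = 0
    mistakes = 0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(points, labels):
            if y * dot(w, x) <= 0:      # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                updated = True
        if not updated:                 # converged: a clean pass
            break
    return w, mistakes

# Hypothetical unit-norm, linearly separable data.
points = [[0.6, 0.8], [0.8, -0.6], [-0.6, -0.8], [-0.8, 0.6]]
labels = [1, 1, -1, -1]
w, mistakes = perceptron(points, labels)
```

The condition uses `<= 0` so that the all-zero initial `w` triggers an update on the first point, matching step 2 of the algorithm.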
From 1 & 2 we will conclude that the number of mistakes will be $\le \frac{1}{\gamma^2}$.
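• $\langle w_t, w^* \rangle = \langle w_{t-1} + y_t x_t, w^* \rangle$
$= \langle w_{t-1}, w^* \rangle + y_t \langle x_t, w^* \rangle$ // $y_t \langle x_t, w^* \rangle \ge \gamma$
$\ge \langle w_{t-1}, w^* \rangle + \gamma$ // $\gamma = \min_i y_i \langle w^*, x_i \rangle$
$\ge M\gamma$ // after $M$ updates, starting from $w_0 = 0$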
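• On an update, $\|w_t\|_2^2 = \|w_{t-1}\|_2^2 + 2 y_t \langle w_{t-1}, x_t \rangle + \|x_t\|_2^2$, where $y_t \langle w_{t-1}, x_t \rangle \le 0$ and $\|x_t\|_2 = 1$. So,
$\|w_t\|_2^2 \le \|w_{t-1}\|_2^2 + 1 \le M$
I made at most $M$ mistakes & my initial $w_0$ was zero.
• So,
$\|w_t\|_2^2 \le M$ or $\|w_t\|_2 \le \sqrt{M}$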
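• By the Cauchy-Schwarz inequality (with $w^*$ normalized so that $\|w^*\|_2 = 1$),
$M\gamma \le \langle w_t, w^* \rangle \le \|w_t\|_2 \, \|w^*\|_2 \le \sqrt{M}$
$\Rightarrow \sqrt{M} \le \frac{1}{\gamma} \Rightarrow M \le \frac{1}{\gamma^2}$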
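The bound $M \le 1/\gamma^2$ can be sanity-checked on a toy problem. This is a sketch under stated assumptions: the angles, the reference separator `w_star` (unit norm), and the `max_epochs` cap are all chosen for illustration and do not come from the slides.

```python
import math

def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def count_mistakes(points, labels, max_epochs=1000):
    """Run the perceptron until a full pass has no update; return (w, M)."""
    w = [0.0] * len(points[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, y in zip(points, labels):
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, mistakes

# Hypothetical data: unit vectors at these angles (degrees), labeled by w_star.
w_star = [1.0, 0.0]                      # assumed separator with ||w_star|| = 1
angles = [75, -70, 105, 250, 0, 30, 60, -40, 180, 150, 120, 220]
points = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]
labels = [1 if dot(w_star, x) > 0 else -1 for x in points]

# Margin of w_star on the data: gamma = min_i y_i <w_star, x_i>.
gamma = min(y * dot(w_star, x) for x, y in zip(points, labels))
w, M = count_mistakes(points, labels)
bound = 1.0 / gamma ** 2
```

Any unit-norm separator gives a valid bound; a separator with a larger margin would give a tighter one.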
• Cauchy-Schwarz
• If there are two vectors $p, q \in \mathbb{R}^d$
• $\langle p, q \rangle \le \|p\|_2 \, \|q\|_2$
• i.e. $\sum_i p_i q_i \le \sqrt{\sum_i p_i^2} \, \sqrt{\sum_i q_i^2}$
• Hölder's Inequality
$\langle p, q \rangle \le \|p\|_{l_1} \, \|q\|_{l_2}$, where $\frac{1}{l_1} + \frac{1}{l_2} = 1$
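Both inequalities can be checked numerically; Cauchy-Schwarz is the special case $l_1 = l_2 = 2$ of Hölder's inequality. The sketch below uses made-up vectors and a hypothetical helper `holder_gap`, and checks the slightly stronger form with $|\langle p, q \rangle|$ on the left-hand side.

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def lp_norm(v, r):
    """(sum_i |v_i|^r)^(1/r), for r >= 1."""
    return sum(abs(a) ** r for a in v) ** (1.0 / r)

def holder_gap(p, q, l1):
    """Return ||p||_{l1} * ||q||_{l2} - |<p, q>|, with 1/l1 + 1/l2 = 1.

    Requires l1 > 1. A non-negative result means the inequality holds.
    """
    l2 = l1 / (l1 - 1.0)
    return lp_norm(p, l1) * lp_norm(q, l2) - abs(dot(p, q))

# Illustrative values only.
p = [1.0, -2.0, 3.0]
q = [2.0, 1.0, 0.5]

gap_cauchy_schwarz = holder_gap(p, q, 2.0)                # l1 = l2 = 2
gap_holder = holder_gap(p, q, 3.0)                        # l1 = 3, l2 = 1.5
gap_parallel = holder_gap(p, [2.0 * a for a in p], 2.0)   # equality case
```

For parallel vectors the Cauchy-Schwarz gap is zero up to rounding, which is the equality case of the inequality.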
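When $\alpha_t = 0$ (no update at step $t$),
$y_t \langle w_{t-1}, x_t \rangle > 0$
When $\alpha_t = 1$ (an update at step $t$), $y_t \langle w_{t-1}, x_t \rangle \le 0$. Since $w = \sum_{t=1}^{n} \alpha_t y_t x_t$, the classifier is
$f(x) = \mathrm{sgn}\left(\sum_{t=1}^{n} \alpha_t y_t \langle x_t, x \rangle\right)$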
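The $\alpha_t$ form can be sketched as a single pass that stores only the indicators $\alpha_t$ and touches the data through inner products. This is an illustrative sketch, assuming one pass over hypothetical data; the function names are not from the slides.

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def sgn(z):
    """Sign function; returns 0 exactly on the decision boundary."""
    return 1 if z > 0 else (-1 if z < 0 else 0)

def dual_score(alpha, points, labels, x):
    """sum_t alpha_t * y_t * <x_t, x>."""
    return sum(a * y * dot(x_t, x) for a, y, x_t in zip(alpha, labels, points))

def dual_perceptron(points, labels):
    """One pass over the data; alpha_t = 1 iff an update happened at step t."""
    alpha = [0] * len(points)
    for t, (x_t, y_t) in enumerate(zip(points, labels)):
        # alpha entries for steps >= t are still 0, so this scores with w_{t-1}.
        if y_t * dual_score(alpha, points, labels, x_t) <= 0:
            alpha[t] = 1
    return alpha

# Hypothetical unit-norm data.
points = [[0.6, 0.8], [0.8, -0.6], [-0.6, -0.8], [-0.8, 0.6]]
labels = [1, 1, -1, -1]
alpha = dual_perceptron(points, labels)

# Recover the primal weights w = sum_t alpha_t y_t x_t for comparison.
w = [sum(a * y * x_t[j] for a, y, x_t in zip(alpha, labels, points))
     for j in range(len(points[0]))]

x_new = [1.0, 0.0]
dual_prediction = sgn(dual_score(alpha, points, labels, x_new))
primal_prediction = sgn(dot(w, x_new))
```

Because the data enters only through $\langle x_t, x \rangle$, the inner product could be swapped for another similarity function without changing the structure of the loop.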