
PERCEPTRON


Given
Data $D = \{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^d$ with $\|x_i\|_2 = 1$; $y_i = \pm 1$

Learn a predictive function (hyperplane):


• $f(x) = \mathrm{sgn}(\langle w, x \rangle + b) = \mathrm{sgn}(\langle \tilde{w}, \tilde{x} \rangle)$, where

  $\tilde{x} = \begin{pmatrix} x \\ 1 \end{pmatrix}, \quad \tilde{w} = \begin{pmatrix} w \\ b \end{pmatrix}$
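
Below is a minimal sketch of this homogenization trick in NumPy; the particular values of x, w, and b are illustrative only, not from the notes.

```python
import numpy as np

# Fold the bias b into the weight vector by appending a constant 1
# feature to every input (illustrative values).
x = np.array([0.6, 0.8])      # original input in R^d
w = np.array([1.0, -2.0])     # original weight vector
b = 0.5                       # bias term

x_tilde = np.append(x, 1.0)   # x~ = (x, 1)
w_tilde = np.append(w, b)     # w~ = (w, b)

# <w, x> + b equals <w~, x~>, so the sign of the prediction is unchanged
assert np.isclose(w @ x + b, w_tilde @ x_tilde)
```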
• The algorithm proceeds in multiple iterations.
• If I scale a point, $x_i \leftarrow \frac{x_i}{\|x_i\|_2}$, then $f(x_i)$ does not change:
  $f(x_i) = \mathrm{sgn}\left(\left\langle w, \frac{x_i}{\|x_i\|_2} \right\rangle\right) = \mathrm{sgn}(\langle w, x_i \rangle)$
• Assume the $x_i$'s are scaled, i.e. $\|x_i\|_2 = 1$ (see the quick check below).
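
A quick check, with arbitrary illustrative values, that rescaling a point to unit norm does not change the sign of the inner product:

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 2.0])

x_scaled = x / np.linalg.norm(x)   # x / ||x||_2
# sgn(<w, x/||x||_2>) = sgn(<w, x>)
assert np.sign(w @ x_scaled) == np.sign(w @ x)
```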
ALGO:
1. $w = 0$  // initialize
2. for $t = 1$ to $n$ (repeat till convergence)  // $w$ is the current classifier
       if $\langle w, x_t \rangle y_t \le 0$
           $w = w + y_t x_t$  // do an update
       end
   end
3. Output $w$
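
A runnable Python version of the ALGO above is sketched below; it assumes X is an n-by-d NumPy array with unit-norm rows and y is a vector of +/-1 labels, and it repeats passes over the data until a full pass makes no mistake (the notes' "till convergence"):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron training loop (a sketch of the ALGO above).

    Assumes rows of X are unit-norm and y holds +/-1 labels."""
    n, d = X.shape
    w = np.zeros(d)                      # 1. initialize
    for _ in range(max_epochs):
        mistakes = 0
        for t in range(n):               # 2. one pass over the data
            if (w @ X[t]) * y[t] <= 0:   # misclassified (or on the boundary)
                w = w + y[t] * X[t]      # do an update
                mistakes += 1
        if mistakes == 0:                # converged: a mistake-free pass
            break
    return w                             # 3. output w
```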

If in step 2 $\langle w, x_t \rangle y_t > 0$, the current classifier is doing well on $(x_t, y_t)$, so I do not change it; otherwise I change my classifier:
• Add that point, weighted by its label, to the classifier,
  i.e. $w = w + y_t x_t$
• We assume the points are linearly separable; then the perceptron will be able to find a separating hyperplane $w^*$:
  $\exists\, w^* \ \text{s.t.}\ \|w^*\|_2 = 1$,
  $\gamma = \min_i \langle x_i, w^* \rangle y_i$
  (a sketch for computing $\gamma$ is given below).
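
A short sketch for computing the margin $\gamma$ of a candidate separator; the function name `margin` and its arguments are mine, not from the notes:

```python
import numpy as np

def margin(X, y, w_star):
    """gamma = min_i y_i <x_i, w*>, with w* normalized so ||w*||_2 = 1.
    Positive gamma means w_star separates the data."""
    w_star = w_star / np.linalg.norm(w_star)
    return np.min(y * (X @ w_star))
```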

• If $\gamma$ is large, finding the hyperplane $w^*$ is easy.
• If $\gamma$ is small, finding the hyperplane $w^*$ is hard.
• CLAIM: The perceptron algorithm will find $w^*$ [a linear separator] after at most $\frac{1}{\gamma^2}$ mistakes.

• MISTAKE: when $w$ misclassifies $(x_t, y_t)$.

• $w^*$ may not be unique.

• (If $\gamma$ is exponentially small, i.e. $e^{-d}$, then an exponential number of training points is required in dimension $d$.)
  Both the running time of the algorithm and the number of training points required become large.
• Theorem: The perceptron algorithm will find $w^*$ after at most $\frac{1}{\gamma^2}$ mistakes.

• Proof: Suppose that at step $t$ the algorithm makes a mistake, so
  $w_t = w_{t-1} + y_t x_t$
  Up to the $t^{\text{th}}$ step, the algorithm has made $M$ mistakes, $M \le t$. We show:
  1. $\langle w_t, w^* \rangle \ge M\gamma$ (large enough)
  2. $\|w_t\|_2^2 \le M$

From 1 & 2 we will conclude that the number of mistakes is $\le \frac{1}{\gamma^2}$.
• $\langle w_t, w^* \rangle = \langle w_{t-1} + y_t x_t, w^* \rangle$
  $= \langle w_{t-1}, w^* \rangle + y_t \langle x_t, w^* \rangle$  // $y_t \langle x_t, w^* \rangle \ge \gamma$
  $\ge \langle w_{t-1}, w^* \rangle + \gamma$  // ($\gamma = \min_i y_i \langle w^*, x_i \rangle$)
  $\ge M\gamma$  // each of the $M$ mistakes adds at least $\gamma$, starting from $w_0 = 0$

• (Assumption: in the first step, $w_0 = 0$.)


$\|w_t\|_2^2 = \|w_{t-1} + y_t x_t\|_2^2$
$= \|w_{t-1}\|_2^2 + y_t^2 \|x_t\|_2^2 + 2 \langle w_{t-1}, x_t \rangle y_t$
• For the $y_t^2 \|x_t\|_2^2$ term:
  $y_t = \pm 1 \Rightarrow y_t^2 = 1$
  & $x_t$ is normalized $\Rightarrow \|x_t\|_2^2 = 1$
• For the $2 \langle w_{t-1}, x_t \rangle y_t$ term:
  I made a mistake at the $t^{\text{th}}$ step,
  so $2 \langle w_{t-1}, x_t \rangle y_t \le 0$

• So,
  $\|w_t\|_2^2 \le \|w_{t-1}\|_2^2 + 1 \le M$,
  since I made at most $M$ mistakes & my initial $w_0$ was zero.
• So,
  $\|w_t\|_2^2 \le M$, or $\|w_t\|_2 \le \sqrt{M}$
• By the Cauchy-Schwarz inequality,
  $M\gamma \le \langle w_t, w^* \rangle \le \|w_t\|_2 \, \|w^*\|_2 \le \sqrt{M}$
  $\Rightarrow \sqrt{M} \le \frac{1}{\gamma} \Rightarrow M \le \frac{1}{\gamma^2}$
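
As an empirical sanity check of the theorem (everything below is synthetic and illustrative), we can generate unit-norm points separated by $w^*$ with margin at least 0.1, count the perceptron's mistakes until a mistake-free pass, and compare with $1/\gamma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, 1.0]) / np.sqrt(2)       # unit-norm separator
X = rng.normal(size=(200, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize: ||x_i||_2 = 1
y = np.sign(X @ w_star)
keep = np.abs(X @ w_star) > 0.1                  # enforce a margin >= 0.1
X, y = X[keep], y[keep]

gamma = np.min(y * (X @ w_star))                 # actual margin of w*

w, M = np.zeros(2), 0                            # run perceptron, count mistakes
while True:
    clean = True
    for t in range(len(X)):
        if (w @ X[t]) * y[t] <= 0:
            w = w + y[t] * X[t]
            M += 1
            clean = False
    if clean:
        break

print(M, "<=", 1 / gamma**2)                     # M should respect the bound
```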

• Cauchy-Schwarz inequality
• If there are two vectors $p, q \in \mathbb{R}^d$,
• $\langle p, q \rangle \le \|p\|_2 \, \|q\|_2$
• i.e. $\sum_i p_i q_i \le \sqrt{\sum_i p_i^2} \, \sqrt{\sum_i q_i^2}$
• Hölder's Inequality

  $\langle p, q \rangle \le \|p\|_{l_1} \, \|q\|_{l_2}$, where $\frac{1}{l_1} + \frac{1}{l_2} = 1$

• Conjugate norms are those that satisfy Hölder's inequality (a numeric check of both inequalities follows below).
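
A quick numeric check of both inequalities on random vectors; the conjugate pair $l_1 = 3$, $l_2 = 3/2$ is just one example satisfying $\frac{1}{l_1} + \frac{1}{l_2} = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = rng.normal(size=5), rng.normal(size=5)

# Cauchy-Schwarz: <p, q> <= ||p||_2 ||q||_2
assert p @ q <= np.linalg.norm(p, 2) * np.linalg.norm(q, 2)

# Holder: <p, q> <= ||p||_3 ||q||_{3/2}, since 1/3 + 2/3 = 1
l1, l2 = 3.0, 1.5
assert p @ q <= np.linalg.norm(p, l1) * np.linalg.norm(q, l2)
```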


$f(x) = \mathrm{sgn}(\langle w, x \rangle)$
$w = \sum_{t=1}^{n} \alpha_t y_t x_t$

When $\alpha_t = 0$ (no update was made on $x_t$):
$y_t \langle w_{t-1}, x_t \rangle > 0$

When $\alpha_t = 1$ ($x_t$ triggered an update), then
$f(x) = \mathrm{sgn}\left(\sum_{t=1}^{n} \alpha_t y_t \langle x_t, x \rangle\right)$
(a dual-form sketch follows below)
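
A sketch of this dual form in Python: instead of storing $w$, keep a count $\alpha_t$ of updates made on each training point, so the classifier touches the data only through inner products. Here $\alpha_t$ may exceed 1 if the loop passes over the data multiple times; the single-pass version in the notes gives $\alpha_t \in \{0, 1\}$.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual-form perceptron sketch: w = sum_t alpha_t y_t x_t is never
    formed explicitly; predictions use sum_t alpha_t y_t <x_t, x>."""
    n = len(X)
    alpha = np.zeros(n)
    G = X @ X.T                                   # Gram matrix of <x_i, x_j>
    for _ in range(max_epochs):
        mistakes = 0
        for t in range(n):
            # y_t <w, x_t> with w = sum_i alpha_i y_i x_i
            if y[t] * np.sum(alpha * y * G[:, t]) <= 0:
                alpha[t] += 1                     # same as w += y_t x_t
                mistakes += 1
        if mistakes == 0:
            break
    return alpha
```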
