
https://www.learnbay.co/data-science-course/
Classifiers
x → f → y^est, where f(x,w,b) = sign(w·x − b)
(Figure: a scatter of datapoints; one marker denotes +1, the other denotes −1.)
How would you classify this data?

Support Vector Machines: Slide 2
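The decision rule on this slide, f(x,w,b) = sign(w·x − b), takes only a few lines to sketch in Python. The weight vector and bias below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def linear_classify(x, w, b):
    """The slide's rule f(x, w, b) = sign(w . x - b); ties at 0 go to +1."""
    return 1 if float(np.dot(w, x)) - b >= 0.0 else -1

# Illustrative weights and bias (assumptions, not from the slides).
w = np.array([2.0, 1.0])
b = 1.0
print(linear_classify(np.array([1.0, 1.0]), w, b))    # w.x - b = 2.0  -> prints 1
print(linear_classify(np.array([-1.0, 0.0]), w, b))   # w.x - b = -3.0 -> prints -1
```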


Classifiers
x → f → y^est, where f(x,w,b) = sign(w·x − b)
How would you classify this data?
(Slides 3–5 repeat the same question; each figure draws a different candidate separating line.)

Support Vector Machines: Slides 3–5


Classifiers
x → f → y^est, where f(x,w,b) = sign(w·x − b)
Any of these would be fine…
…but which is best?

Support Vector Machines: Slide 6


Margin
x → f → y^est, where f(x,w,b) = sign(w·x − b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Support Vector Machines: Slide 7


Margin
x → f → y^est, where f(x,w,b) = sign(w·x − b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM: Linear SVM).

Support Vector Machines: Slide 8
Margin
x → f → y^est, where f(x,w,b) = sign(w·x − b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
Support Vectors are those datapoints that the margin pushes up against.
This is the simplest kind of SVM (called an LSVM: Linear SVM).

Support Vector Machines: Slide 9
Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. Easy, since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very very well.

Support Vector Machines: Slide 10
Specifying a line and margin
(Figure: the Plus-Plane, the Minus-Plane, and the classifier boundary between them.)

• How do we represent this mathematically?
• …in m input dimensions?

Support Vector Machines: Slide 11


Computing the margin width
M = Margin Width
How do we compute M in terms of w and b?
• Plus-plane = { x : w·x + b = +1 }
• Minus-plane = { x : w·x + b = −1 }
Claim: The vector w is perpendicular to the Plus-Plane. Why?

Support Vector Machines: Slide 12


Computing the margin width
M = Margin Width
How do we compute M in terms of w and b?
• Plus-plane = { x : w·x + b = +1 }
• Minus-plane = { x : w·x + b = −1 }
Claim: The vector w is perpendicular to the Plus-Plane.
Why? Let u and v be two vectors on the Plus-Plane. What is w·(u − v)? It is (w·u + b) − (w·v + b) = 1 − 1 = 0, so w is perpendicular to every direction inside the plane.
And so of course the vector w is also perpendicular to the Minus-Plane.

Support Vector Machines: Slide 13
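The perpendicularity claim and the margin width are easy to check numerically. In the sketch below, w and b are illustrative assumptions, and u and v are two points chosen by hand to lie on the plus-plane w·x + b = 1.

```python
import numpy as np

# Illustrative w and b (assumptions, not from the slides).
w = np.array([3.0, 4.0])
b = -2.0

# Two points on the plus-plane { x : w.x + b = +1 }.
u = np.array([1.0, 0.0])    # 3*1 + 4*0 - 2 = 1
v = np.array([-1.0, 1.5])   # 3*(-1) + 4*1.5 - 2 = 1

# u - v lies inside the plane, and w.(u - v) = (w.u + b) - (w.v + b) = 1 - 1 = 0,
# so w is perpendicular to the plus-plane.
print(np.isclose(np.dot(w, u - v), 0.0))   # True

# The margin width between the two planes is M = 2 / ||w||.
M = 2.0 / np.linalg.norm(w)
print(M)                                   # 0.4, since ||w|| = 5
```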
Classifier
M = Margin Width
Take any x⁻ on the Minus-plane and let x⁺ = x⁻ + λw be the closest point on the Plus-plane; then M = |x⁺ − x⁻| = 2 / √(w·w).

Given a guess of w and b we can
• Compute whether all data points are in the correct half-planes
• Compute the width of the margin

So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the datapoints. How?
Gradient descent? Simulated Annealing? Matrix Inversion? EM? Newton's Method?

Support Vector Machines: Slide 14
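One deliberately naive answer to "how?": just search. The sketch below brute-forces a grid of (w, b) pairs on an assumed toy 1-D dataset and keeps the separating pair with the widest geometric margin. Real SVM solvers use quadratic programming and the decomposition methods mentioned at the end of these slides; this is only the search idea made literal.

```python
# A deliberately naive "search the space of w's and b's" on an assumed toy
# 1-D dataset: class -1 at x <= 1, class +1 at x >= 3.
points = [(0.0, -1), (1.0, -1), (3.0, 1), (4.0, 1)]

def margin(w, b):
    """Geometric margin of w*x + b = 0 if it separates the data, else None."""
    if all(y * (w * x + b) > 0 for x, y in points):
        return min(abs(w * x + b) for x, _ in points) / abs(w)
    return None

# Brute-force grid search for the widest-margin separator.
best, best_m = None, -1.0
for wi in range(-20, 21):
    if wi == 0:
        continue
    w = wi / 10.0
    for bi in range(-100, 101):
        b = bi / 10.0
        m = margin(w, b)
        if m is not None and m > best_m:
            best, best_m = (w, b), m

print(best_m)  # ~1.0: the best boundary sits at x = 2, one unit from each class
```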


Non-linear SVMs: Feature spaces
General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)

Support Vector Machines: Slide 15
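The idea Φ: x → φ(x) shows up even on a tiny assumed example: 1-D points with +1 labels on the outside and −1 in the middle cannot be separated by any threshold, but the (illustrative) map x → (x, x²) makes them linearly separable in 2-D.

```python
# Assumed 1-D data: +1 on the outside, -1 in the middle (not from the slides).
data = [(-3.0, 1), (-2.5, 1), (-0.5, -1), (0.5, -1), (2.5, 1), (3.0, 1)]

def phi(x):
    """Illustrative feature map x -> (x, x^2)."""
    return (x, x * x)

# No single threshold on x classifies all points correctly...
separable_1d = any(
    all((1 if x > t else -1) == y for x, y in data) or
    all((1 if x < t else -1) == y for x, y in data)
    for t in [x for x, _ in data]
)
print(separable_1d)  # False

# ...but in the mapped space, the horizontal line x^2 = 2 separates the classes:
separable_2d = all((1 if phi(x)[1] > 2.0 else -1) == y for x, y in data)
print(separable_2d)  # True
```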


The “Kernel Trick”
The linear classifier relies on an inner product between vectors, K(xi,xj) = xiTxj.
If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:
K(xi,xj) = φ(xi)Tφ(xj)
A kernel function is a function that is equivalent to an inner product in some feature space.

Example:
2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)².
Need to show that K(xi,xj) = φ(xi)Tφ(xj):
K(xi,xj) = (1 + xiTxj)²
= 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1  xi1²  √2xi1xi2  xi2²  √2xi1  √2xi2]T [1  xj1²  √2xj1xj2  xj2²  √2xj1  √2xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2x1x2  x2²  √2x1  √2x2]

Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
Support Vector Machines: Slide 16
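The algebra on this slide is easy to spot-check numerically; x and z below are arbitrary assumed vectors.

```python
import math

# Check the slide's identity K(x, z) = (1 + x.z)^2 = phi(x).phi(z) numerically.
def K(x, z):
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """The slide's phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]

x, z = [1.0, 2.0], [3.0, -1.0]
lhs = K(x, z)
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))
print(lhs, rhs)  # both ~4.0, since x.z = 1 and (1 + 1)^2 = 4
```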
What Functions are Kernels?
For some functions K(xi,xj), checking that K(xi,xj) = φ(xi)Tφ(xj) can be cumbersome.
Mercer's theorem: Every semi-positive definite symmetric function is a kernel.
Semi-positive definite symmetric functions correspond to a semi-positive definite symmetric Gram matrix:

      K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xn)
      K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xn)
K =   …         …         …         …  …
      K(xn,x1)  K(xn,x2)  K(xn,x3)  …  K(xn,xn)

Support Vector Machines: Slide 17
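Mercer's condition can be illustrated in miniature: build the Gram matrix for a known-valid kernel on a few assumed points and confirm it is symmetric with non-negative eigenvalues (up to rounding).

```python
import numpy as np

# A few assumed 2-D points (not from the slides).
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [-1.0, 3.0]])

def kernel(a, b):
    """Polynomial kernel from the earlier slide: (1 + a.b)^2."""
    return (1.0 + a @ b) ** 2

n = len(X)
K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

print(np.allclose(K, K.T))              # True: the Gram matrix is symmetric
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-9)           # True: all eigenvalues >= 0 (up to rounding)
```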


Examples of Kernel Functions
• Linear: K(xi,xj) = xiTxj. Mapping Φ: x → φ(x), where φ(x) is x itself.
• Polynomial of power p: K(xi,xj) = (1 + xiTxj)^p. Mapping Φ: x → φ(x), where φ(x) has C(d+p, p) dimensions.
• Gaussian (radial-basis function): K(xi,xj) = exp(−‖xi − xj‖² / (2σ²)). Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped to a function (a Gaussian); the combination of functions for the support vectors is the separator.
• The higher-dimensional space still has intrinsic dimensionality d (the mapping is not onto), but linear separators in it correspond to non-linear separators in the original space.

Support Vector Machines: Slide 18
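The three kernels on this slide, written out directly; the sample vectors and σ = 1 are assumptions for illustration.

```python
import math

# The three kernels from the slide, for plain Python list vectors.
def linear_k(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly_k(x, z, p=2):
    return (1.0 + linear_k(x, z)) ** p

def rbf_k(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = [1.0, 0.0], [0.0, 1.0]
print(linear_k(x, z))        # 0.0
print(poly_k(x, z))          # 1.0
print(rbf_k(x, z))           # exp(-1) ~ 0.3679
```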
SVM applications
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g. graphs, sequences, relational data) by designing kernel functions for such data.
• SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. '97], principal component analysis [Schölkopf et al. '99], etc.
• Most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of the αi's at a time, e.g. SMO [Platt '99] and [Joachims '99].
• Tuning SVMs remains a black art: selecting a specific kernel and its parameters is usually done in a try-and-see manner.

Support Vector Machines: Slide 19