
ARTIFICIAL INTELLIGENCE

TITLE: SUPPORT VECTOR MACHINES
TOOL USED: R PROGRAMMING
TEAM MEMBERS: RAJNITA.L (2013503063)
              SRUTHI.T (2013503027)

INTRODUCTION:

Support Vector Machines are based on the concept of decision planes that define decision
boundaries. A decision plane is one that separates a set of objects having different class
memberships. A schematic example is shown in the illustration below. In this example, the objects
belong either to class GREEN or RED. The separating line defines a boundary on the right side of
which all objects are GREEN and to the left of which all objects are RED. Any new object (white
circle) falling to the right is labeled, i.e., classified, as GREEN (or classified as RED should it fall to
the left of the separating line).

The above is a classic example of a linear classifier, i.e., a classifier that separates a set of objects
into their respective groups (GREEN and RED in this case) with a line. Most classification tasks,
however, are not that simple, and often more complex structures are needed in order to make an
optimal separation, i.e., correctly classify new objects (test cases) on the basis of the examples
that are available (train cases). This situation is depicted in the illustration below. Compared to the
previous schematic, it is clear that a full separation of the GREEN and RED objects would require a
curve (which is more complex than a line). Classification tasks based on drawing separating lines to
distinguish between objects of different class memberships are known as hyperplane classifiers.
Support Vector Machines are particularly suited to handle such tasks.

The illustration below shows the basic idea behind Support Vector Machines. Here we see the
original objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical
functions, known as kernels. The process of rearranging the objects is known as mapping
(transformation). Note that in this new setting, the mapped objects (right side of the schematic) are
linearly separable and, thus, instead of constructing the complex curve (left schematic), all we
have to do is to find an optimal line that can separate the GREEN and the RED objects.
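
As a quick illustration of this mapping idea, the sketch below is written for R (the tool listed above) and uses the e1071 package as an assumed choice of SVM library; it fits a linear and a radial-kernel SVM to two classes that no straight line can separate.

library(e1071)  # assumed SVM library; provides an R interface to libsvm

# Two classes that are not linearly separable: points inside vs. outside a circle.
set.seed(1)
x <- matrix(runif(400, -1, 1), ncol = 2)
y <- factor(ifelse(x[, 1]^2 + x[, 2]^2 < 0.5, "RED", "GREEN"))
d <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)

# A linear kernel tries to separate the classes with a straight line and fails;
# a radial (RBF) kernel maps the points so that a linear separator exists.
linear_fit <- svm(y ~ ., data = d, kernel = "linear")
radial_fit <- svm(y ~ ., data = d, kernel = "radial")

mean(predict(linear_fit, d) == d$y)   # poor: no line separates the classes
mean(predict(radial_fit, d) == d$y)   # close to 1 after the kernel mapping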

CONCEPTS:
Hyperplane
In geometry a hyperplane is a subspace of one dimension less than its ambient space. If a
space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the
space is 2-dimensional, its hyperplanes are the 1-dimensional lines.

Support Vectors
Support vectors are the data points that lie closest to the decision surface (or
hyperplane). They are the data points that are most difficult to classify, and they have a
direct bearing on the optimum location of the decision surface.
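
To make this concrete in R (again assuming the e1071 package), the support vectors of a fitted model can be inspected directly; the short sketch below only shows where they live on the fitted object.

library(e1071)

data(iris)
fit <- svm(Species ~ ., data = iris, kernel = "linear")

fit$index     # row numbers of the training points that act as support vectors
nrow(fit$SV)  # how many support vectors the decision function actually uses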

Support Vector Machines


SVMs maximize the margin (in Winston's terminology: the street) around the
separating hyperplane. The decision function is fully specified by a (usually very
small) subset of the training samples, the support vectors. Finding this hyperplane
becomes a quadratic programming problem that is easy to solve by standard methods.
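
Using the notation introduced later in this report (training points xi with labels yi = ±1), that quadratic program is the standard hard-margin formulation:

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad\text{subject to}\quad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n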

EXPLANATION
A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-
dimensional space, which can be used for classification, regression or other tasks. Intuitively,
a good separation is achieved by the hyperplane that has the largest distance to the nearest
training data points of any class (the so-called functional margin), since in general the larger
the margin, the lower the generalization error of the classifier.

Separation by Hyperplanes
Assume linear separability for now:
in 2 dimensions, the classes can be separated by a line; in higher dimensions we need
hyperplanes.
A separating hyperplane can be found by linear programming, or fitted iteratively (e.g. with
the perceptron algorithm); the separator can be expressed as ax + by = c.

Linear Programming / Perceptron


Find a, b, c such that ax + by ≥ c for red points and ax + by < c for green points.
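
The following is a rough sketch of the perceptron update in R (the function name and learning-rate parameter are illustrative, not from the report); it searches for a, b, c of exactly this form, assuming the points are linearly separable as stated above.

# Perceptron sketch: learn a, b, c such that a*x + b*y >= c on one side
# and < c on the other. Assumes the two classes are linearly separable.
perceptron <- function(X, labels, epochs = 100, lr = 0.1) {
  w  <- c(0, 0)   # (a, b)
  c0 <- 0         # threshold c
  for (e in seq_len(epochs)) {
    for (i in seq_len(nrow(X))) {
      pred <- ifelse(sum(w * X[i, ]) - c0 >= 0, 1, -1)
      if (pred != labels[i]) {                 # misclassified: nudge the line
        w  <- w  + lr * labels[i] * X[i, ]
        c0 <- c0 - lr * labels[i]
      }
    }
  }
  list(a = w[1], b = w[2], c = c0)
}

# Tiny usage example with two hand-made clusters labelled +1 / -1
X      <- rbind(c(2, 2), c(3, 1), c(-2, -1), c(-3, -2))
labels <- c(1, 1, -1, -1)
perceptron(X, labels)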

Which Hyperplane?

Lots of possible solutions for a,b,c.

Some methods find a separating hyperplane, but not the optimal one (e.g.,
perceptron)
Most methods find an optimal separating hyperplane
Which points should influence optimality?
All points:
Linear regression
Naïve Bayes
Only difficult points close to the decision boundary:
Support vector machines
Logistic regression (to some extent)

Support Vectors again for linearly separable case


Support vectors are the elements of the training set that would change the
position of the dividing hyperplane if removed.
Support vectors are the critical elements of the training set.
The problem of finding the optimal hyperplane is an optimization problem and
can be solved by optimization techniques (use Lagrange multipliers to get it into a
form that can be solved analytically).
SUPPORT VECTORS: the inputs xi for which the margin constraint holds with equality,
i.e. |xi · w + b| = 1 (the points lying on the planes H1 and H2 defined in the next section).
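
The Lagrange-multiplier step mentioned above proceeds, in the standard formulation, by attaching one multiplier αi ≥ 0 to each constraint yi(xi · w + b) ≥ 1 and forming the Lagrangian

L(\mathbf{w}, b, \boldsymbol{\alpha})
  = \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  - \sum_{i} \alpha_i \bigl[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \bigr],
\qquad \alpha_i \ge 0 .

At the optimum, αi > 0 only for the support vectors, which is why only those points end up determining the hyperplane.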

HOW IT WORKS?
Define the hyperplane H such that:
xi · w + b ≥ +1 when yi = +1
xi · w + b ≤ -1 when yi = -1
H1 and H2 are the planes:
H1: xi · w + b = +1
H2: xi · w + b = -1
The points on the planes H1 and H2 are the Support Vectors.
d+ = the shortest distance to the closest positive point.
d- = the shortest distance to the closest negative point.
The margin of a separating hyperplane is d+ + d-.

The algorithm to generate the weights proceeds in such a way that only the
support vectors determine the weights and thus the boundary.
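
This claim can be checked numerically. A minimal sketch, assuming the e1071 package: fit a linear SVM, remove one training point that is not a support vector, refit, and compare the recovered weight vectors.

library(e1071)

set.seed(2)
n <- 60
x <- rbind(matrix(rnorm(n, mean =  2), ncol = 2),
           matrix(rnorm(n, mean = -2), ncol = 2))
d <- data.frame(x1 = x[, 1], x2 = x[, 2],
                y = factor(rep(c(1, -1), each = n / 2)))

fit      <- svm(y ~ ., data = d, kernel = "linear", scale = FALSE)
non_sv   <- setdiff(seq_len(nrow(d)), fit$index)   # points that are NOT support vectors
fit_drop <- svm(y ~ ., data = d[-non_sv[1], ], kernel = "linear", scale = FALSE)

# w = sum_i (alpha_i * y_i) * x_i, recovered from each fitted model
w_full    <- t(fit$coefs)      %*% fit$SV
w_dropped <- t(fit_drop$coefs) %*% fit_drop$SV
w_full
w_dropped   # essentially the same boundary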

Maximizing the margin


We want a classifier with as large a margin as possible. The distance from a
point (x0, y0) to the line Ax + By + c = 0 is |A·x0 + B·y0 + c| / sqrt(A² + B²).
The distance between H and H1 is |w · x + b| / ||w|| = 1/||w||.
The distance between H1 and H2 is 2/||w||.

In order to maximize the margin, we need to minimize ||w||,
with the condition that there are no data points between H1 and H2:
xi · w + b ≥ +1 when yi = +1
xi · w + b ≤ -1 when yi = -1
These two constraints can be combined into yi(xi · w + b) ≥ 1.
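
As a numerical sanity check (again a sketch with the assumed e1071 package), w can be recovered from a fitted linear SVM and the margin 2/||w|| computed directly; a large cost value is used so the soft-margin fit approximates the hard-margin problem described here.

library(e1071)

set.seed(3)
x <- rbind(matrix(rnorm(40, mean =  2), ncol = 2),
           matrix(rnorm(40, mean = -2), ncol = 2))
d <- data.frame(x1 = x[, 1], x2 = x[, 2],
                y = factor(rep(c("+1", "-1"), each = 20)))

fit <- svm(y ~ ., data = d, kernel = "linear", scale = FALSE, cost = 1000)

w <- as.vector(t(fit$coefs) %*% fit$SV)   # w = sum_i alpha_i y_i x_i
b <- -fit$rho                             # intercept term

2 / sqrt(sum(w^2))                        # the margin between H1 and H2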

Classification
In scikit-learn, SVC, NuSVC and LinearSVC are classes capable of performing multi-class
classification on a dataset.
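
Those three classes belong to the Python scikit-learn library. With R, the tool named at the top of this report, a comparable fit can be sketched with the e1071 package (an assumed choice), whose svm() function handles more than two classes by one-against-one voting; the dataset and settings below are only illustrative.

library(e1071)

data(iris)                       # three classes
fit  <- svm(Species ~ ., data = iris, kernel = "radial")
pred <- predict(fit, iris)
table(pred, iris$Species)        # confusion matrix on the training data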

The advantages of support vector machines are:

Effective in high-dimensional spaces.

Still effective in cases where the number of dimensions is greater than the number of
samples.

Uses a subset of training points in the decision function (called support vectors),
so it is also memory efficient.

Versatile: different Kernel functions can be specified for the decision function.
Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:

If the number of features is much greater than the number of samples, the
method is likely to give poor performance.

SVMs do not directly provide probability estimates; these are calculated using an
expensive five-fold cross-validation (see the scikit-learn documentation on scores and
probabilities).
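
In R with the e1071 package (assumed here), the same trade-off appears: probability estimates require an extra, more expensive internal fit, requested explicitly at training time.

library(e1071)

data(iris)
fit  <- svm(Species ~ ., data = iris, probability = TRUE)   # extra internal fit
pred <- predict(fit, iris[1:5, ], probability = TRUE)
attr(pred, "probabilities")      # per-class probability estimates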
