
Pattern Recognition Linear Classifiers

Zaheer Ahmad, PhD Scholar (ahmad.zaheer@yahoo.com), Department of Computer Science, University of Peshawar

Agenda
Pattern Recognition
- Features and Patterns
- Classifiers
- Approaches
- Design Cycle

Linear Classification
- Linear Discriminant Functions
- Linear Separability
- Fisher Discriminant Functions
- Support Vector Machines (SVMs)

What is pattern recognition?


"The assignment of a physical object or event to one of several pre-specified categories" (Duda and Hart)

"The science that concerns the description or classification (recognition) of measurements" (Schalkoff)

"The process of giving names ω to observations x" (Schürmann)

"Pattern recognition is concerned with answering the question: What is this?" (Morse)

Applications of PR
- Image processing
- Computer vision
- Speech recognition
- Data mining
- Automated target recognition
- Optical character recognition
- Seismic analysis
- Man and machine diagnostics
- Fingerprint identification
- Industrial inspection
- Financial forecasting
- Medical diagnosis
- ECG signal analysis

Terminology
Recognition: during recognition (or classification), given objects are assigned to prescribed classes. Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier: a classifier is a machine which performs classification.

Features
A feature is any distinctive aspect, quality, or characteristic of an object. Features may be symbolic (e.g., color) or numeric (e.g., height). The combination of d features is a d-dimensional column vector called a feature vector. The d-dimensional space defined by the feature vector is called the feature space.
Objects are represented as points in feature space; the result is a scatter plot.
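As a small illustration (not from the slides; NumPy is an assumed choice), objects become rows of a feature matrix, i.e., points in feature space:

```python
import numpy as np

# Each row is one object's feature vector [height_cm, weight_kg]:
# a d-dimensional point in feature space (here d = 2).
X = np.array([
    [170.0, 65.0],
    [182.0, 88.0],
    [158.0, 51.0],
])

print(X.shape)   # (3, 2): 3 objects, 2 features each
print(X[0])      # feature vector of the first object
```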

What makes a good feature vector?

The quality of a feature vector is related to its ability to discriminate examples from different classes:
- Examples from the same class should have similar feature values
- Examples from different classes should have different feature values

More feature properties

Pattern and Pattern Class


A pattern is an object, process, or event that can be given a name. A pattern is a composite of traits or features characteristic of an individual. In classification tasks, a pattern is a pair of variables {x, ω}, where x is a collection of observations or features (the feature vector) and ω is the concept behind the observation (the label/category).

A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source. A class is a set of objects having some important properties in common.

Decision Boundary/Surface
A line or curve separating the classes is a decision boundary. The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2. When g(x) is linear, the decision surface is a hyperplane. If x1 and x2 are both on the hyperplane, then $\mathbf{w}^T\mathbf{x}_1 + w_0 = \mathbf{w}^T\mathbf{x}_2 + w_0$, hence $\mathbf{w}^T(\mathbf{x}_1 - \mathbf{x}_2) = 0$: the weight vector w is normal to any vector lying in the hyperplane.

Decision Boundary

Slope-intercept form of a line (straight line): the equation of a line with slope m and y-intercept b can be written as y = mx + b.

The task of a classifier is to partition feature space into class-labeled decision regions. Borders between decision regions are called decision boundaries. The classification of a feature vector x consists of determining which decision region it belongs to and assigning x to that class.
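To connect the two forms, a short derivation (not in the original slides): a 2-D hyperplane can be rewritten in the slope-intercept form above whenever $w_2 \neq 0$:

$$w_1 x_1 + w_2 x_2 + w_0 = 0 \;\Longrightarrow\; x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2}, \qquad m = -\frac{w_1}{w_2}, \quad b = -\frac{w_0}{w_2}$$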

Classifiers

Pattern recognition approaches


- Statistical: patterns are classified based on an underlying statistical model of the features. The statistical model is defined by a family of class-conditional probability density functions $P(\mathbf{x}\,|\,\omega_i)$ (the probability of feature vector x given class ωi).

- Neural: classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern). Knowledge is stored in the connectivity and strength of the synaptic weights. A trainable, non-algorithmic, black-box strategy; very attractive since it requires minimal a priori knowledge, and with enough layers and neurons ANNs can create arbitrarily complex decision regions.

- Syntactic: patterns are classified based on measures of structural similarity. Knowledge is represented by means of formal grammars or relational descriptions (graphs). Used not only for classification but also for description: syntactic approaches typically formulate hierarchical descriptions of complex patterns built up from simpler sub-patterns.

The pattern recognition design cycle


- Data collection: probably the most time-intensive component of a PR project. How many examples are enough?
- Feature choice: critical to the success of the PR problem ("garbage in, garbage out"); requires basic prior knowledge.
- Model choice: statistical, neural, and structural approaches; parameter settings.
- Training: given a feature set and a blank model, adapt the model to explain the data; supervised, unsupervised, and reinforcement learning.
- Evaluation: how well does the trained model do? Overfitting vs. generalization.

Linear Classification
Classification in which the decision boundary in the feature (input) space is linear: the input space is split by hyperplanes into regions, each with an assigned class.

Linearly Separable
If a hyperplanar decision boundary exists that correctly classifies all the training samples for a c = 2 class problem, the samples are said to be linearly separable.

Linear Discriminant Function


A discriminant function that is a linear combination of the components of x is called a linear discriminant function and can be written as

$$g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$$

where w is the weight vector and w0 is the bias (or threshold weight).
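A minimal sketch of this decision rule in Python (NumPy assumed; the weights below are illustrative, not from the slides):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return np.dot(w, x) + w0

# Illustrative parameters: decision boundary x1 + 2*x2 - 4 = 0
w, w0 = np.array([1.0, 2.0]), -4.0

x = np.array([3.0, 2.0])
label = 1 if g(x, w, w0) > 0 else 2   # assign class by the sign of g(x)
print(g(x, w, w0), label)             # 3.0 -> class 1
```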

Linear Classifiers

A linear classifier is a mapping that partitions feature space using a linear function (a straight line in 2-D, a hyperplane in general). It is one of the simplest classifiers we can imagine: separate the two classes using a straight line in feature space.

In 2 dimensions the decision boundary is a straight line.
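The slides do not specify how such a line is found. One classical option (a sketch of the perceptron rule, a technique not covered on this slide) iteratively adjusts w and w0 until every training sample lies on the correct side; it converges only when the data are linearly separable:

```python
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron rule: labels y must be +1/-1; returns (w, w0).
    Converges only if the data are linearly separable."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + w0) <= 0:   # misclassified sample
                w += lr * yi * xi                # nudge boundary toward xi
                w0 += lr * yi
                errors += 1
        if errors == 0:                          # all samples correct
            break
    return w, w0

# Toy linearly separable data
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, w0 = perceptron(X, y)
print(np.sign(X @ w + w0))   # matches y
```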

2-Class Data with a Linear Decision Boundary


[Figure: two-class data in a two-dimensional feature space; Decision Region 1 and Decision Region 2 separated by a linear decision boundary. Axes: Feature 1, Feature 2.]

Data that is Not Linearly Separable


[Figure: two-class data in a two-dimensional feature space for which no straight line separates the classes; the linear decision boundary leaves misclassified points in both decision regions. Axes: Feature 1, Feature 2.]

Fisher's linear discriminant

A simple linear discriminant function is a projection of the data down to one dimension, so choose the projection that gives the best separation of the classes. An obvious direction to choose is the direction of the line joining the class means; but if the main direction of variance in each class is not orthogonal to this line, this will not give good separation (see the next figure). Fisher's method chooses the direction that maximizes the ratio of between-class variance to within-class variance. This is the direction in which the projected points contain the most information about class membership (under Gaussian assumptions).

Classes well separated in D-dimensional space may strongly overlap when projected to one dimension, so adjust the components of the weight vector w to select the projection that maximizes class separation. The method can be generalized to multiple classes.

A picture showing the advantage of Fisher's linear discriminant.

When projected onto the line joining the class means, the classes are not well separated.

Fisher chooses a direction that makes the projected classes much tighter, even though their projected means are less far apart.

Math of Fisher's linear discriminants

What linear transformation is best for discrimination? The projection onto the vector separating the class means seems sensible:

$$y = \mathbf{w}^T \mathbf{x}, \qquad \mathbf{w} \propto \mathbf{m}_2 - \mathbf{m}_1$$

But we also want small variance within each class:

$$s_1^2 = \sum_{n \in C_1} (y_n - m_1)^2, \qquad s_2^2 = \sum_{n \in C_2} (y_n - m_2)^2$$

Fisher's objective function maximizes the between-class separation (numerator) relative to the within-class scatter (denominator):

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}$$

More math of Fisher's linear discriminants

Rewriting the objective in terms of the between-class and within-class scatter matrices:

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$$

$$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$$

$$S_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$$

Since only the direction of w matters, maximizing J(w) gives the optimal solution:

$$\mathbf{w} \propto S_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$
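A minimal NumPy sketch of the optimal solution above (illustrative synthetic data; the pseudo-inverse guards against a singular S_W, an implementation detail not in the slides):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's linear discriminant direction: w ∝ S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of outer products of centered samples
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.pinv(Sw) @ (m2 - m1)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], [1.0, 0.3], size=(50, 2))   # class 1
X2 = rng.normal([3, 1], [1.0, 0.3], size=(50, 2))   # class 2
w = fisher_direction(X1, X2)
print(w)                 # projection direction
print(X1 @ w, X2 @ w)    # 1-D projections should separate well
```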

Support Vector Machines (SVMs)

A support vector machine (SVM) is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class: the larger the margin, the lower the generalization error of the classifier.
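A small sketch using scikit-learn's linear SVM (an assumed library choice, not named in the slides), which exposes the fitted hyperplane and the support vectors discussed below:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, labels y_i in {+1, -1}
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A large C approximates the hard-margin, linearly separable case
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane: w.x + b = 0
print(w, b)
print(clf.support_vectors_)              # samples closest to the hyperplane
print(clf.predict([[0.0, 1.0]]))
```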

Separating Hyperplane

[Figure: two classes in the (x1, x2) plane, labeled $y_i = +1$ and $y_i = -1$, separated by a hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$.]

But there are many possibilities for such hyperplanes!

Separating Hyperplanes

[Figure: the same two classes ($y_i = +1$, $y_i = -1$) with several candidate separating hyperplanes.]

Yes, there are many possible separating hyperplanes. It could be this one, or this, or this, or maybe another. Which one should we choose?

Choosing a separating hyperplane

The hyperplane should be as far as possible from any sample point: this way, new data that are close to the old samples will still be classified correctly. Good generalization!

[Figure: a training sample $x_i$ and a nearby new point $x'$ falling on the same side of a hyperplane placed far from both classes.]

Choosing a separating hyperplane: the SVM approach (linearly separable case)

The SVM idea is to maximize the distance between the hyperplane and the closest sample point. For the optimal hyperplane, the distance to the closest negative point equals the distance to the closest positive point.

Choosing a separating hyperplane: the SVM approach (linearly separable case)

Support vectors are the samples closest to the separating hyperplane.

[Figure: the maximum-margin hyperplane with the nearest samples on each side marked as support vectors.]
