
Pattern Recognition

Non Parametric Classification


Dr Khurram Khurshid

Non-parametric classification
The assumption that the probability density function (pdf) that generated the data has some parametric form (e.g. Gaussian) may not hold true in many cases.
Non-parametric classifiers do not assume any parametric form for the pdf.
Classification is instead based on the similarity between data samples.

Non-parametric classification


Given a training dataset with known classes, i.e. data samples and their labels:

{(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)},   xi ∈ ℝ^d

Objective: Decide the class (or label) of a test vector xt


Method:
Compute the distance between the vector xt and every example in the training dataset
Decision: assign to xt the class of its nearest neighbor (closest point) in the training dataset
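A minimal sketch of this nearest-neighbor rule, assuming NumPy is available; the function name and the toy two-class data are illustrative only, not from the slides.

import numpy as np

def nn_classify(X_train, y_train, x_t):
    """Assign x_t the label of its closest training sample (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_t, axis=1)   # distance to every training vector
    return y_train[np.argmin(distances)]                # label of the nearest one

# toy example: two classes in 2-D
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["red", "red", "blue", "blue"])
print(nn_classify(X, y, np.array([4.8, 5.1])))          # -> "blue"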

Non-parametric classification

(figure: feature space with a test vector xt among vectors of the red class and vectors of the blue class)

Class(xt) = class of the nearest neighbor

Problem Statement
Can we LEARN to recognise a rugby player?

What are the features of a rugby player?

Features
Rugby players = short + heavy?
(scatter plot: height 130-190 cm vs. weight 60-90 kg)

Features
Ballet dancers = tall + skinny?
(scatter plot: height 130-190 cm vs. weight 60-90 kg)

Feature Space
Rugby players cluster separately in the space.
(scatter plot: Height vs. Weight)

The K-Nearest Neighbour Algorithm

Who's this?
(scatter plot: Height vs. Weight)

The K-Nearest Neighbour Algorithm


1. Measure distance to all points


The K-Nearest Neighbour Algorithm


1. Measure distance to all points
2. Find closest k points

(here k=3, but it could be more)


The K-Nearest Neighbour Algorithm


1. Measure distance to all points
2. Find closest k points
3. Assign majority class

(here k=3, but it could be more)


Distance Measure

Euclidean distance between a point (w, h) and a point (w1, h1) in the weight-height plane:

d = √((w − w1)² + (h − h1)²)

The K-Nearest Neighbour Algorithm


for each testing point
    measure distance to every training point
    find the k closest points
    identify the most common class among those k
    predict that class
end
Advantage: Surprisingly good classifier!
Disadvantage: Have to store the entire training set in
memory
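A hedged Python rendering of the pseudocode above (a sketch, not the author's code); it assumes NumPy and uses a simple majority vote among the k closest points. The height/weight values are made up for illustration.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_t, k=3):
    distances = np.linalg.norm(X_train - x_t, axis=1)     # 1. measure distance to all points
    nearest = np.argsort(distances)[:k]                   # 2. find the k closest points
    votes = Counter(y_train[i] for i in nearest)          # 3. count class labels among them
    return votes.most_common(1)[0][0]                     #    predict the majority class

# toy height/weight data: rugby players vs. ballet dancers
X = np.array([[185, 100], [180, 95], [178, 102],   # rugby
              [170, 55],  [165, 50], [172, 58]])   # ballet
y = np.array(["rugby", "rugby", "rugby", "ballet", "ballet", "ballet"])
print(knn_classify(X, y, np.array([175, 97]), k=3))   # -> "rugby"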

The Decision Boundary


Imagine testing loads of points.
Who's this? Who's this? Who's this?

The Decision Boundary


These hypothetical points are closest to the ballet dancers.
These hypothetical points are closest to the rugby players.

The Decision Boundary


Where's the decision boundary?

The Decision Boundary


Where's the decision boundary?
Not always a nice neat line!

The Decision Boundary


Where's the decision boundary?

The Decision Boundary


Where's the decision boundary?
Not always contiguous!

Distance Measure
Euclidean distance still works in 3-d, 4-d, 5-d, etc.

d = √((x − x1)² + (y − y1)² + (z − z1)²)

x = Height
y = Weight
z = Shoe size

The Decision Boundary


Called a decision surface in 3-d and upward

Over-fitting
An Important Concept in Pattern Recognition

Over-fitting
Looks good so far

Over-fitting
Looks good so far
Oh no! Mistakes!
What happened?

Over-fitting
Looks good so far
Oh no! Mistakes!
What happened?

We didn't have all the data.
We can never assume that we do.
This is called OVER-FITTING to the small dataset.

Over-fitting
While an overly complex model may allow perfect
classification of the training samples, it is unlikely to give
good classification of novel patterns

Features
Choosing the wrong features makes classification difficult
Too many features: it's computationally intensive
Possible features:
- Shoe size
- Height
- Age
- Weight

(scatter plot: Shoe size vs. Age)

PR Problem
Now, how is this height/weight problem like handwriting recognition?

Handwriting Recognition
Let's say the axes now represent pixel values.

A two-pixel image, e.g. (190, 85), is a single point in a plane whose axes are the pixel-1 value and the pixel-2 value (each ranging from 0 to 255).

Handwriting Recognition
A three-pixel image, e.g. (190, 85, 202), is represented by a SINGLE point in a 3-D space whose axes are pixel 1, pixel 2 and pixel 3.

Handwriting Recognition
Distances between images

A three-pixel image (190, 85, 202) and another 3-pixel image (25, 150, 75) are two points in the pixel-1/pixel-2/pixel-3 space; the distance between the images is the straight-line distance between these points.

Handwriting Recognition
A four-pixel image. A different four-pixel image.
(190, 85, 202, 10): same 4-dimensional vector!

Assuming we read pixels in a systematic manner, we can now represent any image as a single point in a high-dimensional space.
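A small sketch (assuming NumPy) of the idea above: read the pixels of each image in a fixed order, treat the result as one vector, and compare images by the straight-line distance between those vectors. The second image's values are made up; the first uses the slide's example pixels.

import numpy as np

img_a = np.array([[190,  85],
                  [202,  10]], dtype=float)    # the slide's 4-pixel image, arranged 2x2
img_b = np.array([[ 25, 150],
                  [ 75, 200]], dtype=float)    # a hypothetical second 4-pixel image

vec_a = img_a.flatten()                        # (190, 85, 202, 10) as a point in 4-D
vec_b = img_b.flatten()
print(np.linalg.norm(vec_a - vec_b))           # Euclidean distance in 4-D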

Handwriting Recognition
16 x 16 image. How many dimensions?

Handwriting Recognition
Distances between digits: straight-line distance in N-dimensional space?

Recognising Handwritten Digits

(figure: candidate digit images labelled "maybe", "maybe", "probably not")

Which is the closest neighbour in N dimensions?

K-Means Clustering
1. Partition the data points into K clusters randomly. Find the centroid of each cluster.
2. For each data point:
   - Calculate the distance from the data point to each cluster centroid.
   - Assign the data point to the closest cluster.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until there is no further change in the assignment of data points (or in the centroids).
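A compact sketch of the steps above, assuming NumPy; as in step 1 it starts from a random partition, and the toy "three blobs" data at the end is made up for illustration.

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(X))                      # 1. random partition of the points
    for _ in range(max_iter):
        centroids = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else X[rng.integers(len(X))]        # re-seed an empty cluster
                              for j in range(k)])
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)                         # 2. assign each point to the closest centroid
        if np.array_equal(new_assign, assign):                    # 4. stop when assignments no longer change
            break
        assign = new_assign                                       # 3. centroids are recomputed on the next pass
    return assign, centroids

# toy usage: three blobs in 2-D
pts = np.vstack([np.random.default_rng(1).normal(c, 0.3, size=(20, 2))
                 for c in ([0, 0], [3, 0], [0, 3])])
labels, centres = k_means(pts, k=3)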

K-Means Clustering
(sequence of figures illustrating successive k-means iterations on example data)

Assignment
Write code to identify the three clusters in the given image.
The image will be emailed to you.
You have to submit the code and the resulting image by next week.

K-Nearest Neighbor Rule


Given a training dataset with known classes, i.e. data samples and their labels:
{(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)},   xi ∈ ℝ^d
Objective: Decide the class (or label) of a test vector xt
Method:
Compute the distances between the vector xt and all examples
of the training dataset
Determine k nearest neighbors of xt
Decision: class(xt) = majority vote of k nearest neighbors
The choice of distance is very important

K-Nearest Neighbor Rule

Example
(figure: a two-class example comparing the 1-NN and 3-NN decisions)

K-Nearest Neighbor Rule

The test sample (green circle) should be classified either to the first class of
blue squares or to the second class of red triangles. If k = 3 it is assigned to
the second class because there are 2 triangles and only 1 square inside the
inner circle. If k = 5 it is assigned to the first class (3 squares vs. 2 triangles
inside the outer circle).
Image Source: Wikipedia

Distance Metric
A distance metric D(·,·) is a function that gives a generalized distance between two patterns.
For any three vectors a, b and c, a distance metric must satisfy the following properties:

Non-negativity:        D(a, b) ≥ 0
Reflexivity:           D(a, b) = 0 if and only if a = b
Symmetry:              D(a, b) = D(b, a)
Triangle inequality:   D(a, b) + D(b, c) ≥ D(a, c)

Distance Metric
Euclidean distance possesses all four properties.

Euclidean distance in d dimensions:
D(a, b) = ‖a − b‖ = √( Σ_{k=1..d} (ak − bk)² )

One general class of metrics for d-dimensional patterns is the Minkowski metric:
Lk(a, b) = ( Σ_{i=1..d} |ai − bi|^k )^(1/k)

The L1 distance is called the Manhattan or city block distance.
The L2 distance is the Euclidean distance.
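A short sketch (assuming NumPy; the function name is illustrative) of the Minkowski family: k = 1 gives the Manhattan (city-block) distance and k = 2 the Euclidean distance. The example vectors are made up.

import numpy as np

def minkowski(a, b, k=2):
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
print(minkowski(a, b, k=1))   # L1 / Manhattan: |1-4| + |2-0| + |3-3| = 5
print(minkowski(a, b, k=2))   # L2 / Euclidean: sqrt(9 + 4 + 0) ≈ 3.606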

Distance Metric
Distance metrics may not be invariant to linear transformations (e.g. scaling), in which each coordinate is multiplied by some constant.

A nearest-neighbor classifier can therefore give different results depending on such rescaling. Consider the test point x and its nearest neighbor: in the original space (left figure) the black prototype is closest; in the right figure, where the x1 axis has been rescaled by a factor of 1/3, a different prototype becomes the nearest neighbor.
Distance Metric
Data normalization
If there is a large disparity in the ranges of the full data in each dimension, a common procedure is to rescale all the data to equalize those ranges.

Euclidean distance is frequently employed as the distance metric in nearest neighbor classifiers.
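A minimal sketch of one common rescaling choice, min-max scaling of each dimension to [0, 1], assuming NumPy; the height/shoe-size values are made up to show two features with very different ranges.

import numpy as np

def min_max_scale(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid division by zero for constant features
    return (X - lo) / span

# height in cm and shoe size: very different ranges before scaling
X = np.array([[190.0, 44.0], [165.0, 38.0], [178.0, 42.0]])
print(min_max_scale(X))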

k-Nearest Neighbor Classifier

Example: k = 15. The resulting decision boundary is relatively simple.

Decision Regions
Voronoi Diagram
Given a set of points (referred to as sites or nodes) a
Voronoi diagram is a partition of space into regions,
within which all points are closer to some particular
node than to any other node

Voronoi Editing
Each cell contains one sample, and every location within
the cell is closer to that sample than to any other
sample.
Every query point will be assigned the classification of
the sample within that cell.
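A minimal sketch (assuming NumPy and SciPy) connecting the two ideas above: scipy.spatial.Voronoi builds the diagram explicitly, and classifying a query point by its nearest sample is the same as reading off the class of the Voronoi cell it falls in. The sample points and labels are randomly generated for illustration.

import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
samples = rng.uniform(0, 1, size=(10, 2))      # hypothetical 2-D training samples
labels  = rng.integers(0, 2, size=10)          # hypothetical class labels (0 or 1)

vor = Voronoi(samples)                         # explicit Voronoi diagram (vertices, ridges)
print("Voronoi vertices:\n", vor.vertices)

query = np.array([0.5, 0.5])
owner = np.argmin(np.linalg.norm(samples - query, axis=1))  # index of the nearest sample
print("query lies in the Voronoi cell of sample", owner,
      "-> predicted class:", labels[owner])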

Voronoi Editing
Knowledge of this boundary is sufficient to classify new
points.
The boundary itself is rarely computed; many algorithms
seek to retain only those points necessary to generate
an identical boundary.

Voronoi Editing
The boundaries of the Voronoi regions separating regions whose nodes are of different classes contribute to a portion of the decision boundary.

The decision boundary

Voronoi Editing
The nodes of the Voronoi regions whose boundaries
did not contribute to the decision boundary are
redundant and can be safely deleted from the training
set

The edited training set.

Voronoi Editing

A drawback of Voronoi editing is that it still leaves too many points in the edited set, because it preserves many points which are well separated from the other classes and which should intuitively be deleted.

Points A1 and B1 are well separated. They are preserved by Voronoi editing to maintain a portion of the decision boundary. However, assuming that new points will come from the same distribution as the training set, the portions of the decision boundary remote from the concentration of training-set points are of lesser importance.

Gabriel Graph
Two points A and B are said to be Gabriel neighbours if their diametral sphere (i.e. the sphere such that AB is its diameter) doesn't contain any other points.

(figure: points A and B are Gabriel neighbours, whereas B and C are not)

A graph in which all pairs of Gabriel neighbours are connected with an edge is called the Gabriel graph.

Gabriel Editing - algorithm

1. Compute the Gabriel graph for the training set.
2. Visit each node, marking it if all its Gabriel neighbours are of the same class as the current node.
3. Delete all marked nodes, exiting with the remaining ones as the edited training set.
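A sketch of Gabriel editing under the definition above, assuming NumPy; it uses the fact that a third point c lies inside the sphere with diameter ab exactly when (a − c)·(b − c) < 0, and keeps only points with at least one Gabriel neighbour of a different class. It is a brute-force O(n³) illustration, not an optimized implementation.

import numpy as np

def gabriel_neighbours(X, i, j):
    a, b = X[i], X[j]
    for m in range(len(X)):
        if m in (i, j):
            continue
        if np.dot(a - X[m], b - X[m]) < 0:     # X[m] lies strictly inside the diametral sphere
            return False
    return True

def gabriel_edit(X, y):
    keep = []
    for i in range(len(X)):
        neigh = [j for j in range(len(X))
                 if j != i and gabriel_neighbours(X, i, j)]
        # keep the point if at least one Gabriel neighbour has a different class
        if any(y[j] != y[i] for j in neigh):
            keep.append(i)
    return X[keep], y[keep]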

Wilson Editing
Remove points that do not agree with the majority of their k nearest neighbours.

(figures: original data with overlapping classes, and the result of Wilson editing with k = 7)
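A sketch of Wilson editing as described above, assuming NumPy: a sample is removed when it disagrees with the majority class of its k nearest neighbours (the point itself excluded).

import numpy as np
from collections import Counter

def wilson_edit(X, y, k=7):
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                    # exclude the point itself
        neighbours = np.argsort(d)[:k]
        majority = Counter(y[neighbours]).most_common(1)[0][0]
        if y[i] == majority:                             # keep only points the vote agrees with
            keep.append(i)
    return X[keep], y[keep]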

Condensed nearest neighbor (CNN)

The aim is to reduce the number of training samples.
Retain only the samples that are needed to define the decision boundary.
Decision-boundary consistent: a subset whose nearest-neighbour decision boundary is identical to the boundary of the entire training set.
Consistent set: a subset of the training data that correctly classifies all of the original training data.
Minimum consistent set: the smallest consistent set.

Condensed nearest neighbor (CNN)

(figures: original data, a consistent set, and a minimum consistent set)

Condensed nearest neighbor (CNN)

This algorithm produces neither a minimal nor a decision-boundary-consistent subset.
Algorithm (Hart 1968):
1. Initialize the subset with a single training example.
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset.
3. Return to step 2 until no transfers occur or the subset is full.
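A sketch of Hart's condensed nearest neighbour procedure following the three steps above, assuming NumPy; it keeps passing over the data until a pass transfers no new samples into the subset. Function names are illustrative.

import numpy as np

def nn_label(S_X, S_y, x):
    return S_y[np.argmin(np.linalg.norm(np.asarray(S_X) - x, axis=1))]

def cnn_condense(X, y):
    subset = [0]                                        # 1. start with a single training example
    changed = True
    while changed:                                      # 3. repeat until no transfers occur
        changed = False
        for i in range(len(X)):
            if i in subset:
                continue
            S_X, S_y = X[subset], y[subset]
            if nn_label(S_X, S_y, X[i]) != y[i]:        # 2. misclassified by the current subset
                subset.append(i)                        #    -> transfer it into the subset
                changed = True
    return X[subset], y[subset]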


References
Chapter 4 of Pattern Classification by Richard O. Duda, Peter E. Hart and David G. Stork; some material from Chapter 10 of the same book.