
Pattern Recognition

Non Parametric Classification


Dr Khurram Khurshid

Non-parametric classification
The assumption that the probability density function (pdf) that generated the data has some parametric form (e.g. Gaussian) may not hold true in many cases.
Non-parametric classifiers do not assume any parametric form for the pdf.
Classification is instead based on the similarity between data samples.

Non-parametric classification


Given a training dataset with known classes, i.e. data samples and their labels:

{(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)},   xi ∈ ℝ^d

Objective: Decide the class (or label) of a test vector xt


Method:
Compute the distance between the vector xt and every example in the training dataset
Decision: assign to xt the class of its nearest neighbor (closest point) in the training dataset
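A minimal sketch of this nearest-neighbor rule, assuming NumPy is available; the function name and the toy two-class data are illustrative only, not from the slides.

import numpy as np

def nn_classify(X_train, y_train, x_t):
    """Assign x_t the label of its closest training sample (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_t, axis=1)   # distance to every training vector
    return y_train[np.argmin(distances)]                # label of the nearest one

# toy example: two classes in 2-D
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["red", "red", "blue", "blue"])
print(nn_classify(X, y, np.array([4.8, 5.1])))          # -> "blue"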

Non-parametric classification

(figure: feature space with a test vector xt among vectors of the red class and vectors of the blue class)

Class(xt) = class of the nearest neighbor

Problem Statement
Can we LEARN to recognise a rugby player?

What are the features of a rugby player?

Features
Rugby players = short + heavy?
(scatter plot: height 130-190 cm vs. weight 60-90 kg)

Features
Ballet dancers = tall + skinny?
(scatter plot: height 130-190 cm vs. weight 60-90 kg)

Feature Space
Rugby players cluster separately in the space.
(scatter plot: Height vs. Weight)

The K-Nearest Neighbour Algorithm

Who's this?
(scatter plot: Height vs. Weight)

The K-Nearest Neighbour Algorithm


1. Measure distance to all points


The K-Nearest Neighbour Algorithm


1. Measure distance to all points
2. Find closest k points

(here k=3, but it could be more)


The K-Nearest Neighbour Algorithm


1. Measure distance to all points
2. Find closest k points
3. Assign majority class

(here k=3, but it could be more)


Distance Measure

Euclidean distance between a point (w, h) and a point (w1, h1) in the weight-height plane:

d = √((w − w1)² + (h − h1)²)

The K-Nearest Neighbour Algorithm


for each testing point
    measure distance to every training point
    find the k closest points
    identify the most common class among those k
    predict that class
end
Advantage: Surprisingly good classifier!
Disadvantage: Have to store the entire training set in
memory
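A hedged Python rendering of the pseudocode above (a sketch, not the author's code); it assumes NumPy and uses a simple majority vote among the k closest points. The height/weight values are made up for illustration.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_t, k=3):
    distances = np.linalg.norm(X_train - x_t, axis=1)     # 1. measure distance to all points
    nearest = np.argsort(distances)[:k]                   # 2. find the k closest points
    votes = Counter(y_train[i] for i in nearest)          # 3. count class labels among them
    return votes.most_common(1)[0][0]                     #    predict the majority class

# toy height/weight data: rugby players vs. ballet dancers
X = np.array([[185, 100], [180, 95], [178, 102],   # rugby
              [170, 55],  [165, 50], [172, 58]])   # ballet
y = np.array(["rugby", "rugby", "rugby", "ballet", "ballet", "ballet"])
print(knn_classify(X, y, np.array([175, 97]), k=3))   # -> "rugby"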

The Decision Boundary


Imagine testing loads of points.
Who's this? Who's this? Who's this?

The Decision Boundary


These hypothetical points are closest to the ballet dancers.
These hypothetical points are closest to the rugby players.

The Decision Boundary


Where's the decision boundary?

The Decision Boundary


Where's the decision boundary?
Not always a nice neat line!

The Decision Boundary


Where's the decision boundary?

The Decision Boundary


Where's the decision boundary?
Not always contiguous!

Distance Measure
Euclidean distance still works in 3-d, 4-d, 5-d, etc.

d = √((x − x1)² + (y − y1)² + (z − z1)²)

x = Height
y = Weight
z = Shoe size

The Decision Boundary


Called a decision surface in 3-d and upward

Over-fitting
An Important Concept in Pattern Recognition

Over-fitting
Looks good so far

Over-fitting
Looks good so far
Oh no! Mistakes!
What happened?

Over-fitting
Looks good so far
Oh no! Mistakes!
What happened?

We didn't have all the data.
We can never assume that we do.
This is called OVER-FITTING to the small dataset.

Over-fitting
While an overly complex model may allow perfect
classification of the training samples, it is unlikely to give
good classification of novel patterns

Features
Choosing the wrong features makes classification difficult
Too many features: it's computationally intensive
Possible features:
- Shoe size
- Height
- Age
- Weight

(scatter plot: Shoe size vs. Age)

PR Problem
Now, how is this height/weight problem like handwriting recognition?

Handwriting Recognition
Let's say the axes now represent pixel values.

A two-pixel image, e.g. (190, 85), is a single point in a plane whose axes are the pixel-1 value and the pixel-2 value (each ranging from 0 to 255).

Handwriting Recognition
A three-pixel image, e.g. (190, 85, 202), is represented by a SINGLE point in a 3-D space whose axes are pixel 1, pixel 2 and pixel 3.

Handwriting Recognition
Distances between images

A three-pixel image (190, 85, 202) and another 3-pixel image (25, 150, 75) are two points in the pixel-1/pixel-2/pixel-3 space; the distance between the images is the straight-line distance between these points.

Handwriting Recognition
A four-pixel image. A different four-pixel image.
(190, 85, 202, 10): same 4-dimensional vector!

Assuming we read pixels in a systematic manner, we can now represent any image as a single point in a high-dimensional space.
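A small sketch (assuming NumPy) of the idea above: read the pixels of each image in a fixed order, treat the result as one vector, and compare images by the straight-line distance between those vectors. The second image's values are made up; the first uses the slide's example pixels.

import numpy as np

img_a = np.array([[190,  85],
                  [202,  10]], dtype=float)    # the slide's 4-pixel image, arranged 2x2
img_b = np.array([[ 25, 150],
                  [ 75, 200]], dtype=float)    # a hypothetical second 4-pixel image

vec_a = img_a.flatten()                        # (190, 85, 202, 10) as a point in 4-D
vec_b = img_b.flatten()
print(np.linalg.norm(vec_a - vec_b))           # Euclidean distance in 4-D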

Handwriting Recognition
16 x 16 image. How many dimensions?

Handwriting Recognition
Distances between digits: straight-line distance in N-dimensional space?

Recognising Handwritten Digits

(figure: candidate digit images labelled "maybe", "maybe", "probably not")

Which is the closest neighbour in N dimensions?

K-Means Clustering
1. Partition the data points into K clusters randomly. Find the centroid of each cluster.
2. For each data point:
   - Calculate the distance from the data point to each cluster centroid.
   - Assign the data point to the closest cluster.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until there is no further change in the assignment of data points (or in the centroids).
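A compact sketch of the steps above, assuming NumPy; as in step 1 it starts from a random partition, and the toy "three blobs" data at the end is made up for illustration.

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(X))                      # 1. random partition of the points
    for _ in range(max_iter):
        centroids = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else X[rng.integers(len(X))]        # re-seed an empty cluster
                              for j in range(k)])
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)                         # 2. assign each point to the closest centroid
        if np.array_equal(new_assign, assign):                    # 4. stop when assignments no longer change
            break
        assign = new_assign                                       # 3. centroids are recomputed on the next pass
    return assign, centroids

# toy usage: three blobs in 2-D
pts = np.vstack([np.random.default_rng(1).normal(c, 0.3, size=(20, 2))
                 for c in ([0, 0], [3, 0], [0, 3])])
labels, centres = k_means(pts, k=3)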

K-Means Clustering
(sequence of figures illustrating successive k-means iterations on example data)

Assignment
Write code to identify the three clusters in the given image.
The image will be emailed to you.
You have to submit the code and the resulting image by next week.

K-Nearest Neighbor Rule


Given a training dataset with known classes, i.e. data samples and their labels:
{(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)},   xi ∈ ℝ^d
Objective: Decide the class (or label) of a test vector xt
Method:
Compute the distances between the vector xt and all examples
of the training dataset
Determine k nearest neighbors of xt
Decision: class(xt) = majority vote of k nearest neighbors
The choice of distance is very important

K-Nearest Neighbor Rule

Example
(figure: a two-class example comparing the 1-NN and 3-NN decisions)

K-Nearest Neighbor Rule

The test sample (green circle) should be classified either to the first class of
blue squares or to the second class of red triangles. If k = 3 it is assigned to
the second class because there are 2 triangles and only 1 square inside the
inner circle. If k = 5 it is assigned to the first class (3 squares vs. 2 triangles
inside the outer circle).
Image Source: Wikipedia

Distance Metric
A distance metric D(·,·) is a function that gives a generalized distance between two patterns.
For any three vectors a, b and c, a distance metric must satisfy the following properties:

Non-negativity:        D(a, b) ≥ 0
Reflexivity:           D(a, b) = 0 if and only if a = b
Symmetry:              D(a, b) = D(b, a)
Triangle inequality:   D(a, b) + D(b, c) ≥ D(a, c)

Distance Metric
Euclidean distance possesses all four properties.

Euclidean distance in d dimensions:
D(a, b) = ‖a − b‖ = √( Σ_{k=1..d} (ak − bk)² )

One general class of metrics for d-dimensional patterns is the Minkowski metric:
Lk(a, b) = ( Σ_{i=1..d} |ai − bi|^k )^(1/k)

The L1 distance is called the Manhattan or city block distance.
The L2 distance is the Euclidean distance.
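A short sketch (assuming NumPy; the function name is illustrative) of the Minkowski family: k = 1 gives the Manhattan (city-block) distance and k = 2 the Euclidean distance. The example vectors are made up.

import numpy as np

def minkowski(a, b, k=2):
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])
print(minkowski(a, b, k=1))   # L1 / Manhattan: |1-4| + |2-0| + |3-3| = 5
print(minkowski(a, b, k=2))   # L2 / Euclidean: sqrt(9 + 4 + 0) ≈ 3.606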

Distance Metric
Distance metrics may not be invariant to linear transformations (e.g. scaling), in which each coordinate is multiplied by some constant.

A nearest-neighbor classifier can therefore give different results depending on such rescaling. Consider the test point x and its nearest neighbor: in the original space (left figure) the black prototype is closest; in the right figure, where the x1 axis has been rescaled by a factor of 1/3, a different prototype becomes the nearest neighbor.
Distance Metric
Data normalization
If there is a large disparity in the ranges of the full data in each dimension, a common procedure is to rescale all the data to equalize those ranges.

Euclidean distance is frequently employed as the distance metric in nearest neighbor classifiers.
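A minimal sketch of one common rescaling choice, min-max scaling of each dimension to [0, 1], assuming NumPy; the height/shoe-size values are made up to show two features with very different ranges.

import numpy as np

def min_max_scale(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid division by zero for constant features
    return (X - lo) / span

# height in cm and shoe size: very different ranges before scaling
X = np.array([[190.0, 44.0], [165.0, 38.0], [178.0, 42.0]])
print(min_max_scale(X))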

k-Nearest Neighbor Classifier

Example: k = 15. The resulting decision boundary is relatively simple.

Decision Regions
Voronoi Diagram
Given a set of points (referred to as sites or nodes) a
Voronoi diagram is a partition of space into regions,
within which all points are closer to some particular
node than to any other node

Voronoi Editing
Each cell contains one sample, and every location within
the cell is closer to that sample than to any other
sample.
Every query point will be assigned the classification of
the sample within that cell.
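A minimal sketch (assuming NumPy and SciPy) connecting the two ideas above: scipy.spatial.Voronoi builds the diagram explicitly, and classifying a query point by its nearest sample is the same as reading off the class of the Voronoi cell it falls in. The sample points and labels are randomly generated for illustration.

import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
samples = rng.uniform(0, 1, size=(10, 2))      # hypothetical 2-D training samples
labels  = rng.integers(0, 2, size=10)          # hypothetical class labels (0 or 1)

vor = Voronoi(samples)                         # explicit Voronoi diagram (vertices, ridges)
print("Voronoi vertices:\n", vor.vertices)

query = np.array([0.5, 0.5])
owner = np.argmin(np.linalg.norm(samples - query, axis=1))  # index of the nearest sample
print("query lies in the Voronoi cell of sample", owner,
      "-> predicted class:", labels[owner])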

Voronoi Editing
Knowledge of this boundary is sufficient to classify new
points.
The boundary itself is rarely computed; many algorithms
seek to retain only those points necessary to generate
an identical boundary.

Voronoi Editing
The boundaries of the Voronoi regions separating regions whose nodes are of different classes contribute to a portion of the decision boundary.

The decision boundary

Voronoi Editing
The nodes of the Voronoi regions whose boundaries
did not contribute to the decision boundary are
redundant and can be safely deleted from the training
set

The edited training set.

Voronoi Editing

A drawback of Voronoi editing is that it still leaves too many points in the edited set, because it preserves many points which are well separated from the other classes and which should intuitively be deleted.

Points A1 and B1 are well separated. They are preserved by Voronoi editing to maintain a portion of the decision boundary. However, assuming that new points will come from the same distribution as the training set, the portions of the decision boundary remote from the concentration of training-set points are of lesser importance.

Gabriel Graph
Two points A and B are said to be Gabriel neighbours if their diametral sphere (i.e. the sphere such that AB is its diameter) doesn't contain any other points.

(figure: points A and B are Gabriel neighbours, whereas B and C are not)

A graph in which all pairs of Gabriel neighbours are connected with an edge is called the Gabriel graph.

Gabriel Editing - algorithm

1. Compute the Gabriel graph for the training set.
2. Visit each node, marking it if all its Gabriel neighbours are of the same class as the current node.
3. Delete all marked nodes, exiting with the remaining ones as the edited training set.
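A sketch of Gabriel editing under the definition above, assuming NumPy; it uses the fact that a third point c lies inside the sphere with diameter ab exactly when (a − c)·(b − c) < 0, and keeps only points with at least one Gabriel neighbour of a different class. It is a brute-force O(n³) illustration, not an optimized implementation.

import numpy as np

def gabriel_neighbours(X, i, j):
    a, b = X[i], X[j]
    for m in range(len(X)):
        if m in (i, j):
            continue
        if np.dot(a - X[m], b - X[m]) < 0:     # X[m] lies strictly inside the diametral sphere
            return False
    return True

def gabriel_edit(X, y):
    keep = []
    for i in range(len(X)):
        neigh = [j for j in range(len(X))
                 if j != i and gabriel_neighbours(X, i, j)]
        # keep the point if at least one Gabriel neighbour has a different class
        if any(y[j] != y[i] for j in neigh):
            keep.append(i)
    return X[keep], y[keep]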

Wilson Editing
Remove points that do not agree with the majority of their k nearest neighbours.

(figures: original data with overlapping classes, and the result of Wilson editing with k = 7)
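A sketch of Wilson editing as described above, assuming NumPy: a sample is removed when it disagrees with the majority class of its k nearest neighbours (the point itself excluded).

import numpy as np
from collections import Counter

def wilson_edit(X, y, k=7):
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                    # exclude the point itself
        neighbours = np.argsort(d)[:k]
        majority = Counter(y[neighbours]).most_common(1)[0][0]
        if y[i] == majority:                             # keep only points the vote agrees with
            keep.append(i)
    return X[keep], y[keep]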

Condensed nearest neighbor (CNN)

The aim is to reduce the number of training samples.
Retain only the samples that are needed to define the decision boundary.
Decision-boundary consistent: a subset whose nearest-neighbour decision boundary is identical to the boundary of the entire training set.
Consistent set: a subset of the training data that correctly classifies all of the original training data.
Minimum consistent set: the smallest consistent set.

Condensed nearest neighbor (CNN)

(figures: original data, a consistent set, and a minimum consistent set)

Condensed nearest neighbor (CNN)

This algorithm produces neither a minimal nor a decision-boundary-consistent subset.
Algorithm (Hart 1968):
1. Initialize the subset with a single training example.
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset.
3. Return to step 2 until no transfers occur or the subset is full.
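A sketch of Hart's condensed nearest neighbour procedure following the three steps above, assuming NumPy; it keeps passing over the data until a pass transfers no new samples into the subset. Function names are illustrative.

import numpy as np

def nn_label(S_X, S_y, x):
    return S_y[np.argmin(np.linalg.norm(np.asarray(S_X) - x, axis=1))]

def cnn_condense(X, y):
    subset = [0]                                        # 1. start with a single training example
    changed = True
    while changed:                                      # 3. repeat until no transfers occur
        changed = False
        for i in range(len(X)):
            if i in subset:
                continue
            S_X, S_y = X[subset], y[subset]
            if nn_label(S_X, S_y, X[i]) != y[i]:        # 2. misclassified by the current subset
                subset.append(i)                        #    -> transfer it into the subset
                changed = True
    return X[subset], y[subset]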


References
Chapter 4 of Pattern Classification by Richard O. Duda, Peter E. Hart and David G. Stork; some material from Chapter 10 of the same book.