Sei sulla pagina 1di 12

Outline

Distance Based Classification Algorithms

Distance Based Classification Algorithms

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Nearest Neighbor Classifier(Cover and Hart, 1967)

Let T = {(xi , yi )}ni=1 be set of labeled patterns (Training set),


where xi is a pattern and yi be its class label. Let x be a
pattern with unknown class label (test pattern).
NN Rule is :
Let x ′ ∈ T be the pattern nearest to a test pattern x.
l (x) = l (x ′ )
Complexity:
Time: O(|T |)
Space: O(|T |)

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Condensed NNC (Hart, 1968)

INPUT: Training Set T


OUTPUT: A condenced Set S.
1 Start with a condensed set S = {x}.

2 For each x ∈ T \ S
1 Classify x using NN considering S as training set.

2 if x is misclassified then S = S ∪ {x}


3 Repeat Step 2 until no change found in Condensed Set

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Modified CNN(Devi and Murthy, 2003)


1 Start with condensed set S. S contains one pattern from each
class.

2 G=∅
3 For each x ∈ T
1 Classify x using NN considering S as training set.

2 if x is misclassified then G = G ∪ {x}


4 Find a representative pattern from each class in G; Let
representative set is R.

5 S =S ∪R
6 G=∅
7 Repeat Step 2 to Step 6 until there is no misclassification.
Dr. Bidyut Kr. Patra Classification Technique (Contd..)
Outline
Distance Based Classification Algorithms

MCNN is an order independent algorithm

Many Iterations.

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

K-Nearest Neighbor Classifier

Let T = {(xi , yi )}ni=1 be a Training set.


Let x be a pattern with unknown class label (test pattern).

Algorithm:
KNN = ∅
For each t ∈ T
1 if |KNN| <= K
KNN = KNN ∪ {t}
2 else
1 Find a x ′ ∈ KNN such that dis(x, x ′ ) > dis(x, t)
2 KNN = KNN − {x ′ }; KNN = KNN ∪ {t}
The pattern x belongs to a Class in which most of the
patterns in KNN belong to.

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

How to find the value of K

r -fold Cross Validation


Partition the training set into r blocks. Let these are
T1 , T2 , . . . , Tr
For i = 1 to r do
1 Consider T − Ti as the training set and Ti as the validation set.
2 For a range of K values (say from 1 to m) find the error rates
on the validation set.
3 Let these error rates are ei 1 , ei 2 , . . . , eim
Take ei = mean of {e1i , e2i , . . . , eri }, for i = 1 to m.
The value of K = argmin{e1 , e2 , . . . , em }
i

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Weighted k-NNC(Dudani, 1976)


k-NNC gives equal importance to the first NN and to the last
NN.
Weight is assigned to each nearest neighbor of a quary
pattern.
Cj C
Let X = {xi=1..k | xi j ∈ T } be the set of k-NN of q, whose
class label is to be determined.
Let D = {d1 , d2 , . . . , dk } be an ordered set, where
di = ||xi − q||, di ≤ dj , i < j
The weight wj is assigned to j th nearest neighbor as follows.
( d −d
k j
if dj 6= d1
wj = dk −d1
1 if dj = d1
Calculate weighted sums of patterns belong to each class.
Classify q to a class for which weighted sum is maximum.
Dr. Bidyut Kr. Patra Classification Technique (Contd..)
Outline
Distance Based Classification Algorithms

Editing Techniques

Larger the training set, more the computational cost.


Another technique eliminates (edit) training prototypes
(pattern) erroneously labeled, commonly outliers, and at the
same time, to “clean” the possible overlapping between
regions of different classes.

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Editing Techniques

Larger the training set, more the computational cost.


Another technique eliminates (edit) training prototypes
(pattern) erroneously labeled, commonly outliers, and at the
same time, to “clean” the possible overlapping between
regions of different classes.
Wilsons editing relies on the idea that, if a prototype is
erroneously classified using the k-NN, it has to be eliminated
from the training set.

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Edited Nearest Neighbor (Willson, 1976)

INPUT: T is a training set


OUTPUT: S is an edited set
1 for each x ∈ T do
1 classify x using k-NN Classifier (break the ties randomly)
2 if x is misclassified then mark x
2 Delete all marked paterns from T ; let the reduced training set
be S.
3 Output S

Dr. Bidyut Kr. Patra Classification Technique (Contd..)


Outline
Distance Based Classification Algorithms

Repeated ENN (Tomek, 1976)

Apply ENN method repeatedly until there is no chnage in the


training set T

Dr. Bidyut Kr. Patra Classification Technique (Contd..)

Potrebbero piacerti anche