5. K-NEAREST NEIGHBOR
Pedro Larrañaga
Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country

Madrid, 25th of July, 2006

Outline

Introduction

The Basic K-NN

Extensions of the Basic K-NN

Prototype Selection

Summary

Basic Ideas

K-NN is also known as instance-based learning (IBL), case-based reasoning (CBR), or lazy learning
A new instance is classified as the most frequent class of its K nearest neighbors
Very simple and intuitive idea
Easy to implement
There is no explicit model (transduction)

Algorithm for the basic K-NN

BEGIN
  Input: D = {(x1, c1), ..., (xN, cN)}
         x = (x1, ..., xn) new instance to be classified
  FOR each labelled instance (xi, ci) calculate d(xi, x)
  Order d(xi, x) from lowest to highest, (i = 1, ..., N)
  Select the K nearest instances to x: Dx^K
  Assign to x the most frequent class in Dx^K
END

Figure: Pseudo-code for the basic K-NN classifier
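
A minimal Python sketch of the pseudo-code above, assuming numeric predictors and Euclidean distance (the slides leave the distance d unspecified; the name knn_classify is ours):

import math
from collections import Counter

def knn_classify(D, x, K):
    """D is a list of (features, label) pairs; x is the new instance."""
    # Calculate d(xi, x) for each labelled instance (xi, ci)
    distances = [(math.dist(xi, x), ci) for xi, ci in D]
    # Order d(xi, x) from lowest to highest
    distances.sort(key=lambda pair: pair[0])
    # Select the K nearest instances to x: Dx^K
    neighbors = distances[:K]
    # Assign to x the most frequent class in Dx^K
    votes = Counter(ci for _, ci in neighbors)
    return votes.most_common(1)[0][0]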

Algorithm for the basic K-NN

Example

Figure: Example for the basic 3-NN. m denotes the number of classes, n the number of predictor variables, and N the number of labelled cases
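
The figure itself is not reproduced here; a toy run of the sketch above in its spirit, with n = 2 predictors, m = 2 classes, and N = 6 labelled cases (the data are invented for illustration):

D = [((1.0, 1.0), 'a'), ((1.2, 0.8), 'a'), ((0.9, 1.1), 'a'),
     ((4.0, 4.2), 'b'), ((4.1, 3.9), 'b'), ((3.8, 4.0), 'b')]
print(knn_classify(D, (1.1, 1.0), K=3))  # -> 'a': the 3 nearest cases are all 'a'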

Algorithm for the basic K-NN


The accuracy is not monotonic with respect to K

Figure: Accuracy versus number of neighbors
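
Because accuracy is not monotonic in K, the number of neighbors is typically tuned empirically. A minimal sketch, assuming a held-out validation set and reusing knn_classify from above (select_k and the candidate list are illustrative choices):

def select_k(train, val, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate K with the highest accuracy on the validation set."""
    def accuracy(K):
        hits = sum(knn_classify(train, x, K) == c for x, c in val)
        return hits / len(val)
    return max(candidates, key=accuracy)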

K-NN with rejection

Some guarantees are demanded before an instance is classified
If the guarantees are not verified, the instance remains unclassified
Usual guarantee: a threshold on the most frequent class in the neighborhood
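
A sketch of the rejection rule, assuming the guarantee is a minimum fraction of neighbors voting for the winning class (the 0.7 threshold is an illustrative choice, not from the slides; math and Counter as in the basic sketch):

def knn_with_rejection(D, x, K, threshold=0.7):
    neighbors = sorted((math.dist(xi, x), ci) for xi, ci in D)[:K]
    votes = Counter(ci for _, ci in neighbors)
    label, count = votes.most_common(1)[0]
    # Classify only if the guarantee is verified; otherwise leave unclassified
    return label if count / K >= threshold else None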

K-NN with average distance

Figure: K-NN with average distance
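
The figure is not reproduced here; one common reading of this variant, sketched under that assumption, assigns the class whose representatives among the K neighbors have the smallest average distance to x:

def knn_average_distance(D, x, K):
    neighbors = sorted((math.dist(xi, x), ci) for xi, ci in D)[:K]
    by_class = {}
    for d, ci in neighbors:
        by_class.setdefault(ci, []).append(d)
    # Winner: the class with the smallest mean distance among the K neighbors
    return min(by_class, key=lambda c: sum(by_class[c]) / len(by_class[c]))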

K-NN with weighted neighbors

Figure: K-NN with weighted neighbors

K-NN with weighted neighbors

instance   d(xi, x)   wi = 1/d(xi, x)
x1         2          0.5
x2         2          0.5
x3         2          0.5
x4         2          0.5
x5         0.7        1/0.7
x6         0.8        1/0.8

Figure: Weight to be assigned to each of the 6 selected instances
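
A sketch matching the table: each of the K selected instances votes with weight wi = 1/d(xi, x), so nearer neighbors count more (math and Counter as in the basic sketch):

def knn_weighted(D, x, K):
    neighbors = sorted((math.dist(xi, x), ci) for xi, ci in D)[:K]
    weights = Counter()
    for d, ci in neighbors:
        if d == 0:
            return ci           # exact match: that class wins outright
        weights[ci] += 1.0 / d  # wi = 1/d(xi, x), as in the table
    return weights.most_common(1)[0][0]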

K-NN with weighted variables

X1   X2   C
0    0    1
0    0    1
0    0    1
1    0    1
1    0    1
1    1    1
0    1    0
0    1    0
0    1    0
1    1    0
1    1    0
1    0    0

Figure: Variable X1 is not relevant for C

K-NN with weighted variables

The distance is computed with one weight per predictor variable, where each weight is the mutual information between that variable and the class:

d(x_l, x) = \sum_{i=1}^{n} w_i \, d_i(x_{l,i}, x_i), \qquad w_i = MI(X_i, C)

For the 12-case table above (base-2 logarithms):

MI(X_1, C) = \sum_{x, c} p_{(X_1,C)}(x, c) \log \frac{p_{(X_1,C)}(x, c)}{p_{X_1}(x) \, p_C(c)}
           = \frac{3}{12} \log \frac{3/12}{(6/12)(6/12)} + \frac{3}{12} \log \frac{3/12}{(6/12)(6/12)} + \frac{3}{12} \log \frac{3/12}{(6/12)(6/12)} + \frac{3}{12} \log \frac{3/12}{(6/12)(6/12)} = 0

since the ratio inside every logarithm equals 1. In contrast,

MI(X_2, C) = \frac{1}{12} \log \frac{1/12}{(6/12)(6/12)} + \frac{5}{12} \log \frac{5/12}{(6/12)(6/12)} + \frac{5}{12} \log \frac{5/12}{(6/12)(6/12)} + \frac{1}{12} \log \frac{1/12}{(6/12)(6/12)} \approx 0.35

so X_1 receives weight 0 and is effectively ignored, while X_2 drives the distance.
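
A sketch of this weighting for discrete predictors, estimating MI(X, C) from empirical counts (math and Counter as in the basic sketch; mi_weight is our name):

def mi_weight(values, labels):
    """Empirical mutual information between one predictor column and the class column."""
    N = len(values)
    p_x, p_c = Counter(values), Counter(labels)
    p_xc = Counter(zip(values, labels))
    mi = 0.0
    for (x, c), n in p_xc.items():
        # Each term: p(x,c) * log[ p(x,c) / (p(x) * p(c)) ]
        mi += (n / N) * math.log2((n / N) / ((p_x[x] / N) * (p_c[c] / N)))
    return mi

# For the 12-case table above, mi_weight gives 0.0 for X1 and about 0.35 for X2.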

Wilson editing

Eliminating noisy instances
The class of each labelled instance, (x_l, c^(l)), is compared with the label assigned by a K-NN built from all instances except itself
If both labels coincide, the instance is maintained in the file; otherwise it is eliminated
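
A sketch of the editing loop, reusing knn_classify from the basic algorithm:

def wilson_edit(D, K):
    edited = []
    for i, (xi, ci) in enumerate(D):
        rest = D[:i] + D[i + 1:]             # all instances except itself
        if knn_classify(rest, xi, K) == ci:  # labels coincide: keep it
            edited.append((xi, ci))
    return edited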

Hart condensation

Maintaining rare instances
For each labelled instance, following the storage ordering, consider a K-NN built with only the instances stored before the one under consideration
If the true class and the class predicted by the K-NN are the same, the instance is not selected
Otherwise (the true class and the predicted one differ), the instance is selected
The method depends on the storage ordering
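
A sketch of the condensation pass, again reusing knn_classify. One common formulation, assumed here, classifies each instance against the condensed set selected so far (K = 1 is the classical choice); as the slide notes, the result depends on the storage ordering:

def hart_condense(D, K=1):
    condensed = [D[0]]                        # the first instance is always kept
    for xi, ci in D[1:]:
        # Classify with only the previously selected instances
        if knn_classify(condensed, xi, K) != ci:
            condensed.append((xi, ci))        # misclassified: select it
    return condensed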

K-nearest neighbor

Intuitive and easy to understand
There is no explicit model: transduction instead of induction
Variants of the basic algorithm
Storage problems: prototype selection
