
Lecture Slides for

INTRODUCTION TO

Machine Learning
ETHEM ALPAYDIN
The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 8:

Nonparametric Methods

Nonparametric Estimation

- Parametric (single global model), semiparametric (small number of local models)
- Nonparametric: similar inputs have similar outputs
- Functions (pdf, discriminant, regression) change smoothly
- Keep the training data; let the data speak for itself
- Given x, find a small number of closest training instances and interpolate from these
- Aka lazy/memory-based/case-based/instance-based learning

Density Estimation

- Given the training set X = {x^t}_{t=1}^N drawn iid from p(x)
- Histogram: divide the data into bins of size h

  $$\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$$

- Naive estimator:

  $$\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}$$

  or, equivalently,

  $$\hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right),\qquad
  w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$
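A concrete illustration (not from the slides): a minimal NumPy sketch of the naive estimator above. The function and variable names are my own.

```python
import numpy as np

def naive_estimator(x, X, h):
    """Naive density estimate: p_hat(x) = #{x - h < x^t <= x + h} / (2 N h)."""
    X = np.asarray(X)
    u = (x - X) / h
    # w(u) = 1/2 for |u| < 1; summing w over the sample and dividing by N*h
    # is the same as counting neighbors within h and dividing by 2*N*h.
    return np.sum(np.abs(u) < 1) / (2 * len(X) * h)

# Example: density of a standard normal sample at x = 0 (true value ~ 0.399)
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(naive_estimator(0.0, sample, h=0.5))
```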

Kernel Estimator

- Kernel function, e.g., Gaussian kernel:

  $$K(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)$$

- Kernel estimator (Parzen windows):

  $$\hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$$
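The same sketch with the Gaussian kernel gives a Parzen-window estimator (again illustrative, with names of my own choosing):

```python
import numpy as np

def parzen_estimator(x, X, h):
    """Kernel (Parzen window) estimate: p_hat(x) = (1/(N h)) sum_t K((x - x^t)/h)."""
    X = np.asarray(X)
    u = (x - X) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel K(u)
    return K.sum() / (len(X) * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(parzen_estimator(0.0, sample, h=0.5))  # smooth estimate near 0.399
```

Unlike the naive estimator, the result is a smooth function of x because every x^t contributes, with weight decreasing in distance.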

k-Nearest Neighbor Estimator

- Instead of fixing bin width h and counting the number of instances, fix the number of instances (neighbors) k and compute the bin width:

  $$\hat{p}(x) = \frac{k}{2N d_k(x)}$$

  where d_k(x) is the distance to the kth closest instance to x

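A minimal sketch of the univariate k-NN density estimate (my own naming, assuming NumPy):

```python
import numpy as np

def knn_density(x, X, k):
    """k-NN density estimate: p_hat(x) = k / (2 N d_k(x))."""
    X = np.asarray(X)
    d_k = np.sort(np.abs(X - x))[k - 1]  # distance to the kth closest instance
    return k / (2 * len(X) * d_k)

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(knn_density(0.0, sample, k=25))
```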

Multivariate Data

- Kernel density estimator:

  $$\hat{p}(x) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$$

- Multivariate Gaussian kernel:

  spheric:

  $$K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d}\exp\!\left(-\frac{\|u\|^2}{2}\right)$$

  ellipsoid:

  $$K(u) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}}\exp\!\left(-\frac{1}{2}\,u^T S^{-1} u\right)$$

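A sketch of the multivariate estimator with the spheric Gaussian kernel (illustrative only; names are mine):

```python
import numpy as np

def multivariate_kde(x, X, h):
    """Spheric Gaussian kernel estimate: p_hat(x) = (1/(N h^d)) sum_t K((x - x^t)/h)."""
    X = np.asarray(X)                      # shape (N, d)
    N, d = X.shape
    u = (x - X) / h                        # scaled differences
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return K.sum() / (N * h**d)

rng = np.random.default_rng(0)
sample = rng.standard_normal((1000, 2))
print(multivariate_kde(np.zeros(2), sample, h=0.5))  # true value 1/(2*pi) ~ 0.159
```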

Nonparametric Classification

- Estimate p(x|C_i) and use Bayes' rule
- Kernel estimator:

  $$\hat{p}(x|C_i) = \frac{1}{N_i h^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t,\qquad
  \hat{P}(C_i) = \frac{N_i}{N}$$

  where r_i^t = 1 if x^t belongs to C_i and 0 otherwise, giving the discriminant

  $$\hat{g}_i(x) = \hat{p}(x|C_i)\,\hat{P}(C_i) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t$$

- k-NN estimator: with k_i of the k nearest neighbors of x belonging to C_i and V^k(x) the volume of the hypersphere of radius d_k(x),

  $$\hat{p}(x|C_i) = \frac{k_i}{N_i V^k(x)},\qquad
  \hat{P}(C_i|x) = \frac{\hat{p}(x|C_i)\,\hat{P}(C_i)}{\hat{p}(x)} = \frac{k_i}{k}$$
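The rule that follows from P̂(C_i|x) = k_i/k is a majority vote among the k nearest neighbors. A minimal sketch (names are mine, assuming NumPy):

```python
import numpy as np

def knn_classify(x, X, y, k):
    """Return the majority label among the k nearest neighbors of x,
    together with the posterior estimate P_hat(C_i | x) = k_i / k."""
    X, y = np.asarray(X), np.asarray(y)
    dist = np.linalg.norm(X - x, axis=1)      # Euclidean distances to x
    nearest = y[np.argsort(dist)[:k]]         # labels of the k closest
    classes, counts = np.unique(nearest, return_counts=True)
    i = np.argmax(counts)
    return classes[i], counts[i] / k

# Toy usage: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([2.5, 2.5]), X, y, k=5))
```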

Condensed Nearest Neighbor

- Time/space complexity of k-NN is O(N)
- Find a subset Z of X that is small and accurate in classifying X (Hart, 1968):

  $$E'(Z|X) = E(X|Z) + \lambda\,|Z|$$

  where E(X|Z) is the error on X of the 1-NN classifier using Z and |Z| penalizes the size of the subset

Condensed Nearest Neighbor

Incremental algorithm: Add instance if needed

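A sketch of Hart's incremental condensing with 1-NN: the loop follows the "add an instance if the current subset misclassifies it" idea; the naming is mine.

```python
import numpy as np

def condensed_nn(X, y):
    """Grow a subset Z of (X, y): an instance is added only if the
    current Z misclassifies it under 1-NN; repeat until no change."""
    X, y = np.asarray(X), np.asarray(y)
    Z = [0]                                   # seed Z with one instance
    changed = True
    while changed:
        changed = False
        for t in range(len(X)):
            nearest = np.argmin(np.linalg.norm(X[Z] - X[t], axis=1))
            if y[Z][nearest] != y[t]:         # Z gets x^t wrong: keep x^t
                Z.append(t)
                changed = True
    return X[Z], y[Z]
```

The result depends on the order in which instances are visited; the algorithm is a greedy local search and does not guarantee the smallest subset.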

Nonparametric Regression

- Aka smoothing models
- Regressogram:

  $$\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$$

  where

  $$b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}$$
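A regressogram sketch, assuming bins of width h laid out from an origin (names and the binning convention are mine):

```python
import numpy as np

def regressogram(x, X, r, h=1.0, origin=0.0):
    """Average the labels r^t of the training x^t that fall in the
    same bin of width h as the query x."""
    X, r = np.asarray(X), np.asarray(r)
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    return r[same_bin].mean() if same_bin.any() else np.nan  # empty bin

rng = np.random.default_rng(0)
X = rng.uniform(0, 4, 100)
r = np.sin(X) + rng.normal(0, 0.1, 100)
print(regressogram(1.0, X, r, h=0.5))  # roughly sin(1) ~ 0.84
```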


Running Mean/Kernel Smoother

- Running mean smoother:

  $$\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)},\qquad
  w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$

- Kernel smoother:

  $$\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}$$

  where K(·) is Gaussian (see the sketch after this slide)

- Running line smoother: fit a local regression line to the neighboring points instead of taking a local average
- Additive models (Hastie and Tibshirani, 1990)

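A sketch of the kernel smoother (the normalization constants of K cancel in the ratio, so unnormalized Gaussian weights suffice; names are mine):

```python
import numpy as np

def kernel_smoother(x, X, r, h):
    """g_hat(x) = sum_t K((x - x^t)/h) r^t / sum_t K((x - x^t)/h)."""
    X, r = np.asarray(X), np.asarray(r)
    K = np.exp(-0.5 * ((x - X) / h) ** 2)   # Gaussian weights (unnormalized)
    return np.sum(K * r) / np.sum(K)

rng = np.random.default_rng(0)
X = rng.uniform(0, 4, 100)
r = np.sin(X) + rng.normal(0, 0.1, 100)
print(kernel_smoother(1.0, X, r, h=0.25))   # roughly sin(1) ~ 0.84
```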

How to Choose k or h?

- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h, as in the sketch below

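A minimal cross-validation sketch for picking h for the kernel smoother above (the fold count, candidate grid, and all names are my own choices):

```python
import numpy as np

def cv_choose_h(X, r, candidate_hs, n_folds=5):
    """Return the bandwidth h minimizing K-fold validation squared error
    of a Gaussian kernel smoother."""
    X, r = np.asarray(X), np.asarray(r)
    folds = np.array_split(np.random.default_rng(0).permutation(len(X)), n_folds)
    best_h, best_err = None, np.inf
    for h in candidate_hs:
        err = 0.0
        for val in folds:
            train = np.setdiff1d(np.arange(len(X)), val)
            for i in val:                     # predict each held-out point
                K = np.exp(-0.5 * ((X[i] - X[train]) / h) ** 2)
                err += (np.sum(K * r[train]) / np.sum(K) - r[i]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h

rng = np.random.default_rng(1)
X = rng.uniform(0, 4, 100)
r = np.sin(X) + rng.normal(0, 0.1, 100)
print(cv_choose_h(X, r, candidate_hs=[0.1, 0.25, 0.5, 1.0]))
```

The same procedure applies to k for the k-NN estimators: evaluate each candidate on held-out data and keep the best.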
