
Discriminant Functions I

Consider a two-class problem. We need a function for the decision boundary to separate the classes.
Consider a simple linear discriminant function:

$$w_2 x_2 + w_1 x_1 + w_0 = 0$$
defined over a two-dimensional space, where
$$\bar{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \qquad \bar{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
An example line:
$$x_2 + x_1 - 1 = 0$$
Clearly, points above the line yield
$$x_2 + x_1 - 1 > 0$$
while points below the line yield
$$x_2 + x_1 - 1 < 0$$
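As a quick numerical check of this sign rule, here is a minimal Python/NumPy sketch (the sample points are arbitrary illustrations):

```python
import numpy as np

# The line x2 + x1 - 1 = 0, written as w.x + w0 = 0 with w = [1, 1], w0 = -1
w = np.array([1.0, 1.0])
w0 = -1.0

def side_of_line(x):
    """Return +1 above the line, -1 below it, 0 exactly on it."""
    return np.sign(w @ x + w0)

print(side_of_line(np.array([1.0, 1.0])))   # +1.0 (above the line)
print(side_of_line(np.array([0.2, 0.3])))   # -1.0 (below the line)
print(side_of_line(np.array([0.5, 0.5])))   #  0.0 (on the line)
```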

Discriminant Functions II
If the points belonging to the two classes $C_1$ and $C_2$ are as shown in the Figure, they can easily be discriminated.
In general, linear discriminant functions are of the form
$$w_d x_d + w_{d-1} x_{d-1} + \dots + w_1 x_1 + w_0 = 0$$
This represents a hyperplane in a $d$-dimensional space. Alternatively, it can be written as
$$\bar{w}^t \bar{x} + w_0 = 0$$

[Figure: the line $x_1 + x_2 - 1 = 0$ in the $(x_1, x_2)$ plane, crossing the axes at $(0, 1)$ and $(1, 0)$.]

Nonlinear discriminant functions I

Consider two classes being separated by a circle, as shown in the Figure:
$$x^2 + y^2 = r^2$$
$x^2 + y^2 - r^2 = 0$ is the boundary between the two classes.
Clearly, $x^2 + y^2 - r^2 < 0$ inside the circle, while
$x^2 + y^2 - r^2 > 0$ outside the circle.
The boundary is clearly nonlinear in the input space.
Consider the transformation:
$$z_1 = x^2, \qquad z_2 = y^2, \qquad r^2 = 1$$
This yields


Nonlinear discriminant functions II

$$z_1 + z_2 - 1 = 0$$
This leads to a linear hyperplane in $z$-space that is isomorphic to the input $x$-space.
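A small Python/NumPy sketch (with randomly generated, purely illustrative points) showing that the circular boundary in $(x, y)$ becomes the linear boundary $z_1 + z_2 - 1 = 0$ after the transformation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D points; the true boundary is the unit circle x^2 + y^2 = 1
points = rng.uniform(-2.0, 2.0, size=(6, 2))

# Nonlinear discriminant in the input (x, y) space
g_input = points[:, 0]**2 + points[:, 1]**2 - 1.0

# Transform to z-space: z1 = x^2, z2 = y^2
z = points**2

# The same discriminant is linear in z: z1 + z2 - 1
g_z = z[:, 0] + z[:, 1] - 1.0

print(np.allclose(g_input, g_z))  # True: identical decision values
```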

[Figure: left, the two classes (marked o and x) in the $(x, y)$ plane, separated by a circle; right, the same points in the $(z_1, z_2)$ plane, where the boundary becomes a straight line.]

Assume the class-conditional density of each class is Gaussian:
$$p(\vec{x}\,|\,\omega_i) = \mathcal{N}(\vec{x}\,|\,\vec{\mu}_i, \Sigma_i) \qquad \text{where } \vec{x} = [x_1\ x_2\ \dots\ x_d]^T$$

Classification steps:
- Training Process: we estimate $\hat{\mu}_i$ and $\hat{\Sigma}_i$ using the dataset of the $i$-th class, $D = \{\vec{x}_1, \vec{x}_2, \dots, \vec{x}_N\}$.
- Development Process: we fix our hyperparameters in this process.
- Testing Process: we test our model using unseen data.

We classify the feature vector $\vec{x}$ to the class for which $P(\omega_i\,|\,\vec{x})$ is highest; the remaining posterior mass is the error. Example: if we have $M$ classes and $\omega_{\max}$ is the chosen class, then error $= 1 - P(\omega_{\max}\,|\,\vec{x})$.
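A minimal sketch of these steps in Python/NumPy, assuming two Gaussian classes with synthetic data (class sizes, means, and priors are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training sets for two hypothetical classes
D1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200)
D2 = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=200)
priors = {1: 0.5, 2: 0.5}

# Training: estimate the mean and covariance of each class
params = {}
for label, D in [(1, D1), (2, D2)]:
    params[label] = (D.mean(axis=0), np.cov(D, rowvar=False))

def log_gaussian(x, mu, sigma):
    """Log of the multivariate normal density N(x | mu, sigma)."""
    d = len(mu)
    diff = x - mu
    return (-0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(sigma)[1]
            - 0.5 * diff @ np.linalg.inv(sigma) @ diff)

def classify(x):
    """Testing: pick the class with the largest p(x|w_i) P(w_i)."""
    scores = {label: log_gaussian(x, mu, sigma) + np.log(priors[label])
              for label, (mu, sigma) in params.items()}
    return max(scores, key=scores.get)

print(classify(np.array([0.5, 0.2])))  # expected: 1
print(classify(np.array([2.8, 3.1])))  # expected: 2
```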


Bayes' Decision Theory

[Figure: the curves $p(x\,|\,\omega_1)P(\omega_1)$ and $p(x\,|\,\omega_2)P(\omega_2)$ plotted against $x$, illustrating the posterior $P(\omega_i\,|\,x)$.]
By Bayes' rule, $P(\omega_i\,|\,x) = \dfrac{p(x\,|\,\omega_i)P(\omega_i)}{p(x)}$. As $p(x)$ does not affect the decision process,
$$P(\omega_i\,|\,x) \propto p(x\,|\,\omega_i)P(\omega_i)$$
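A tiny numerical illustration (the likelihood and prior values are made up) of why dropping $p(x)$ does not change the decision:

```python
import numpy as np

# Hypothetical class-conditional densities and priors at one value of x
likelihoods = np.array([0.30, 0.10])   # p(x|w1), p(x|w2)
priors      = np.array([0.40, 0.60])   # P(w1),  P(w2)

unnormalised = likelihoods * priors               # p(x|wi) P(wi)
posteriors   = unnormalised / unnormalised.sum()  # divide by p(x)

# Dividing by p(x) rescales both scores equally, so the decision is unchanged
print(np.argmax(unnormalised) == np.argmax(posteriors))  # True
```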

Unimodal Multivariate Gaussian Distribution

$$p(\vec{x}\,|\,\omega_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}\; e^{-\frac{1}{2}(\vec{x}-\vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x}-\vec{\mu}_i)}$$

where

$$\Sigma_i = E\big[(\vec{x}-\vec{\mu}_i)(\vec{x}-\vec{\mu}_i)^T\big], \qquad \vec{\mu}_i = E[\vec{x}_i]$$

$$\ln g_1(\vec{x}) = \ln\big[p(\vec{x}\,|\,\omega_1)\,P(\omega_1)\big]
= -\frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|C_1| - \frac{1}{2}(\vec{x}-\vec{\mu}_1)^T C_1^{-1} (\vec{x}-\vec{\mu}_1) + \ln P(\omega_1)$$
Similarly for $g_2(\vec{x}) = p(\vec{x}\,|\,\omega_2)\,P(\omega_2)$.
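To make the formula concrete, here is a short sketch that evaluates $\ln g_1(\vec{x})$ directly and cross-checks the density term against SciPy (the parameter values are arbitrary placeholders):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for class w1
mu1 = np.array([1.0, 2.0])
C1 = np.array([[2.0, 0.3],
               [0.3, 1.0]])
prior1 = 0.5
x = np.array([0.5, 1.5])

d = len(x)
diff = x - mu1
ln_g1 = (-0.5 * d * np.log(2 * np.pi)
         - 0.5 * np.log(np.linalg.det(C1))
         - 0.5 * diff @ np.linalg.inv(C1) @ diff
         + np.log(prior1))

# Cross-check the density term against SciPy's implementation
ln_g1_scipy = multivariate_normal.logpdf(x, mean=mu1, cov=C1) + np.log(prior1)
print(np.isclose(ln_g1, ln_g1_scipy))  # True
```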


Discriminating Function: $g(\vec{x}) = \ln g_1(\vec{x}) - \ln g_2(\vec{x})$

CASE-1: $C_1 = C_2 = \sigma^2 I$ (fewer parameters $\Rightarrow$ less data required)

The quadratic term $\vec{x}^T\vec{x}$ is the same in both $g_1(\vec{x})$ and $g_2(\vec{x})$, so the discriminant reduces to a linear form. Hence we assume

$$g_i(\vec{x}) = \vec{\omega}_i^{\,t}\vec{x} + \omega_{i0}$$

Now, neglecting the terms that cancel out in $\ln g_1(\vec{x}) - \ln g_2(\vec{x})$, we get:
$$g_1(\vec{x}) = \frac{-1}{2\sigma^2}\big(\vec{\mu}_1^{\,T}\vec{\mu}_1 - 2\vec{\mu}_1^{\,T}\vec{x}\big) + \ln P(\omega_1)$$
Comparing our assumed form with this equation, we get:
$$\vec{\omega}_i = \frac{\vec{\mu}_i}{\sigma^2} \qquad \text{and} \qquad \omega_{i0} = \frac{-1}{2\sigma^2}\vec{\mu}_i^{\,T}\vec{\mu}_i + \ln P(\omega_i)$$


Decision Boundary: $g(\vec{x}) = \vec{\omega}_1^{\,T}\vec{x} + \omega_{10} - \vec{\omega}_2^{\,T}\vec{x} - \omega_{20} = 0$

So $g(\vec{x}) = \vec{\omega}^{\,T}\vec{x} + \omega_0$ (a straight line / hyperplane), where
$$\vec{\omega} = \vec{\omega}_1 - \vec{\omega}_2 = \frac{1}{\sigma^2}(\vec{\mu}_1 - \vec{\mu}_2)$$
$$\omega_0 = \omega_{10} - \omega_{20} = \frac{-1}{2\sigma^2}\big(\vec{\mu}_1^{\,T}\vec{\mu}_1 - \vec{\mu}_2^{\,T}\vec{\mu}_2\big) + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
Now, since
$$\vec{\mu}_1^{\,T}\vec{\mu}_1 - \vec{\mu}_2^{\,T}\vec{\mu}_2 = \|\vec{\mu}_1\|^2 - \|\vec{\mu}_2\|^2 = (\vec{\mu}_1 - \vec{\mu}_2)^T(\vec{\mu}_1 + \vec{\mu}_2),$$
$$g(\vec{x}) = \frac{1}{\sigma^2}(\vec{\mu}_1 - \vec{\mu}_2)^T\vec{x} - \frac{1}{2\sigma^2}(\vec{\mu}_1 - \vec{\mu}_2)^T(\vec{\mu}_1 + \vec{\mu}_2) + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
$$= \frac{1}{\sigma^2}(\vec{\mu}_1 - \vec{\mu}_2)^T\left[\vec{x} - \frac{1}{2}(\vec{\mu}_1 + \vec{\mu}_2) + \frac{\sigma^2(\vec{\mu}_1 - \vec{\mu}_2)}{\|\vec{\mu}_1 - \vec{\mu}_2\|^2}\ln\frac{P(\omega_1)}{P(\omega_2)}\right]$$
$$= \vec{\omega}^{\,T}(\vec{x} - \vec{x}_0) = 0 \qquad \text{(i.e. the separating hyperplane passes through } \vec{x}_0\text{)}$$
Now, if $P(\omega_1) = P(\omega_2)$, the boundary perpendicularly bisects the line joining $\vec{\mu}_1$ and $\vec{\mu}_2$.
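A short sketch of the Case-1 boundary in Python/NumPy, reading $\vec{x}_0$ off the bracketed expression above (means, $\sigma^2$, and priors are illustrative), confirming that the hyperplane passes through $\vec{x}_0$ and separates the two means:

```python
import numpy as np

# Illustrative means, shared covariance sigma^2 * I, and priors
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
sigma2 = 1.5
P1, P2 = 0.6, 0.4

w = (mu1 - mu2) / sigma2
x0 = 0.5 * (mu1 + mu2) - (sigma2 * np.log(P1 / P2)
                          / np.dot(mu1 - mu2, mu1 - mu2)) * (mu1 - mu2)

def g(x):
    """Case-1 discriminant w^t (x - x0); positive side -> class w1."""
    return w @ (x - x0)

print(np.isclose(g(x0), 0.0))      # True: the boundary passes through x0
print(g(mu1) > 0, g(mu2) < 0)      # True True: each mean on its own side
```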
CASE-2: $C_1 = C_2 = C$, a shared diagonal covariance $C = \begin{pmatrix}\sigma_1^2 & 0 \\ 0 & \sigma_2^2\end{pmatrix}$ (i.e. $\sigma_{jk} = 0$ for $j \neq k$)
$$g_i(\vec{x}) = \frac{-1}{2}(\vec{x} - \vec{\mu}_i)^t C^{-1}(\vec{x} - \vec{\mu}_i) + \ln P(\omega_i)$$
$$= \frac{-1}{2}\vec{x}^t C^{-1}\vec{x} + \frac{1}{2}\vec{\mu}_i^{\,t} C^{-1}\vec{x} + \frac{1}{2}\vec{x}^t C^{-1}\vec{\mu}_i - \frac{1}{2}\vec{\mu}_i^{\,t} C^{-1}\vec{\mu}_i + \ln P(\omega_i)$$
Ignoring the terms that do not depend on $i$ (they cancel out),
$$g_i(\vec{x}) = (C^{-1}\vec{\mu}_i)^t\vec{x} - \frac{1}{2}\vec{\mu}_i^{\,t} C^{-1}\vec{\mu}_i + \ln P(\omega_i) = \vec{\omega}_i^{\,t}\vec{x} + \omega_{i0}$$
Now, the discriminating boundary is given by:
$$g(\vec{x}) = (C^{-1}\vec{\mu}_1 - C^{-1}\vec{\mu}_2)^t\vec{x} - \frac{1}{2}\vec{\mu}_1^{\,t} C^{-1}\vec{\mu}_1 + \frac{1}{2}\vec{\mu}_2^{\,t} C^{-1}\vec{\mu}_2 + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
$$= \big(C^{-1}(\vec{\mu}_1 - \vec{\mu}_2)\big)^t\vec{x} - \frac{1}{2}(\vec{\mu}_1 - \vec{\mu}_2)^t C^{-1}(\vec{\mu}_1 + \vec{\mu}_2) + \ln\frac{P(\omega_1)}{P(\omega_2)}$$


On comparing with the equation of the plane $g(\vec{x}) = \vec{\omega}^{\,t}(\vec{x} - \vec{x}_0)$:
$$\vec{\omega} = C^{-1}(\vec{\mu}_1 - \vec{\mu}_2)$$
$$\vec{x}_0 = \frac{1}{2}(\vec{\mu}_1 + \vec{\mu}_2) - \frac{\ln\big(P(\omega_1)/P(\omega_2)\big)}{(\vec{\mu}_1 - \vec{\mu}_2)^t C^{-1}(\vec{\mu}_1 - \vec{\mu}_2)}\,(\vec{\mu}_1 - \vec{\mu}_2)$$

Notice that $C^{-1}$ rotates and scales $(\vec{\mu}_1 - \vec{\mu}_2)$, so $\vec{\omega}$ will in general not point in the direction of $(\vec{\mu}_1 - \vec{\mu}_2)$. Also, if the priors are equal, $\vec{x}_0 = \frac{1}{2}(\vec{\mu}_1 + \vec{\mu}_2)$, i.e. the boundary still passes through the midpoint, but its direction is transformed.
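A brief Case-2 sketch (illustrative means and a diagonal shared covariance) showing that $\vec{\omega}$ is not parallel to $(\vec{\mu}_1 - \vec{\mu}_2)$, while $\vec{x}_0$ stays at the midpoint when the priors are equal:

```python
import numpy as np

# Illustrative means and a shared (non-spherical) covariance
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
C = np.array([[2.0, 0.0],
              [0.0, 0.5]])
P1, P2 = 0.5, 0.5

C_inv = np.linalg.inv(C)
diff = mu1 - mu2

w = C_inv @ diff
x0 = 0.5 * (mu1 + mu2) - (np.log(P1 / P2)
                          / (diff @ C_inv @ diff)) * diff

# w is generally not parallel to (mu1 - mu2): compare directions
cos_angle = w @ diff / (np.linalg.norm(w) * np.linalg.norm(diff))
print(np.isclose(cos_angle, 1.0))          # False: directions differ
print(np.allclose(x0, 0.5 * (mu1 + mu2)))  # True: equal priors -> midpoint
```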

Note that the contours of a Gaussian are curves of equal probability density, symmetric about the mean.


CASE-3: $C_1 \neq C_2$ (they can be diagonal)


Again, neglecting the terms that do not affect the decision:
$$g_i(\vec{x}) = \frac{-1}{2}\vec{x}^t C_i^{-1}\vec{x} + \vec{\mu}_i^{\,t} C_i^{-1}\vec{x} - \frac{1}{2}\vec{\mu}_i^{\,t} C_i^{-1}\vec{\mu}_i + \ln P(\omega_i) - \frac{1}{2}\ln|C_i|$$

$$g(\vec{x}) = \vec{x}^t W \vec{x} + \vec{\omega}^{\,t}\vec{x} + \omega_0 = 0, \quad \text{where}$$
$$W = \frac{-1}{2}\big(C_1^{-1} - C_2^{-1}\big)$$
$$\vec{\omega} = C_1^{-1}\vec{\mu}_1 - C_2^{-1}\vec{\mu}_2$$
$$\omega_0 = \frac{-1}{2}\big(\vec{\mu}_1^{\,t} C_1^{-1}\vec{\mu}_1 - \vec{\mu}_2^{\,t} C_2^{-1}\vec{\mu}_2\big) - \frac{1}{2}\ln\frac{|C_1|}{|C_2|} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
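A minimal sketch of the Case-3 quadratic discriminant (the class parameters are illustrative), evaluating $g(\vec{x})$ at the two means:

```python
import numpy as np

# Illustrative class parameters with unequal covariances
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
C1 = np.array([[1.0, 0.0],
               [0.0, 2.0]])
C2 = np.array([[0.5, 0.0],
               [0.0, 0.5]])
P1, P2 = 0.5, 0.5

C1_inv, C2_inv = np.linalg.inv(C1), np.linalg.inv(C2)

W  = -0.5 * (C1_inv - C2_inv)
w  = C1_inv @ mu1 - C2_inv @ mu2
w0 = (-0.5 * (mu1 @ C1_inv @ mu1 - mu2 @ C2_inv @ mu2)
      - 0.5 * np.log(np.linalg.det(C1) / np.linalg.det(C2))
      + np.log(P1 / P2))

def g(x):
    """Quadratic discriminant x^t W x + w^t x + w0; positive -> class w1."""
    return x @ W @ x + w @ x + w0

print(g(mu1) > 0, g(mu2) < 0)  # True True: each mean on its own side
```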


Summary of the footprint of the density function

Covariance     | 2D      | 3D        | nD             | Eigenvectors parallel to axes?
C = σ²I        | Circle  | Sphere    | Hypersphere    | Yes
C = Diagonal   | Ellipse | Ellipsoid | Hyperellipsoid | Yes
C = Full       | Ellipse | Ellipsoid | Hyperellipsoid | No
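The last column can be checked numerically: the eigenvectors of the covariance give the principal axes of the equal-density contours. A small sketch (the example covariances are arbitrary):

```python
import numpy as np

# Eigenvectors of the covariance give the axes of the equal-density contours
covariances = {
    "spherical (sigma^2 I)": 2.0 * np.eye(2),          # circular contours
    "diagonal":              np.diag([4.0, 1.0]),      # axis-aligned ellipse
    "full":                  np.array([[4.0, 1.5],
                                       [1.5, 1.0]]),   # rotated ellipse
}

for name, C in covariances.items():
    _, vecs = np.linalg.eigh(C)
    # Axis-parallel eigenvectors have a single component of magnitude 1
    axis_parallel = all(np.isclose(np.max(np.abs(v)), 1.0) for v in vecs.T)
    print(f"{name}: eigenvectors parallel to axes? {axis_parallel}")
```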

An extended quadratic discriminant function in two dimensions is
$$Ax^2 + By^2 + Cx + Dy + E = 0$$
$A = B$ $\Rightarrow$ circle
$A$ or $B$ is zero $\Rightarrow$ parabola
$A \cdot B > 0$ $\Rightarrow$ ellipse
$A \cdot B < 0$ $\Rightarrow$ hyperbola
$A = B = 0$ $\Rightarrow$ hyperplane (straight line)
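These rules can be read directly off the quadratic coefficients; a minimal Python sketch (the function name conic_type is hypothetical):

```python
def conic_type(A, B):
    """Classify Ax^2 + By^2 + Cx + Dy + E = 0 by its quadratic coefficients."""
    if A == 0 and B == 0:
        return "straight line (hyperplane)"
    if A == B:
        return "circle"
    if A == 0 or B == 0:
        return "parabola"
    if A * B > 0:
        return "ellipse"
    return "hyperbola"          # A * B < 0

print(conic_type(1, 1))    # circle
print(conic_type(2, 0))    # parabola
print(conic_type(2, 3))    # ellipse
print(conic_type(1, -4))   # hyperbola
print(conic_type(0, 0))    # straight line (hyperplane)
```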
