Outline
Linear Classification
Linear Regression as a Classifier
Linear Discriminant Analysis
Logistic Regression
Cost function
Maximum likelihood estimation
Multi-class logistic regression
Linear Classification
Linear Classifier
Classifier: partitions the input space into decision regions.
Linearly separable: the input space can be partitioned by a linear decision boundary.
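To make the decision-rule idea concrete, here is a minimal sketch (not from the slides; the function name and example data are my own) of a linear classifier that thresholds w^T x + b at zero:

```python
import numpy as np

def linear_classify(X, w, b):
    """Assign each row of X to class 1 if w^T x + b >= 0, else class 0."""
    scores = X @ w + b              # signed distance (up to scale) from the boundary
    return (scores >= 0).astype(int)

# Hypothetical 2-D example: decision boundary x1 + x2 = 1
X = np.array([[0.2, 0.3], [0.9, 0.8]])
print(linear_classify(X, w=np.array([1.0, 1.0]), b=-1.0))  # [0 1]
```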
[Figure: scatter plots of the input data over two input variables, illustrating decision regions and a linearly separable case.]
Linear Classification
Example
[Figure: example data for the classification task, plotted against input X; both axes range from 0 to 1.]
Linear Regression as a Classifier
Encode the targets as an indicator response matrix Y, with one column per class: y_{ik} = 1 if observation i belongs to class k, and y_{ik} = 0 otherwise.
Fit all indicator columns jointly by least squares:
\hat{W} = (X^\top X)^{-1} X^\top Y
A new input x is assigned to the class with the largest fitted value \hat{y}_k(x) = x^\top \hat{w}_k.
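A minimal sketch of this indicator-matrix regression (illustrative only; the function names and the use of a generic least-squares solver are my own choices):

```python
import numpy as np

def fit_indicator_regression(X, y, n_classes):
    """Least-squares fit to a one-hot indicator response matrix.

    X : (n, d) design matrix (append a column of ones for a bias term)
    y : (n,) integer class labels in {0, ..., n_classes - 1}
    """
    Y = np.eye(n_classes)[y]                   # indicator matrix, shape (n, K)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X W ~= Y in the least-squares sense
    return W                                   # shape (d, K)

def predict(X, W):
    return np.argmax(X @ W, axis=1)            # class with the largest fitted value
```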
[Figures: three-class example data in two dimensions (axes X1, X2). One figure compares panels labelled "Linear Regression" and "Linear Discriminant Analysis"; a companion figure plots the fitted indicator regressions against a single input, with a panel labelled "Degree = 1; Error = 0.33". The captions appear below.]
Masking: a class can be completely masked by the others.
FIGURE 4.2. The data come from three classes in IR^2 and are easily separated by linear decision boundaries. The right plot shows the boundaries found by linear discriminant analysis. The left plot shows the boundaries found by linear regression of the indicator response variables. The middle class is completely masked.
FIGURE 4.3. The effects of masking on linear regression in IR for a three-class problem. The rug plot at the base indicates the positions and class membership of each observation. The three curves in each panel are the fitted regressions to the three-class indicator response variables.
Linear Discriminant Analysis
Model the class posterior P(y = k \mid X = x). Let \pi_k = P(y = k) be the prior probability of class k. The prediction task then becomes
\arg\max_k \; P(y = k \mid X = x)
By Bayes' rule,
P(y = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_l f_l(x)\,\pi_l}, \quad \text{where } f_k(x) = p(x \mid y = k).
Assume each class-conditional density is Gaussian with a shared covariance matrix, x \mid y = k \sim N(\mu_k, \Sigma):
f_k(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)\right)
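A small sketch of this posterior computation (illustrative; the class means, shared covariance, and priors are assumed to be given):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """f_k(x) = exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})"""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def posterior(x, mus, Sigma, priors):
    """P(y = k | x) via Bayes' rule with Gaussian class-conditional densities."""
    f = np.array([gaussian_density(x, mu, Sigma) for mu in mus])
    return f * priors / np.sum(f * priors)
```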
Parameter Estimation
Given training data \{(x_i, y_i)\}_{i=1}^{N}, estimate the parameters:
\hat{\pi}_k = N_k / N, the proportion of observations in class k
\hat{\mu}_k = \frac{1}{N_k} \sum_{i:\, y_i = k} x_i, the sample mean for points in class k
\hat{\Sigma}, the pooled sample covariance matrix:
\hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top
Discriminant function
\delta_k(x) = x^\top \hat{\Sigma}^{-1} \hat{\mu}_k - \tfrac{1}{2}\,\hat{\mu}_k^\top \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k
Assign x to the class with the largest \delta_k(x).
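A compact sketch of these estimates and the resulting discriminant rule (illustrative; the function and variable names are assumptions of mine, not the slides'):

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class priors, class means, and the pooled covariance matrix."""
    classes = np.unique(y)
    N, K = len(y), len(classes)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    Sigma = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
                for i, k in enumerate(classes)) / (N - K)
    return classes, priors, means, Sigma

def lda_predict(X, classes, priors, means, Sigma):
    """delta_k(x) = x^T S^{-1} mu_k - 0.5 mu_k^T S^{-1} mu_k + log pi_k; pick the largest."""
    Sinv = np.linalg.inv(Sigma)
    deltas = X @ Sinv @ means.T - 0.5 * np.sum(means @ Sinv * means, axis=1) + np.log(priors)
    return classes[np.argmax(deltas, axis=1)]
```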
[Figure 4.5: three Gaussian classes with linear Bayes decision boundaries.]
FIGURE 4.5. The left panel shows three Gaussian distributions, with the same covariance and different means. Included are the contours of constant density enclosing 95% of the probability in each case. The Bayes decision boundaries between each pair of classes are shown (broken straight lines), and the Bayes decision boundaries separating all three classes are the thicker solid lines (a subset of the former).
Logistic Regression
Linear regression results in poorly fit models for classification.
Given y \in \{0, 1\}, we want the output to also be in the range [0, 1].
Use the logistic (sigmoid) function:
h_w(x) = \sigma(w^\top x) = \frac{1}{1 + \exp(-w^\top x)}
Interpretation: h_w(x) is the estimated probability that y = 1 given the input x. For example, if h_w(x) = 0.7, the model predicts a probability of 0.7 that y = 1.
Predict class 0 if h_w(x) < 0.5, and class 1 otherwise.
Since h_w(x) \geq 0.5 exactly when w^\top x \geq 0, the decision boundary is linear in the inputs. Example: predict y = 1 if -3 + x_1 + x_2 \geq 0, i.e. above the line x_1 + x_2 = 3.
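A small sketch of the sigmoid and the resulting decision rule (my own illustration; thresholding h_w(x) at 0.5 is the same as thresholding w^T x at 0):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w):
    """Estimated probability that y = 1 for each row of X."""
    return sigmoid(X @ w)

def predict_label(X, w):
    """Predict 1 when h_w(x) >= 0.5, i.e. when w^T x >= 0."""
    return (X @ w >= 0).astype(int)
```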
How do we estimate w?
Cost Function
For a single training example,
\mathrm{Cost}(h_w(x), y) = -\log h_w(x) \quad \text{if } y = 1
\mathrm{Cost}(h_w(x), y) = -\log\big(1 - h_w(x)\big) \quad \text{if } y = 0
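A direct transcription of this per-example cost (a sketch; clipping the probabilities away from 0 and 1 is my own addition, to avoid log(0)):

```python
import numpy as np

def example_cost(h, y, eps=1e-12):
    """-log(h) when y == 1, -log(1 - h) when y == 0."""
    h = np.clip(h, eps, 1 - eps)
    return np.where(y == 1, -np.log(h), -np.log(1 - h))
```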
Minimize the total cost over the training set:
J(w) = \sum_{i=1}^{N} \mathrm{Cost}(h_w(x_i), y_i) = -\sum_{i=1}^{N} \big[ y_i \log h_w(x_i) + (1 - y_i) \log(1 - h_w(x_i)) \big]
Gradient:
\nabla_w J(w) = \sum_{i=1}^{N} \big( h_w(x_i) - y_i \big)\, x_i = X^\top (h - y)
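A sketch of the gradient and a plain gradient-descent loop (the step size and iteration count are arbitrary choices of mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_J(w, X, y):
    """Gradient of the negative log-likelihood: X^T (h - y)."""
    return X.T @ (sigmoid(X @ w) - y)

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= alpha * grad_J(w, X, y)   # step against the gradient
    return w
```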
Logistic Regression: Newton-Raphson Method
Newton's method finds a zero of a function f by iterating
\theta := \theta - \frac{f(\theta)}{f'(\theta)} \quad (1)
This method has a natural interpretation in which we can think of it as approximating the function f via a linear function that is tangent to f at the current guess \theta, solving for where that linear function equals zero, and letting the next guess for \theta be where that linear function is zero.
Here's a picture of Newton's method in action:
[Figure: three panels plotting f(x) against x, showing successive iterations of Newton's method.]
In the leftmost figure, we see the function f plotted along with the line y = 0. The middle and rightmost figures show successive tangent-line steps: one more iteration updates the guess to about 1.8, and after a few more iterations we rapidly approach \theta = 1.3.
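A tiny sketch of this scalar iteration (the example function and starting point are arbitrary choices of mine, only to show the update):

```python
def newton_zero(f, fprime, theta, n_iters=10):
    """Iterate theta := theta - f(theta) / f'(theta) to find a zero of f."""
    for _ in range(n_iters):
        theta = theta - f(theta) / fprime(theta)
    return theta

# Example: the positive zero of f(x) = x^2 - 2, starting from 4.5
print(newton_zero(lambda x: x**2 - 2, lambda x: 2 * x, 4.5))  # ~1.41421
```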
Hessian:
H(w) = \sum_{i=1}^{N} h_w(x_i)\,\big(1 - h_w(x_i)\big)\, x_i x_i^\top = X^\top R X
where R is a diagonal matrix with diagonal elements R_{ii} = h_w(x_i)\,\big(1 - h_w(x_i)\big).
Parameter update:
w^{(t+1)} = w^{(t)} - H\big(w^{(t)}\big)^{-1}\, \nabla_w J\big(w^{(t)}\big)
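A sketch of the resulting Newton (IRLS-style) loop for logistic regression (illustrative; it assumes X^T R X is invertible and runs a fixed number of iterations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iters=10):
    """w <- w - H^{-1} grad, with grad = X^T (h - y) and H = X^T R X."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ w)
        grad = X.T @ (h - y)              # gradient of the negative log-likelihood
        R = np.diag(h * (1.0 - h))        # diagonal weight matrix
        H = X.T @ R @ X                   # Hessian
        w = w - np.linalg.solve(H, grad)  # Newton step
    return w
```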
Maximum Likelihood Estimation
The logistic model treats each label as Bernoulli given its input:
p(y_i \mid x_i; w) = h_w(x_i)^{\,y_i}\,\big(1 - h_w(x_i)\big)^{\,1 - y_i}
Maximizing the log-likelihood of the training data is equivalent to minimizing the cost J(w) above.
Multi-class Logistic Regression
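As a sketch of the standard softmax formulation of multi-class logistic regression (my own illustration of the usual approach; the names and the stabilization trick are assumptions, not the slides'):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: P(y = k | x) = exp(w_k^T x) / sum_j exp(w_j^T x)."""
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def predict_multiclass(X, W):
    """W holds one weight column per class; predict the most probable class."""
    return np.argmax(softmax(X @ W), axis=1)
```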
Summary
Linear regression as a classifier
Masking
Linear classifiers
Linear discriminants
Logistic regression
Sigmoid function
Loss function
Iterative parameter update
Maximum likelihood estimate