
CSE 474/574 Introduction to Machine Learning

Fall 2011 Assignment 3


Due date: November 8, 2011
Total points: 100

The assignment is due at the beginning of class on the due date above. Submit a hard copy in class. Attach printouts of code and figures/results for any coding questions, along with your answers to the other questions.

1. Gaussian Density Models


Consider the following two-dimensional data sampled from two categories, denoted Ck .
Each class is drawn from a Gaussian distribution p(x|Ck ) ∼ N (µk , Σk ):

              C1               C2
Sample    x1     x2        x1     x2
  1       1.7    3.0       1.1   -1.3
  2       1.2    3.5       0.2    3.5
  3       1.5    5.0       1.5    4.3
  4       0.6    4.5       3.2    0.8
  5       0.5    4.0       4.0    2.7

(a) Write a Matlab program to find the unbiased maximum-likelihood estimates of
µk and Σk for each class of the given data (see the sketch after this problem). [6]
(b) Generate 10 data points for each class from your estimated density and plot the
data on a single figure with distinct markers (e.g. ‘x’ for one class and ‘o’ for
the other). Save your generated data for problems 2 and 3 below. [6]
Hint: You can use the mvnrnd function to draw samples from a multivariate
Gaussian density.
(c) Generate a third class C3 of data containing 10 data points with the same
covariance Σ1 as C1 and a different mean. Plot both the C1 and C3 data on a
single figure with distinct markers. Save your data for problem 4. [6]
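A minimal sketch of one possible approach to parts (a)–(c), not required code: the variable names, the .mat file names, and the mean shift chosen for C3 (here [4 0]) are illustrative assumptions.

    % Given data (rows are samples; columns are x1, x2).
    X1 = [1.7 3.0; 1.2 3.5; 1.5 5.0; 0.6 4.5; 0.5 4.0];
    X2 = [1.1 -1.3; 0.2 3.5; 1.5 4.3; 3.2 0.8; 4.0 2.7];

    % (a) Sample means and unbiased covariance estimates (cov uses 1/(N-1)).
    mu1 = mean(X1);  S1 = cov(X1);
    mu2 = mean(X2);  S2 = cov(X2);

    % (b) Draw 10 fresh points per class from the fitted Gaussians and plot.
    G1 = mvnrnd(mu1, S1, 10);
    G2 = mvnrnd(mu2, S2, 10);
    figure; hold on;
    plot(G1(:,1), G1(:,2), 'x');  plot(G2(:,1), G2(:,2), 'o');
    save('gen_data.mat', 'G1', 'G2');      % reused in problems 2 and 3

    % (c) Third class: same covariance S1 as C1, shifted mean (shift is arbitrary).
    mu3 = mu1 + [4 0];
    G3 = mvnrnd(mu3, S1, 10);
    figure; hold on;
    plot(G1(:,1), G1(:,2), 'x');  plot(G3(:,1), G3(:,2), 'o');
    save('gen_data_c.mat', 'G1', 'G3');    % reused in problem 4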

2. Least Squares for Classification


In regression, we saw that minimizing a sum-of-squares error function led to a
simple closed-form solution for the parameters. To apply the same formalism to a
classification problem, we consider a 1-of-K binary coding scheme for the target
vector t (in our case, K = 2). Each class Ck is described by its own linear model
yk (x), and the least-squares approach gives an exact closed-form solution for the
parameters; the discriminant boundary is where y1 (x) = y2 (x).
We can write these models together in matrix notation:

Y = X̃W̃

where W̃ is a (D + 1) × K matrix whose kth column is the (D + 1) × 1 weight vector
for class Ck , and X̃ is the augmented input matrix of size N × (D + 1).

(a) Write a Matlab program to find the least-squares solution for W̃ using the data
points generated in question 1b (see the sketch below). [12]
(b) Using the optimal weights obtained in 2a, calculate yk (x) and assign a class
label to each data point according to the discriminant function. Plot the
discriminant boundary on the same figure as the marked data points. [12]
Hint: For our 2-D two-class data, the optimal weight matrix W̃ is 3 × 2. The
discriminant boundary is a linear function and can be plotted using Matlab’s
ezplot function:
fh = @(x) -(W(2,1)-W(2,2))/(W(3,1)-W(3,2))*x-(W(1,1)-W(1,2))/(W(3,1)-W(3,2));
ezplot(fh);
(c) Calculate the error rate (the fraction of misclassified samples) for the least-
squares method and comment on the result. [6]
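A possible sketch for parts (a)–(c), assuming the points from 1b were saved as G1 and G2 in gen_data.mat as in the earlier sketch; the 1-of-K targets place C1 in column 1.

    load('gen_data.mat', 'G1', 'G2');      % 10 points per class from 1b
    X  = [G1; G2];
    Xa = [ones(size(X,1),1) X];            % augmented inputs, N x (D+1)
    T  = [repmat([1 0],10,1); repmat([0 1],10,1)];   % 1-of-K targets, C1 first

    % (a) Closed-form least-squares solution via the pseudoinverse.
    W = pinv(Xa) * T;                      % 3 x 2 for this 2-D, two-class data

    % (b) Assign each point to the class whose output yk(x) is largest.
    Y = Xa * W;
    [~, pred] = max(Y, [], 2);
    figure; hold on;
    plot(G1(:,1), G1(:,2), 'x');  plot(G2(:,1), G2(:,2), 'o');
    fh = @(x) -(W(2,1)-W(2,2))/(W(3,1)-W(3,2))*x - (W(1,1)-W(1,2))/(W(3,1)-W(3,2));
    ezplot(fh);                            % boundary where y1(x) = y2(x)

    % (c) Error rate: fraction of points assigned to the wrong class.
    truth = [ones(10,1); 2*ones(10,1)];
    err = mean(pred ~= truth)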

3. Fisher’s Linear Discriminant


In Fisher’s linear discriminant analysis we find w such that, when each data point is
projected onto a line, the projected class means are maximally separated while the
variance within each class is minimized. The projected points can then be separated
by a simple threshold.

(a) Write a Matlab program to calculate w using the same data you generated
in 1b (see the sketch below). Plot the projection line, which lies in the direction
of w (as a line passing through (0,0)), on the same figure as the marked data
points. [12]
(b) Find the values of the projected 1-D points y(x) for both classes. To construct a
discriminant, model the class-conditional densities p(y|Ck ) as 1-D Gaussians
fitted by maximum likelihood. Then, setting ln p(y|C1 ) = ln p(y|C2 ) under the
assumption of equal priors, we can find a threshold y0 . Derive the solution for
y0 , and classify each data point as belonging to C1 if y(x) ≥ y0 and to C2
otherwise. [12]
(c) Calculate the error rate (the fraction of misclassified samples) and comment on
the result. [6]
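One possible sketch of parts (a)–(c), again assuming G1 and G2 from 1b; the threshold y0 comes from solving the quadratic obtained by equating the two fitted log densities.

    load('gen_data.mat', 'G1', 'G2');      % points generated in 1b

    % (a) Fisher direction: w is proportional to Sw^{-1}(m1 - m2), with Sw the
    % within-class scatter matrix (built here from the class covariances).
    m1 = mean(G1)';  m2 = mean(G2)';
    Sw = (size(G1,1)-1)*cov(G1) + (size(G2,1)-1)*cov(G2);
    w  = Sw \ (m1 - m2);
    w  = w / norm(w);                      % only the direction matters
    figure; hold on;
    plot(G1(:,1), G1(:,2), 'x');  plot(G2(:,1), G2(:,2), 'o');
    s = linspace(-8, 8, 2);
    plot(s*w(1), s*w(2), 'k-');            % projection line through (0,0)

    % (b) Project, fit 1-D Gaussians by ML (1/N variances), and solve
    % ln p(y|C1) = ln p(y|C2) for y0; this is a quadratic in y whose
    % relevant root lies between the two projected means.
    y1 = G1*w;  y2 = G2*w;
    mu1 = mean(y1);  s1 = sqrt(var(y1,1));
    mu2 = mean(y2);  s2 = sqrt(var(y2,1));
    a = 1/(2*s1^2) - 1/(2*s2^2);
    b = mu2/s2^2 - mu1/s1^2;
    c = mu1^2/(2*s1^2) - mu2^2/(2*s2^2) + log(s1/s2);
    r = roots([a b c]);
    y0 = r(r > min(mu1,mu2) & r < max(mu1,mu2));

    % (c) Classify; C1 projects above C2 here since w points from m2 toward m1.
    pred  = [y1; y2] >= y0;
    truth = [true(size(y1)); false(size(y2))];
    err = mean(pred ~= truth)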

4. Generative Model for Classification


A generative approach to separating two classes (for example, in 3c) is to model the
class-conditional densities p(x|Ck ) as well as the class priors p(Ck ), and then use
these to compute the posterior class probabilities p(Ck |x). To classify each point x
optimally (in the sense of minimizing the expected classification error), we must
assign it to the class Ck that maximizes the posterior probability.

(a) Using equations (4.57) and (4.58) from the textbook, derive the result (4.65) for
the posterior class probability in the two-class generative model with Gaussian
densities, and verify the results (4.66) and (4.67) for the parameters w and w0. [6]
(b) Comment on the discriminant function (decision boundary) in the case where
the two classes share the same covariance matrix. [4]

(c) Evaluate the discriminant function in Matlab using the 2-D data you generated
in 1c, computing the parameter w as a 2 × 1 vector and w0 as a scalar via
(4.66) and (4.67). Plot and verify the decision boundary on the same figure as
the data points (see the sketch below). [6]
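A possible sketch for part (c), assuming the C1 and C3 points from 1c were saved as G1 and G3 in gen_data_c.mat; the pooled covariance estimate and the equal priors are modeling choices, and w, w0 follow (4.66)–(4.67).

    load('gen_data_c.mat', 'G1', 'G3');    % C1 and C3 points from 1c
    mu1 = mean(G1)';  mu3 = mean(G3)';
    n1 = size(G1,1);  n3 = size(G3,1);
    Sigma = ((n1-1)*cov(G1) + (n3-1)*cov(G3)) / (n1+n3-2);  % pooled estimate

    % Parameters of p(C1|x) = sigma(w'x + w0), following (4.66)-(4.67).
    w  = Sigma \ (mu1 - mu3);                               % 2 x 1 vector
    w0 = -0.5*mu1'*(Sigma\mu1) + 0.5*mu3'*(Sigma\mu3) ...
         + log(n1/n3);                                      % equal priors: log 1 = 0

    % Decision boundary w'x + w0 = 0, drawn over the data.
    figure; hold on;
    plot(G1(:,1), G1(:,2), 'x');  plot(G3(:,1), G3(:,2), 'o');
    fb = @(x) -(w(1)*x + w0) / w(2);
    ezplot(fb);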

5. Logistic Regression
Show that for a linearly separable data set, the maximum likelihood solution for the
logistic regression model is obtained by finding a vector w whose decision boundary
wᵀφ(x) = 0 separates the classes and then taking the magnitude of w to infinity. [6]
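This part asks for an analytic argument, but a quick numerical illustration can build intuition. The toy data and scaling constants below are assumptions; the point is that scaling any separating w by a growing constant c drives the log-likelihood monotonically toward its supremum of zero, so the maximum is attained only as ||w|| → ∞.

    % Toy linearly separable set with basis phi(x) = [1; x] (assumed example).
    x = [-2 -1 1 2]';  t = [0 0 1 1]';
    Phi = [ones(4,1) x];
    w = [0; 1];                % a w whose boundary w'*phi(x) = 0 separates the classes
    for c = [1 10 100 1000]    % scale the separating vector by increasing c
        y  = 1 ./ (1 + exp(-Phi*(c*w)));                 % sigmoid outputs
        ll = sum(log(y(t==1))) + sum(log(1 - y(t==0)));  % log-likelihood
        fprintf('c = %4d: log-likelihood = %.6f\n', c, ll);
    end
    % The log-likelihood increases toward 0 as c grows, so no finite w maximizes it.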
