
E9 205 – Machine Learning For Signal Processing

Practice For Midterm Exam # 1


Date: Sept. 22, 2019

Instructions

1. This exam is open book. However, computers, mobile phones and other handheld devices
are not allowed.

2. Notation - bold symbols are vectors, capital bold symbols are matrices and regular symbols
are scalars.

3. Answer all questions.

4. Finding the numerical answer with a calculator does not carry any credit. Simply write the
answer up to the point of numerical evaluation.

5. When using extra sheets, mark the question number clearly. Do not strike off even if you
think you are wrong. Return all extra sheets used.

6. Total Duration - 60 minutes

7. Total Marks - 75 points

Name - ................................

Dept. - ....................

SR Number - ....................
1. Estimation from Noisy Data Let v be a real noisy measurement of a real random
variable u, i.e., v = u + ǫ, where ǫ represents a zero mean random variable with variance
σǫ2 . Let Ruu denote the second moment of u, E[u2 ]. From the noisy measurement v, the
aim is to find an estimate of u of the form û = av. Let e = u − û denote the estimation
error.

(a) Compute the value of a which minimizes the mean square error E[e²]. (Points 5)

(b) Find the lowest possible value of the mean square error E[e²]. What happens to the best
estimate and the lowest error when σ_ε² → ∞? (Points 5)
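
As a sanity check on parts (a) and (b), the following minimal Python sketch compares an empirical grid search over a with the closed form a = R_uu / (R_uu + σ_ε²) that the mean-square-error calculation leads to. All numerical values are illustrative, and u and ε are simulated as independent zero-mean Gaussians, one choice consistent with the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not part of the problem statement):
# u ~ N(0, 2) and eps ~ N(0, 0.5), so R_uu = 2 and sigma_eps^2 = 0.5.
R_uu, sigma_eps2, n = 2.0, 0.5, 200_000
u = rng.normal(0.0, np.sqrt(R_uu), n)
eps = rng.normal(0.0, np.sqrt(sigma_eps2), n)
v = u + eps

# Empirical mean square error over a grid of scaling factors a.
grid = np.linspace(0.0, 1.5, 301)
mse = [np.mean((u - a * v) ** 2) for a in grid]
a_empirical = grid[int(np.argmin(mse))]

# Closed form a = R_uu / (R_uu + sigma_eps^2), assuming eps is
# uncorrelated with u; as sigma_eps^2 -> infinity this tends to 0,
# i.e., the best estimate collapses to zero.
a_closed = R_uu / (R_uu + sigma_eps2)
print(a_empirical, a_closed)  # both close to 0.8 for these values
```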
2. Missing Data Reconstruction - Recently, NASA's Kepler identified a habitable
earth-sized planet. The observations from the planet are made periodically; let an
observation be denoted x. With a set of measurements, the engineers at NASA modeled
the data using a Gaussian mixture model (GMM) λ,

p(x | λ) = \sum_{m=1}^{M} \alpha_m \, \mathcal{N}(x; \mu_m, \Sigma_m)

where x is D-dimensional and each covariance Σ_m is assumed to be diagonal. The model
parameters were estimated using MLE. After the initial set of measurements and the
model learning, some of the sensors failed and some data dimensions went missing, i.e.,
only d of the dimensions could be measured reliably while the remaining D − d could
not be observed. Let x = [x_o^T x_h^T]^T, where x_o denotes the d reliable observations
and x_h denotes the missing observations. The engineers are now tasked with finding the
best estimate of x_h given the GMM and the reliable observations x_o.

(a) Amar, who works at NASA, says that he can find the expression for the conditional
distribution p(x_h | x_o, λ). He also claims that he can estimate the missing data x_h as
the conditional expectation E_{x_h | x_o, λ}[x_h], where E[·] denotes the expectation
operator. What would your answer be if you were Amar? Simplify wherever possible.
(Points 10)
(b) Amar further suggests that he can find the estimate in the maximum likelihood sense,
using an iterative algorithm to solve for the best estimate of the missing data x_h. How
would you formulate the algorithm if you were Amar? (Points 15)
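
For concreteness, here is a minimal Python sketch of the conditional-expectation estimate in part (a); all names and shapes are illustrative. Because each Σ_m is diagonal, x_h and x_o are independent within a component, so E[x_h | x_o, λ] reduces to a posterior-weighted combination of the component means on the hidden dimensions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_hidden(x_o, obs_idx, hid_idx, alphas, mus, sigmas):
    """E[x_h | x_o, lambda] for a GMM with diagonal covariances.

    alphas: (M,) mixture weights; mus: (M, D) means;
    sigmas: (M, D) diagonal variances. Illustrative sketch only.
    """
    # Posterior responsibility of each component given the observed dims,
    # computed in log space for numerical stability.
    log_w = np.log(alphas) + np.array([
        multivariate_normal.logpdf(x_o, mus[m, obs_idx],
                                   np.diag(sigmas[m, obs_idx]))
        for m in range(len(alphas))
    ])
    gamma = np.exp(log_w - np.max(log_w))
    gamma /= gamma.sum()
    # Diagonal covariance => E[x_h | x_o, component m] = mu_{m,h},
    # so the estimate is a responsibility-weighted mean.
    return gamma @ mus[:, hid_idx]
```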
3. Correlation Analysis - Tamara works in a local hospital where vital measurements
(pressure, heart rate, sugar levels, etc.) in the intensive care unit (ICU) are available for N
patients. Let X = {x_1, x_2, .., x_N} denote these measurements, where x_i is a D-dimensional
vector. In the hospital database, she also identifies personal information about
these patients (age, gender, smoking habits, etc.). Let Y = {y_1, y_2, .., y_N} denote this
data, where y_i is an R-dimensional vector. Having found this data, she wants to convince
her boss that there is a correlation between the measurements X and the personal information
Y by visualizing the joint dataset in two dimensions (p_i, q_i). To this end, she formulates
her goal as learning two projection vectors w_x and w_y, of dimension D and R respectively,
such that the correlation coefficient ρ of the projections (p_i, q_i) is maximized. The
correlation coefficient is defined as,

\rho = \frac{p^T q}{\sqrt{p^T p} \, \sqrt{q^T q}}

where p = [p_1, p_2, ..., p_N]^T and q = [q_1, q_2, ..., q_N]^T are N-dimensional vectors containing the
projected points p_i = w_x^T(x_i − µ_x) and q_i = w_y^T(y_i − µ_y), with µ_x, µ_y being the sample
means of X and Y respectively. How would you help Tamara in finding the projection vectors
w_x and w_y which maximize ρ? (Points 20)
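
This is the classical canonical correlation analysis (CCA) setup. One standard route, sketched below in Python with illustrative names and a small ridge term added for invertibility, reduces the maximization to a generalized eigenvalue problem on the sample covariance matrices.

```python
import numpy as np
from scipy.linalg import eigh

def cca_first_pair(X, Y, reg=1e-6):
    """First canonical directions w_x, w_y for data X: (N, D), Y: (N, R).

    Sketch via the generalized eigenproblem
    C_xy C_yy^{-1} C_yx w_x = rho^2 C_xx w_x; reg is an illustrative ridge.
    """
    Xc = X - X.mean(axis=0)          # subtract the sample means mu_x, mu_y
    Yc = Y - Y.mean(axis=0)
    N = X.shape[0]
    Cxx = Xc.T @ Xc / N + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / N + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / N
    # Top eigenpair of C_xy C_yy^{-1} C_yx w_x = rho^2 C_xx w_x.
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    rho2, vecs = eigh(M, Cxx)        # eigenvalues in ascending order
    w_x = vecs[:, -1]                # eigenvector of the largest rho^2
    # The matching w_y is proportional to C_yy^{-1} C_yx w_x.
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x)
    w_y /= np.sqrt(w_y @ Cyy @ w_y)  # unit projected variance
    return w_x, w_y, np.sqrt(max(rho2[-1], 0.0))
```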
4. Matrix Factorization - While doing her NLP course, Sonia derives features which are all
positive from a text document. Let v denote this positive D-dimensional feature vector (v_i ≥ 0).
She wants to assume a matrix factorization model that involves the determination of a
positive factor matrix W of size (D × Q) and a positive latent vector h of dimension
Q such that v̂ = Wh approximates the feature vector v, where [W]_{ij} ≥ 0 and h_j ≥ 0. In
estimating the factor matrix and the latent vector, the objective function she chooses to
minimize is the divergence given by,
D(v \| \hat{v}) = \sum_i \left( v_i \log \frac{v_i}{\hat{v}_i} - v_i + \hat{v}_i \right)

In using the above objective function to solve for W and h with the constraints of a positive
factor matrix and latent vector, she finds that the optimization is not convex. Bharat,
who has done the MLSP course, suggests that the principles of the EM algorithm can be
applied to this problem. In particular, with a prior choice of the positive factor matrix W,
he comes up with the following G function to estimate the vector h,

G(h, h^t) = \sum_i (v_i \log v_i - v_i) + \sum_{ij} W_{ij} h_j - \sum_{ij} v_i \frac{W_{ij} h_j^t}{\sum_k W_{ik} h_k^t} \left( \log W_{ij} h_j - \log \frac{W_{ij} h_j^t}{\sum_k W_{ik} h_k^t} \right),

where the summation over the variable i ranges over (1...Q ... D) — specifically, i ranges over (1...D) and j, k range over (1...Q).

(a) For the estimation of h, the objective function to minimize is then

F(h) = \sum_i \left( v_i \log \frac{v_i}{\sum_j W_{ij} h_j} - v_i + \sum_j W_{ij} h_j \right)

Show that the G function is an auxiliary function for the objective function F, i.e.,
G(h, h) = F(h) and F(h) ≤ G(h, h^t). Argue that iteratively minimizing G will
achieve the minimization of the objective function. (Points 15)
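
For reference, the monotonicity argument in the last part of (a) is the standard auxiliary-function (majorize-minimize) chain: with h^{t+1} = \arg\min_h G(h, h^t), the two properties combine as

F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t),

so the sequence F(h^t) is non-increasing.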
(b) Show that the iterative update rule for h is given by,

h_j^{t+1} = \frac{h_j^t}{\sum_i W_{ij}} \sum_i \frac{v_i}{\sum_k W_{ik} h_k^t} W_{ij}

(Points 5)
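
The update in (b) is the familiar multiplicative rule for this KL-type divergence, and a few lines of Python (with made-up data and W held fixed, as in the problem) confirm numerically that each iteration drives D(v ∥ Wh) down:

```python
import numpy as np

rng = np.random.default_rng(0)
D, Q = 8, 3
W = rng.random((D, Q))      # fixed positive factor matrix (illustrative)
v = rng.random(D) + 0.1     # positive feature vector
h = np.ones(Q)              # positive initialization

def divergence(v, v_hat):
    # D(v || v_hat) = sum_i v_i log(v_i / v_hat_i) - v_i + v_hat_i
    return np.sum(v * np.log(v / v_hat) - v + v_hat)

for t in range(200):
    # h_j <- (h_j^t / sum_i W_ij) * sum_i W_ij * v_i / (W h^t)_i
    h = h / W.sum(axis=0) * (W.T @ (v / (W @ h)))

print(divergence(v, W @ h))  # non-increasing over iterations
```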
