Instructions
1. This exam is open book. However, computers, mobile phones and other handheld devices
are not allowed.
2. Notation - bold symbols are vectors, capital bold symbols are matrices and regular symbols
are scalars.
4. Finding the numerical answer with a calculator does not carry any credit. Just write the
answer up to the final numerical expression.
5. When using extra sheets, mark the question number clearly. Do not strike off even if you
think you are wrong. Return all extra sheets used.
Name - ................................
Dept. - ....................
SR Number - ....................
1. Estimation from Noisy Data - Let v be a real noisy measurement of a real random
variable u, i.e., v = u + ε, where ε represents a zero-mean random variable with variance
σ_ε^2. Let R_uu denote the second moment of u, E[u^2]. From the noisy measurement v, the
aim is to find an estimate of u of the form û = av. Let e = u − û denote the estimation
error.
(a) Compute the value of a which minimizes the mean square error E[e^2]. (Points 5)
(b) Find the lowest possible value of the mean square error E[e^2]. What happens to the
best estimate and the lowest error when σ_ε^2 → ∞? (Points 5)
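As a sanity check on parts (a) and (b), the closed-form minimizer a* = R_uu / (R_uu + σ_ε^2) (obtained assuming u and ε are uncorrelated) can be compared against an empirical grid search over a. The values R_uu = 4 and σ_ε^2 = 1 below are arbitrary illustrative choices, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
Ruu, sigma_eps2 = 4.0, 1.0  # hypothetical second moment of u and noise variance

# Zero-mean u with E[u^2] = Ruu, and v = u + eps with eps uncorrelated with u
u = rng.normal(0.0, np.sqrt(Ruu), 100_000)
v = u + rng.normal(0.0, np.sqrt(sigma_eps2), u.shape)

# Closed-form minimizer of E[(u - a v)^2], assuming u and eps uncorrelated
a_star = Ruu / (Ruu + sigma_eps2)

# Empirical MSE over a grid of a values; the minimum should sit near a_star
grid = np.linspace(0.0, 1.0, 1001)
mse = [np.mean((u - a * v) ** 2) for a in grid]
a_hat = grid[int(np.argmin(mse))]
print(a_star, a_hat)  # the two should agree closely
```

As σ_ε^2 grows, a* shrinks toward zero, which matches the intuition that an arbitrarily noisy measurement should be ignored.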
2. Missing Data Reconstruction - Recently, NASA's Kepler identified a habitable planet
of Earth's size. Observations from the planet are made periodically; let an observation be
denoted as x. With a set of measurements, the engineers at NASA modeled the data
using a Gaussian mixture model (GMM) λ given by

λ = Σ_{m=1}^{M} α_m N(x; µ_m, Σ_m).
(a) Amar, who works at NASA, says that he can find the expression for the distribution
p(x_h | x_o, λ), where x_o and x_h denote the observed and missing (hidden) components
of x. He also claims that he can find the estimate of the missing data x_h as the
conditional expectation E_{x_h | x_o, λ}[x_h], where E[·] denotes the expectation operator.
What would your answer be if you were Amar? Simplify wherever possible. (Points 10)
(b) Amar further suggests that he can find the estimate in the maximum likelihood sense
using an iterative algorithm to solve for the best estimate of the missing data x_h. How
would you formulate the algorithm if you were Amar? (Points 15)
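A minimal numerical sketch of the conditional-expectation estimate in part (a), assuming a hypothetical two-component GMM over a 2-dimensional x whose first dimension is observed and second is missing. All mixture parameters below are made up for illustration; the estimate is the responsibility-weighted sum of the per-component conditional Gaussian means:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component GMM over x = [x_o, x_h]
alphas = np.array([0.6, 0.4])
mus = [np.array([0.0, 1.0]), np.array([3.0, -1.0])]
Sigmas = [np.array([[1.0, 0.5], [0.5, 1.0]]),
          np.array([[2.0, -0.3], [-0.3, 1.0]])]

def estimate_hidden(x_o):
    """E[x_h | x_o, lambda]: responsibilities of each component given the
    observed part, times the per-component conditional Gaussian means."""
    # Responsibility of component m given x_o: alpha_m N(x_o; mu_o,m, S_oo,m)
    resp = np.array([a * multivariate_normal.pdf(x_o, m[0], S[0, 0])
                     for a, m, S in zip(alphas, mus, Sigmas)])
    resp /= resp.sum()
    # Per-component conditional mean: mu_h + S_ho S_oo^{-1} (x_o - mu_o)
    cond_means = [m[1] + S[1, 0] / S[0, 0] * (x_o - m[0])
                  for m, S in zip(mus, Sigmas)]
    return float(np.dot(resp, cond_means))

print(estimate_hidden(0.0))
```

For part (b), one could iterate between filling in x_h with the current estimate and re-maximizing the likelihood over x_h, in the spirit of EM, but the sketch above covers only the closed-form estimate of part (a).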
3. Correlation Analysis - Tamara works in a local hospital where vital measurements
(pressure, heart rate, sugar levels, etc.) in the intensive care unit (ICU) are available for N
patients. Let X = {x_1, x_2, ..., x_N} denote these measurements, where x_i is a D-dimensional
vector. In the hospital database, she also identifies personal information about
these patients (age, gender, smoking habits, etc.). Let Y = {y_1, y_2, ..., y_N} denote this
data, where y_i is an R-dimensional vector. Having found this data, she wants to convince
her boss that there is correlation between the measurements X and the personal information
Y by visualizing the joint dataset in two dimensions (p_i, q_i). To this end, she formulates
her goal as learning two projection vectors w_x and w_y of dimension D and R respectively
such that the correlation coefficient ρ of the projections (p_i, q_i) is maximized. The
correlation coefficient is defined as,
ρ = (p^T q) / (√(p^T p) √(q^T q)),

where p = [p_1, p_2, ..., p_N]^T and q = [q_1, q_2, ..., q_N]^T are N-dimensional vectors containing the
projected points p_i = w_x^T (x_i − µ_x) and q_i = w_y^T (y_i − µ_y), with µ_x, µ_y being the sample
means of X and Y respectively. How would you help Tamara in finding the projection
vectors w_x and w_y which maximize ρ? (Points 20)
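One standard route to this problem (canonical correlation analysis) reduces it to a generalized eigenvalue problem on the sample covariance matrices. A sketch on synthetic data, where the data-generating parameters are purely hypothetical stand-ins for the ICU and personal records:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, R = 500, 4, 3
# Hypothetical correlated data: a shared latent factor z drives both views
z = rng.normal(size=(N, 1))
X = z @ rng.normal(size=(1, D)) + 0.5 * rng.normal(size=(N, D))
Y = z @ rng.normal(size=(1, R)) + 0.5 * rng.normal(size=(N, R))

# Center and form sample (cross-)covariance matrices
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx, Syy = Xc.T @ Xc / N, Yc.T @ Yc / N
Sxy = Xc.T @ Yc / N

# Leading eigenvector of Sxx^{-1} Sxy Syy^{-1} Syx gives w_x;
# its eigenvalue is rho^2 for the optimal pair of projections
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
eigvals, eigvecs = np.linalg.eig(M)
k = int(np.argmax(eigvals.real))
wx = eigvecs[:, k].real
wy = np.linalg.solve(Syy, Sxy.T @ wx)  # w_y recovered from w_x, up to scale

p, q = Xc @ wx, Yc @ wy
rho = (p @ q) / np.sqrt((p @ p) * (q @ q))
print(rho)
```

Since ρ is invariant to the scale of w_x and w_y, the eigenvectors are only determined up to scale, which is why w_y is recovered from w_x rather than normalized separately.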
4. Matrix Factorization - While doing her NLP course, Sonia derives features which are all
positive from a text document. Let v denote this positive D-dimensional feature vector (v_i ≥ 0).
She wants to assume a matrix factorization model that involves the determination of a
positive factor matrix W of size (D × Q) and a positive latent vector h of dimension
Q such that v̂ = Wh approximates the feature vector v, where [W]_{ij} ≥ 0, h_j ≥ 0. In
estimating the factor matrix and the latent vector, the objective function she chooses to
minimize is the divergence given by,

D(v || v̂) = Σ_i ( v_i log(v_i / v̂_i) − v_i + v̂_i ).
In using the above objective function to solve for W and h with the constraints of a positive
factor matrix and latent vector, she finds that the optimization is not convex. Bharat,
who has done the MLSP course, suggests that the principles of the EM algorithm can be
applied to this problem. In particular, with a prior choice of positive factor matrix W,
he comes up with the G function to estimate the vector h,

G(h, h^t) = Σ_i (v_i log v_i − v_i) + Σ_{i,j} W_ij h_j
            − Σ_{i,j} v_i (W_ij h_j^t / Σ_k W_ik h_k^t) [ log(W_ij h_j) − log(W_ij h_j^t / Σ_k W_ik h_k^t) ],

where the summation over the variable i ranges over (1, ..., D) and j, k range over (1, ..., Q).
(a) Show that the G function is an auxiliary function for the objective function F, i.e.,
G(h, h) = F(h) and F(h) ≤ G(h, h^t). Argue that iteratively minimizing G will
achieve the minimization of the objective function. (Points 15)
(b) Show that the iterative update rule for h is given by,

h_j^{t+1} = (h_j^t / Σ_i W_ij) Σ_i ( v_i / Σ_k W_ik h_k^t ) W_ij.

(Points 5)
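The update rule in part (b) can be checked numerically: for a fixed positive W and v (the random values below are hypothetical), iterating it should decrease the divergence monotonically, exactly as the auxiliary-function argument in part (a) predicts:

```python
import numpy as np

rng = np.random.default_rng(2)
D, Q = 6, 3
W = rng.random((D, Q)) + 0.1  # fixed positive factor matrix (hypothetical)
v = rng.random(D) + 0.1       # positive feature vector (hypothetical)

def kl_div(v, vhat):
    """Divergence D(v || vhat) = sum_i v_i log(v_i/vhat_i) - v_i + vhat_i."""
    return float(np.sum(v * np.log(v / vhat) - v + vhat))

h = np.ones(Q)  # positive initialization
objective = [kl_div(v, W @ h)]
for _ in range(200):
    # Multiplicative update from minimizing G(h, h^t):
    # h_j <- (h_j / sum_i W_ij) * sum_i W_ij * v_i / (W h)_i
    h = h / W.sum(axis=0) * (W.T @ (v / (W @ h)))
    objective.append(kl_div(v, W @ h))

print(objective[0], objective[-1])  # divergence should shrink over the run
```

Because the update is multiplicative, any h initialized positive stays positive, so the constraints h_j ≥ 0 never need to be enforced explicitly.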