
Expectation Maximization

– Introduction to the EM algorithm

TLT-5906 Advanced Course in Digital Transmission

Jukka Talvitie, M.Sc. (eng)


jukka.talvitie@tut.fi
Department of Communication Engineering
Tampere University of Technology

M.Sc. Jukka Talvitie 5.12.2013


Outline
q Expectation Maximization (EM) algorithm
– Motivation, background
– Where can the EM algorithm be used?
q EM principle
– Formal definition
– How does the algorithm really work?
– Coin toss example
– About some practical issues
q More advanced examples
– Line fitting with EM algorithm
– Parameter estimation of multivariate Gaussian mixture
q Conclusions



Motivation

q Consider the classical line-fitting problem:


– Assume the measurements below come from the linear model y = ax + b + n (the line parameters are a and b, and n is zero-mean noise)

[Figure: scatter plot of the measurements (y axis ≈ 1.5–2.2, x axis 0–1)]



Motivation

q We use LS (Least Squares) to find the best fit:


q Is this the best solution?

[Figure: measurements with the LS fit (y axis ≈ 1.5–2.3, x axis 0–1)]



Motivation

q LS would be the Best Linear Unbiased Estimator if the noise were uncorrelated with fixed variance
q Here the noise term is actually correlated, and the true linear model of this realization is shown below as the black line
– Here LS gives too much weight to the group of samples in the middle
[Figure: measurements, the LS fit, and the correct line (y axis ≈ 1.5–2.3, x axis 0–1)]



Motivation
q Taking the correlation of the noise term into account, we can use the Generalized LS method, and the result improves considerably
q However, in many cases we do not know the correlation model
– It is hidden in the observations and we cannot access it directly
– Therefore, e.g. here we would need to estimate the covariance and the line parameters simultaneously
q This sort of problem can quite quickly become very complicated
– How to estimate the covariance without knowing the line parameters, and vice versa?
q Intuitive (heuristic) solution:
– Iteratively estimate one parameter, then the other, and continue… (a sketch of this heuristic is given below)
– No guarantee of the performance in this case (e.g. compared to the maximum likelihood (ML) solution)
q The EM algorithm provides the ML solution for this sort of problem

[Figure: measurements, the LS fit, the correct line, and the Generalized LS fit (y axis ≈ 1.5–2.3, x axis 0–1)]
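A minimal Python/NumPy sketch of the heuristic alternation described above (assuming, purely for illustration, an AR(1) correlation model for the noise; the data values are invented, and this is not the EM algorithm itself):

import numpy as np

# Toy data: y = a*x + b + correlated noise (AR(1) noise assumed for illustration)
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
a_true, b_true, rho = 0.5, 1.7, 0.9
e = np.zeros(n)
for i in range(1, n):
    e[i] = rho * e[i - 1] + 0.05 * rng.standard_normal()
y = a_true * x + b_true + e

X = np.column_stack([x, np.ones(n)])
C = np.eye(n)                                # start from white noise, i.e. plain LS
for _ in range(10):
    Cinv = np.linalg.inv(C)
    theta = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)   # generalized LS fit
    r = y - X @ theta                                          # residuals
    rho_hat = np.clip((r[:-1] @ r[1:]) / (r[:-1] @ r[:-1]), -0.99, 0.99)
    C = rho_hat ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
print("estimated a, b:", theta)

Each pass fits the line assuming the current noise covariance and then re-estimates the noise correlation from the residuals; unlike EM, nothing guarantees that this alternation increases the likelihood.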



Expectation Maximization
Algorithm
q Presented by Dempster, Laird and Rubin in [1] in 1977
– Basically the same principle was already proposed earlier by
some other authors in specific circumstances
q The EM algorithm is an iterative estimation method that can derive
the maximum likelihood (ML) estimates in the presence of
missing/hidden data (“incomplete data”)
– e.g. the classical case is the Gaussian mixture, where we have
a set of unknown Gaussian distributions (see example later on)
Many-to-one mapping [2]:
X: underlying space, x: complete data (required for ML)
Y: observation space, y: observation
x is observed only by means of y(x); X(y) is a subset of X determined by y.



Expectation Maximization
Algorithm
q The basic functioning of the EM algorithm can be divided into two
steps (the parameter to be estimated is θ):
– Expectation step (E-step)
• Take the expected value of the complete-data log-likelihood given the observation and the current parameter estimate θ̂_k:

  Q(\theta, \hat{\theta}_k) = E\{ \log f(x \mid \theta) \mid y, \hat{\theta}_k \}
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data from the E-step is used as if it were measured observations):

  \hat{\theta}_{k+1} = \arg\max_{\theta} Q(\theta, \hat{\theta}_k)

q The likelihood of the parameter increases at every iteration
– EM converges towards a local maximum of the likelihood function
An example: ML estimation vs. EM
algorithm [3]
q We wish to estimate the variance of S:
– observation Y=S+N
• S and N are normally distributed with zero means and
variances θ and 1, respectively
– Now, Y is also normally distributed (zero mean with variance θ+1)
q ML estimate can be easily derived:

  \hat{\theta}_{ML} = \arg\max_{\theta} p(y \mid \theta) = \max\{0,\, y^2 - 1\}

q The zero in the above result comes from the fact that we know that the variance is always non-negative



An example: ML estimation vs. EM
algorithm
q The same with the EM algorithm
– the complete data now consists of S and N
– the E-step is then:

  Q(\theta, \hat{\theta}_k) = E[ \ln p(s, n \mid \theta) \mid y, \hat{\theta}_k ]

– the logarithmic probability distribution for the complete data is then

  \ln p(s, n \mid \theta) = \ln p(n) + \ln p(s \mid \theta) = C - \frac{1}{2}\ln\theta - \frac{S^2}{2\theta}   (C contains all the terms independent of θ)

  \Rightarrow\; Q(\theta, \hat{\theta}_k) = C - \frac{1}{2}\ln\theta - \frac{E[S^2 \mid Y, \hat{\theta}_k]}{2\theta}



An example: ML estimation vs. EM
algorithm
q M-step:
– maximize the Q-function from the E-step
– We set the derivative to zero and get (using results from math tables: conditional means and variances, the law of total variance):

  \hat{\theta}_{k+1} = E[ S^2 \mid Y, \hat{\theta}_k ] = E^2[ S \mid Y, \hat{\theta}_k ] + \operatorname{var}[ S \mid Y, \hat{\theta}_k ] = \left( \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}\, Y \right)^2 + \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}

q At the steady state (θ̂_{k+1} = θ̂_k) we get the same value for the estimate as in ML estimation, max{0, y² − 1}
q What about the convergence? What if we choose the initial value θ̂_0 = 0? (see the numerical sketch below)
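A minimal numerical sketch of this iteration in Python/NumPy (the observed value y and the starting point are illustrative choices, not taken from the slides):

import numpy as np

y = 1.8          # one observed value of Y = S + N (illustrative)
theta = 0.5      # initial guess; starting exactly at 0 would stay at 0 forever

for _ in range(50):
    # E-step quantities: conditional mean and variance of S given Y and the current theta
    mean_s = theta / (theta + 1.0) * y
    var_s = theta / (theta + 1.0)
    # M-step: theta_{k+1} = E[S^2 | Y, theta_k] = mean_s^2 + var_s
    theta = mean_s ** 2 + var_s

print("EM estimate:", theta)                             # approaches y^2 - 1 here
print("closed-form ML estimate:", max(0.0, y ** 2 - 1.0))

The fixed point at θ̂ = 0 also answers the question above: if the initial value is exactly zero, the iteration never moves, which is why a strictly positive starting point is used in the sketch.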



An example: ML estimation vs. EM
algorithm
q In the previous example, the ML estimate could be solved in a
closed form expression
– In this case there was no need for the EM algorithm, since the ML estimate is obtained in a straightforward manner (we just showed that the EM algorithm converges to the peak of the likelihood function)

q Next we consider a coin toss example:


– The target is to figure out the probability of heads for two coins
– ML estimate can be directly calculated from the results
q We will raise the stakes a little and assume that we don't even know which one of the coins is used for each sample set
– i.e. we are estimating the coin probabilities without knowing which one of the coins is being tossed



An example: Coin toss [4]

q We have two coins: A and B
q The probabilities for heads are θ_A and θ_B
q We have 5 measurement sets including 10 coin tosses in each set
q If we know which of the coins is tossed in each set, we can calculate the ML probabilities for θ_A and θ_B directly
q If we don't know which of the coins is tossed in each set, the ML estimates cannot be calculated directly
→ EM algorithm (a code sketch is given after the figure summary below)

[Figure, redrawn from [4]: ML estimation vs. EM on 5 sets of 10 tosses]

Maximum likelihood (ML) method, if we know which coin produced each set:

  Set  Tosses        Coin   Result
  1    HTTTHHTHTH    B      5H, 5T
  2    HHHHTHHHHH    A      9H, 1T
  3    HTHHHHHTHH    A      8H, 2T
  4    HTHTTTTHHTT   B      4H, 6T
  5    THHHTHHHTH    A      7H, 3T

  Totals: coin A 24H, 6T; coin B 9H, 11T
  θ̂_A = 24/(24+6) = 0.80,  θ̂_B = 9/(9+11) = 0.45

Expectation Maximization, if the coin labels are unknown (the binomial distribution
C(n,k)·p^k·(1−p)^(n−k) is used to calculate the probabilities):

1. Initialization: θ̂_A(0) = 0.6, θ̂_B(0) = 0.5
2. E-step: weight each set's counts by the posterior probability of each coin.
   Example calculation for the first set (5H, 5T):
   C(10,5)·0.6^5·0.4^5 ≈ 0.201 and C(10,5)·0.5^5·0.5^5 ≈ 0.246,
   so P(coin A) = 0.201/(0.201+0.246) ≈ 0.45 and P(coin B) ≈ 0.55

   Set  P(A)   P(B)   Coin A share    Coin B share
   1    0.45   0.55   ≈2.2H, 2.2T     ≈2.8H, 2.8T
   2    0.80   0.20   ≈7.2H, 0.8T     ≈1.8H, 0.2T
   3    0.73   0.27   ≈5.9H, 1.5T     ≈2.1H, 0.5T
   4    0.35   0.65   ≈1.4H, 2.1T     ≈2.6H, 3.9T
   5    0.65   0.35   ≈4.5H, 1.9T     ≈2.5H, 1.1T
   Totals:            ≈21.3H, 8.6T    ≈11.7H, 8.4T

3. M-step: θ̂_A(1) = 21.3/(21.3+8.6) ≈ 0.71,  θ̂_B(1) = 11.7/(11.7+8.4) ≈ 0.58
4. After 10 iterations: θ̂_A(10) ≈ 0.80, θ̂_B(10) ≈ 0.52
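A compact Python/NumPy sketch of this coin-toss EM (the toss data and initial guesses are the ones shown above; the binomial coefficient cancels when the responsibilities are normalized, so it is left out):

import numpy as np

# Five sets of ten tosses, summarized as (heads, tails) counts
sets = np.array([[5, 5], [9, 1], [8, 2], [4, 6], [7, 3]], dtype=float)
theta_a, theta_b = 0.6, 0.5                     # initial guesses

for _ in range(10):
    # E-step: posterior probability that each set was produced by coin A
    like_a = theta_a ** sets[:, 0] * (1 - theta_a) ** sets[:, 1]
    like_b = theta_b ** sets[:, 0] * (1 - theta_b) ** sets[:, 1]
    w_a = like_a / (like_a + like_b)
    w_b = 1.0 - w_a
    # Expected heads/tails counts attributed to each coin
    heads_a, tails_a = w_a @ sets[:, 0], w_a @ sets[:, 1]
    heads_b, tails_b = w_b @ sets[:, 0], w_b @ sets[:, 1]
    # M-step: re-estimate the head probabilities from the expected counts
    theta_a = heads_a / (heads_a + tails_a)
    theta_b = heads_b / (heads_b + tails_b)

print(theta_a, theta_b)     # approximately 0.80 and 0.52 after ten iterations

The first pass reproduces the numbers in the figure: the responsibilities of coin A are about 0.45, 0.80, 0.73, 0.35 and 0.65, and the first M-step gives roughly 0.71 and 0.58.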



About some practical issues

q Although many examples in the literature show excellent results using the EM algorithm, the reality is often less glamorous
– As the number of uncertain parameters in the modeled system increases, even the best available guess (in the ML sense) might not be adequate
– NB! This is not the algorithm's fault. It still provides the best possible solution in the ML sense
q Depending on the form of the likelihood function (provided in the E-step), the convergence rate of the EM might vary considerably
q Notice that the algorithm converges towards a local maximum
– To locate the global peak, one must use different initial guesses for the estimated parameters or some other, more advanced methods
– With multiple unknown (hidden/latent) parameters the number of local peaks usually increases
Further examples
q Line fitting (shown only in the lecture)
q Parameter estimation of multivariate Gaussian mixture (a rough code sketch is given after this list)
– See additional pdf-file for the
• Problem definition
• Equations
– Definition of the log-likelihood function
– E-step
– M-step
– See additional Matlab m-file for the illustration of
• The example in numerical form
– Dimensions and value spaces for each parameter
• The iterative nature of the EM algorithm
– Study how parameters change at each iteration
• How initial guesses for the estimated parameters affect the final
result
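The accompanying m-file is not reproduced here, but the following is a rough Python/NumPy sketch of EM for a K-component multivariate Gaussian mixture; the function name and interface are invented for illustration, and X is an N×D data matrix:

import numpy as np

def gmm_em(X, K, n_iter=100, seed=0):
    # EM for a K-component Gaussian mixture: returns weights, means, covariances
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)]          # initial means: random data points
    Sigma = np.stack([np.eye(D) for _ in range(K)])  # initial covariances: identity
    w = np.full(K, 1.0 / K)                          # initial weights: uniform
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = P(component k | x_n, current parameters)
        gamma = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv_S = np.linalg.inv(Sigma[k])
            norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma[k]))
            gamma[:, k] = w[k] * norm * np.exp(-0.5 * np.sum(diff @ inv_S * diff, axis=1))
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: update weights, means and covariances from the responsibilities
        Nk = gamma.sum(axis=0)
        w = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return w, mu, Sigma

# Illustrative usage: two well-separated 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
print(gmm_em(X, K=2)[1])     # estimated component means, roughly [0, 0] and [4, 4]

As noted on the previous slide, the result depends on the initial guesses; changing the seed argument (and thus the initial means) illustrates convergence to different local maxima.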
Conclusions
q EM iteratively finds ML estimates in estimation problems with hidden (incomplete) data
– likelihood increases at every step of the iteration process
q Algorithm consists of two iteratively taken steps:
– Expectation step (E-step)
• Take the expected value of the complete data given the
observation and the current parameter estimate
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data from the E-step is used as if it were measured observations)
q The algorithm converges to a local maximum
– the global maximum can be elsewhere
q See the reference list for literature on use cases of the EM algorithm in communications
– these are references [5]-[16] (not cited on the previous slides)



References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B., “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal
of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1., pp. 1-38, 1977.
2. Moon, T.K., “The Expectation Maximization Algorithm”, IEEE Signal Processing Magazine, vol. 13, pp. 47-60, Nov.
1996.
3. Do, C.B.; Batzoglou, S., "What is the Expectation Maximization algorithm?" [Online]. No longer available; was originally available at: courses.ece.illinois.edu/ece561/spring08/EM.pdf
4. The Expectation-Maximization Algorithm. [Online]. No longer available; was originally available at: ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf

Some communications related papers using the EM algorithm (continues in the next slide):
5. Borran, M.J.; Nasiri-Kenari, M., "An efficient detection technique for synchronous CDMA communication systems
based on the expectation maximization algorithm," Vehicular Technology, IEEE Transactions on , vol.49, no.5,
pp.1663,1668, Sep 2000
6. Cozzo, C.; Hughes, B.L., "The expectation-maximization algorithm for space-time communications," Information
Theory, 2000. Proceedings. IEEE International Symposium on , vol., no., pp.338,, 2000
7. Rad, K. R.; Nasiri-Kenari, M., "Iterative detection for V-BLAST MIMO communication systems based on expectation
maximisation algorithm," Electronics Letters , vol.40, no.11, pp.684,685, 27 May 2004
8. Barembruch, S.; Scaglione, A.; Moulines, E., "The expectation and sparse maximization algorithm," Communications
and Networks, Journal of , vol.12, no.4, pp.317,329, Aug. 2010
9. Panayirci, E., "Advanced signal processing techniques for wireless communications," Signal Design and its
Applications in Communications (IWSDA), 2011 Fifth International Workshop on , vol., no., pp.1,1, 10-14 Oct. 2011
10. O'Sullivan, J.A., "Message passing expectation-maximization algorithms," Statistical Signal Processing, 2005
IEEE/SP 13th Workshop on , vol., no., pp.841,846, 17-20 July 2005
11. Etzlinger, Bernhard; Haselmayr, Werner; Springer, Andreas, "Joint Detection and Estimation on MIMO-ISI Channels
Based on Gaussian Message Passing," Systems, Communication and Coding (SCC), Proceedings of 2013 9th
International ITG Conference on , vol., no., pp.1,6, 21-24 Jan. 2013



References
12. Groh, I.; Staudinger, E.; Sand, S., "Low Complexity High Resolution Maximum Likelihood Channel Estimation in
Spread Spectrum Navigation Systems," Vehicular Technology Conference (VTC Fall), 2011 IEEE , vol., no., pp.1,5,
5-8 Sept. 2011
13. Wei Wang; Jost, T.; Dammann, A., "Estimation and Modelling of NLoS Time-Variant Multipath for Localization
Channel Model in Mobile Radios," Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE , vol.,
no., pp.1,6, 6-10 Dec. 2010
14. Nasir, A.A.; Mehrpouyan, H.; Blostein, S.D.; Durrani, S.; Kennedy, R.A., "Timing and Carrier Synchronization With
Channel Estimation in Multi-Relay Cooperative Networks," Signal Processing, IEEE Transactions on , vol.60, no.2,
pp.793,811, Feb. 2012
15. Tsang-Yi Wang; Jyun-Wei Pu; Chih-Peng Li, "Joint Detection and Estimation for Cooperative Communications in
Cluster-Based Networks," Communications, 2009. ICC '09. IEEE International Conference on , vol., no., pp.1,5, 14-
18 June 2009
16. Xie, Yongzhe; Georghiades, C.N., "Two EM-type channel estimation algorithms for OFDM with transmitter diversity,"
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on , vol.3, no., pp.III-
2541,III-2544, 13-17 May 2002

