
Expectation Maximization

– Introduction to the EM algorithm

TLT-5906 Advanced Course in Digital Transmission

Jukka Talvitie, M.Sc. (eng)


jukka.talvitie@tut.fi
Department of Communication Engineering
Tampere University of Technology

M.Sc. Jukka Talvitie 5.12.2013


Outline
q Expectation Maximization (EM) algorithm
– Motivation, background
– Where can the EM algorithm be used?
q EM principle
– Formal definition
– How does the algorithm really work?
– Coin toss example
– About some practical issues
q More advanced examples
– Line fitting with EM algorithm
– Parameter estimation of multivariate Gaussian mixture
q Conclusions



Motivation

q Consider the classical line-fitting problem:


– Assume the measurements below come from the linear model y = ax + b + n (the line parameters are a and b, and n is zero-mean noise)

[Figure: scatter plot of the measurements (y axis ≈ 1.5–2.2, x axis 0–1)]



Motivation

q We use LS (Least Squares) to find the best fit:


q Is this the best solution?

[Figure: measurements with the LS fit (y axis ≈ 1.5–2.3, x axis 0–1)]



Motivation

q LS would be the Best Linear Unbiased Estimator if the noise were uncorrelated with fixed variance
q Here the noise term is actually correlated, and the true linear model of this realization is shown below as the black line
– Here LS gives too much weight to the group of samples in the middle
[Figure: measurements, the LS fit, and the correct line (y axis ≈ 1.5–2.3, x axis 0–1)]



Motivation
q Taking the correlation of the noise term into account, we can use the Generalized LS method, and the result improves considerably
q However, in many cases we do not know the correlation model
– It is hidden in the observations and we cannot access it directly
– Therefore, e.g. here we would need to estimate the covariance and the line parameters simultaneously
q This sort of problem can quite quickly become very complicated
– How to estimate the covariance without knowing the line parameters, and vice versa?
q Intuitive (heuristic) solution:
– Iteratively estimate one parameter, then the other, and continue… (a sketch of this heuristic is given below)
– No guarantee of the performance in this case (e.g. compared to the maximum likelihood (ML) solution)
q The EM algorithm provides the ML solution for this sort of problem

[Figure: measurements, the LS fit, the correct line, and the Generalized LS fit (y axis ≈ 1.5–2.3, x axis 0–1)]
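A minimal Python/NumPy sketch of the heuristic alternation described above (assuming, purely for illustration, an AR(1) correlation model for the noise; the data values are invented, and this is not the EM algorithm itself):

import numpy as np

# Toy data: y = a*x + b + correlated noise (AR(1) noise assumed for illustration)
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
a_true, b_true, rho = 0.5, 1.7, 0.9
e = np.zeros(n)
for i in range(1, n):
    e[i] = rho * e[i - 1] + 0.05 * rng.standard_normal()
y = a_true * x + b_true + e

X = np.column_stack([x, np.ones(n)])
C = np.eye(n)                                # start from white noise, i.e. plain LS
for _ in range(10):
    Cinv = np.linalg.inv(C)
    theta = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)   # generalized LS fit
    r = y - X @ theta                                          # residuals
    rho_hat = np.clip((r[:-1] @ r[1:]) / (r[:-1] @ r[:-1]), -0.99, 0.99)
    C = rho_hat ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
print("estimated a, b:", theta)

Each pass fits the line assuming the current noise covariance and then re-estimates the noise correlation from the residuals; unlike EM, nothing guarantees that this alternation increases the likelihood.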



Expectation Maximization
Algorithm
q Presented by Dempster, Laird and Rubin in [1] in 1977
– Basically the same principle was already proposed earlier by
some other authors in specific circumstances
q The EM algorithm is an iterative estimation method that can derive
the maximum likelihood (ML) estimates in the presence of
missing/hidden data (“incomplete data”)
– e.g. the classical case is the Gaussian mixture, where we have
a set of unknown Gaussian distributions (see example later on)
Many-to-one mapping [2]:
X: underlying space, x: complete data (required for ML)
Y: observation space, y: observation
x is observed only by means of y(x); X(y) is a subset of X determined by y.



Expectation Maximization
Algorithm
q The basic functioning of the EM algorithm can be divided into two
steps (the parameter to be estimated is θ):
– Expectation step (E-step)
• Take the expected value of the complete-data log-likelihood given the observation and the current parameter estimate θ̂_k:

  Q(\theta, \hat{\theta}_k) = E\{ \log f(x \mid \theta) \mid y, \hat{\theta}_k \}
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data from the E-step is used as if it were measured observations):

  \hat{\theta}_{k+1} = \arg\max_{\theta} Q(\theta, \hat{\theta}_k)

q The likelihood of the parameter increases at every iteration
– EM converges towards a local maximum of the likelihood function
An example: ML estimation vs. EM
algorithm [3]
q We wish to estimate the variance of S:
– observation Y=S+N
• S and N are normally distributed with zero means and
variances θ and 1, respectively
– Now, Y is also normally distributed (zero mean with variance θ+1)
q ML estimate can be easily derived:

  \hat{\theta}_{ML} = \arg\max_{\theta} p(y \mid \theta) = \max\{0,\, y^2 - 1\}

q The zero in the above result comes from the fact that we know that the variance is always non-negative



An example: ML estimation vs. EM
algorithm
q The same with the EM algorithm
– the complete data now consists of S and N
– the E-step is then:

  Q(\theta, \hat{\theta}_k) = E[ \ln p(s, n \mid \theta) \mid y, \hat{\theta}_k ]

– the logarithmic probability distribution for the complete data is then

  \ln p(s, n \mid \theta) = \ln p(n) + \ln p(s \mid \theta) = C - \frac{1}{2}\ln\theta - \frac{S^2}{2\theta}   (C contains all the terms independent of θ)

  \Rightarrow\; Q(\theta, \hat{\theta}_k) = C - \frac{1}{2}\ln\theta - \frac{E[S^2 \mid Y, \hat{\theta}_k]}{2\theta}



An example: ML estimation vs. EM
algorithm
q M-step:
– maximize the Q-function from the E-step
– We set the derivative to zero and get (using results from math tables: conditional means and variances, the law of total variance):

  \hat{\theta}_{k+1} = E[ S^2 \mid Y, \hat{\theta}_k ] = E^2[ S \mid Y, \hat{\theta}_k ] + \operatorname{var}[ S \mid Y, \hat{\theta}_k ] = \left( \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}\, Y \right)^2 + \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}

q At the steady state (θ̂_{k+1} = θ̂_k) we get the same value for the estimate as in ML estimation, max{0, y² − 1}
q What about the convergence? What if we choose the initial value θ̂_0 = 0? (see the numerical sketch below)
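A minimal numerical sketch of this iteration in Python/NumPy (the observed value y and the starting point are illustrative choices, not taken from the slides):

import numpy as np

y = 1.8          # one observed value of Y = S + N (illustrative)
theta = 0.5      # initial guess; starting exactly at 0 would stay at 0 forever

for _ in range(50):
    # E-step quantities: conditional mean and variance of S given Y and the current theta
    mean_s = theta / (theta + 1.0) * y
    var_s = theta / (theta + 1.0)
    # M-step: theta_{k+1} = E[S^2 | Y, theta_k] = mean_s^2 + var_s
    theta = mean_s ** 2 + var_s

print("EM estimate:", theta)                             # approaches y^2 - 1 here
print("closed-form ML estimate:", max(0.0, y ** 2 - 1.0))

The fixed point at θ̂ = 0 also answers the question above: if the initial value is exactly zero, the iteration never moves, which is why a strictly positive starting point is used in the sketch.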



An example: ML estimation vs. EM
algorithm
q In the previous example, the ML estimate could be solved in a
closed form expression
– In this case there was no need for the EM algorithm, since the ML estimate is obtained in a straightforward manner (we just showed that the EM algorithm converges to the peak of the likelihood function)

q Next we consider a coin toss example:


– The target is to figure out the probability of heads for two coins
– ML estimate can be directly calculated from the results
q We will raise the stakes a little and assume that we don't even know which one of the coins is used for each sample set
– i.e. we are estimating the coin probabilities without knowing which one of the coins is being tossed



An example: Coin toss [4]

q We have two coins: A and B
q The probabilities for heads are θ_A and θ_B
q We have 5 measurement sets including 10 coin tosses in each set
q If we know which of the coins is tossed in each set, we can calculate the ML probabilities for θ_A and θ_B directly
q If we don't know which of the coins is tossed in each set, the ML estimates cannot be calculated directly
→ EM algorithm (a code sketch is given after the figure summary below)

[Figure, redrawn from [4]: ML estimation vs. EM on 5 sets of 10 tosses]

Maximum likelihood (ML) method, if we know which coin produced each set:

  Set  Tosses        Coin   Result
  1    HTTTHHTHTH    B      5H, 5T
  2    HHHHTHHHHH    A      9H, 1T
  3    HTHHHHHTHH    A      8H, 2T
  4    HTHTTTTHHTT   B      4H, 6T
  5    THHHTHHHTH    A      7H, 3T

  Totals: coin A 24H, 6T; coin B 9H, 11T
  θ̂_A = 24/(24+6) = 0.80,  θ̂_B = 9/(9+11) = 0.45

Expectation Maximization, if the coin labels are unknown (the binomial distribution
C(n,k)·p^k·(1−p)^(n−k) is used to calculate the probabilities):

1. Initialization: θ̂_A(0) = 0.6, θ̂_B(0) = 0.5
2. E-step: weight each set's counts by the posterior probability of each coin.
   Example calculation for the first set (5H, 5T):
   C(10,5)·0.6^5·0.4^5 ≈ 0.201 and C(10,5)·0.5^5·0.5^5 ≈ 0.246,
   so P(coin A) = 0.201/(0.201+0.246) ≈ 0.45 and P(coin B) ≈ 0.55

   Set  P(A)   P(B)   Coin A share    Coin B share
   1    0.45   0.55   ≈2.2H, 2.2T     ≈2.8H, 2.8T
   2    0.80   0.20   ≈7.2H, 0.8T     ≈1.8H, 0.2T
   3    0.73   0.27   ≈5.9H, 1.5T     ≈2.1H, 0.5T
   4    0.35   0.65   ≈1.4H, 2.1T     ≈2.6H, 3.9T
   5    0.65   0.35   ≈4.5H, 1.9T     ≈2.5H, 1.1T
   Totals:            ≈21.3H, 8.6T    ≈11.7H, 8.4T

3. M-step: θ̂_A(1) = 21.3/(21.3+8.6) ≈ 0.71,  θ̂_B(1) = 11.7/(11.7+8.4) ≈ 0.58
4. After 10 iterations: θ̂_A(10) ≈ 0.80, θ̂_B(10) ≈ 0.52
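A compact Python/NumPy sketch of this coin-toss EM (the toss data and initial guesses are the ones shown above; the binomial coefficient cancels when the responsibilities are normalized, so it is left out):

import numpy as np

# Five sets of ten tosses, summarized as (heads, tails) counts
sets = np.array([[5, 5], [9, 1], [8, 2], [4, 6], [7, 3]], dtype=float)
theta_a, theta_b = 0.6, 0.5                     # initial guesses

for _ in range(10):
    # E-step: posterior probability that each set was produced by coin A
    like_a = theta_a ** sets[:, 0] * (1 - theta_a) ** sets[:, 1]
    like_b = theta_b ** sets[:, 0] * (1 - theta_b) ** sets[:, 1]
    w_a = like_a / (like_a + like_b)
    w_b = 1.0 - w_a
    # Expected heads/tails counts attributed to each coin
    heads_a, tails_a = w_a @ sets[:, 0], w_a @ sets[:, 1]
    heads_b, tails_b = w_b @ sets[:, 0], w_b @ sets[:, 1]
    # M-step: re-estimate the head probabilities from the expected counts
    theta_a = heads_a / (heads_a + tails_a)
    theta_b = heads_b / (heads_b + tails_b)

print(theta_a, theta_b)     # approximately 0.80 and 0.52 after ten iterations

The first pass reproduces the numbers in the figure: the responsibilities of coin A are about 0.45, 0.80, 0.73, 0.35 and 0.65, and the first M-step gives roughly 0.71 and 0.58.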



About some practical issues

q Although many examples in the literature show excellent results using the EM algorithm, the reality is often less glamorous
– As the number of uncertain parameters in the modeled system increases, even the best available guess (in the ML sense) might not be adequate
– NB! This is not the algorithm's fault. It still provides the best possible solution in the ML sense
q Depending on the form of the likelihood function (provided in the E-step), the convergence rate of the EM might vary considerably
q Notice that the algorithm converges towards a local maximum
– To locate the global peak, one must use different initial guesses for the estimated parameters or some other, more advanced methods
– With multiple unknown (hidden/latent) parameters the number of local peaks usually increases
Further examples
q Line fitting (shown only in the lecture)
q Parameter estimation of multivariate Gaussian mixture (a rough code sketch is given after this list)
– See additional pdf-file for the
• Problem definition
• Equations
– Definition of the log-likelihood function
– E-step
– M-step
– See additional Matlab m-file for the illustration of
• The example in numerical form
– Dimensions and value spaces for each parameter
• The iterative nature of the EM algorithm
– Study how parameters change at each iteration
• How initial guesses for the estimated parameters affect the final
result
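The accompanying m-file is not reproduced here, but the following is a rough Python/NumPy sketch of EM for a K-component multivariate Gaussian mixture; the function name and interface are invented for illustration, and X is an N×D data matrix:

import numpy as np

def gmm_em(X, K, n_iter=100, seed=0):
    # EM for a K-component Gaussian mixture: returns weights, means, covariances
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)]          # initial means: random data points
    Sigma = np.stack([np.eye(D) for _ in range(K)])  # initial covariances: identity
    w = np.full(K, 1.0 / K)                          # initial weights: uniform
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = P(component k | x_n, current parameters)
        gamma = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv_S = np.linalg.inv(Sigma[k])
            norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(Sigma[k]))
            gamma[:, k] = w[k] * norm * np.exp(-0.5 * np.sum(diff @ inv_S * diff, axis=1))
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: update weights, means and covariances from the responsibilities
        Nk = gamma.sum(axis=0)
        w = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return w, mu, Sigma

# Illustrative usage: two well-separated 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
print(gmm_em(X, K=2)[1])     # estimated component means, roughly [0, 0] and [4, 4]

As noted on the previous slide, the result depends on the initial guesses; changing the seed argument (and thus the initial means) illustrates convergence to different local maxima.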
Conclusions
q EM iteratively finds ML estimates in estimation problems with hidden (incomplete) data
– likelihood increases at every step of the iteration process
q Algorithm consists of two iteratively taken steps:
– Expectation step (E-step)
• Take the expected value of the complete data given the
observation and the current parameter estimate
– Maximization step (M-step)
• Maximize the Q-function from the E-step (basically, the expected data from the E-step is used as if it were measured observations)
q The algorithm converges to a local maximum
– the global maximum can be elsewhere
q See the reference list for literature on use cases of the EM algorithm in communications
– these are references [5]-[16] (not cited on the previous slides)



References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B., “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal
of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1., pp. 1-38, 1977.
2. Moon, T.K., “The Expectation Maximization Algorithm”, IEEE Signal Processing Magazine, vol. 13, pp. 47-60, Nov.
1996.
3. Do, C.B.; Batzoglou, S., "What is the Expectation Maximization algorithm?" [Online]. No longer available; was originally available at: courses.ece.illinois.edu/ece561/spring08/EM.pdf
4. The Expectation-Maximization Algorithm. [Online]. No longer available; was originally available at: ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf

Some communications related papers using the EM algorithm (continues in the next slide):
5. Borran, M.J.; Nasiri-Kenari, M., "An efficient detection technique for synchronous CDMA communication systems
based on the expectation maximization algorithm," Vehicular Technology, IEEE Transactions on , vol.49, no.5,
pp.1663,1668, Sep 2000
6. Cozzo, C.; Hughes, B.L., "The expectation-maximization algorithm for space-time communications," Information
Theory, 2000. Proceedings. IEEE International Symposium on , vol., no., pp.338,, 2000
7. Rad, K. R.; Nasiri-Kenari, M., "Iterative detection for V-BLAST MIMO communication systems based on expectation
maximisation algorithm," Electronics Letters , vol.40, no.11, pp.684,685, 27 May 2004
8. Barembruch, S.; Scaglione, A.; Moulines, E., "The expectation and sparse maximization algorithm," Communications
and Networks, Journal of , vol.12, no.4, pp.317,329, Aug. 2010
9. Panayirci, E., "Advanced signal processing techniques for wireless communications," Signal Design and its
Applications in Communications (IWSDA), 2011 Fifth International Workshop on , vol., no., pp.1,1, 10-14 Oct. 2011
10. O'Sullivan, J.A., "Message passing expectation-maximization algorithms," Statistical Signal Processing, 2005
IEEE/SP 13th Workshop on , vol., no., pp.841,846, 17-20 July 2005
11. Etzlinger, Bernhard; Haselmayr, Werner; Springer, Andreas, "Joint Detection and Estimation on MIMO-ISI Channels
Based on Gaussian Message Passing," Systems, Communication and Coding (SCC), Proceedings of 2013 9th
International ITG Conference on , vol., no., pp.1,6, 21-24 Jan. 2013



References
12. Groh, I.; Staudinger, E.; Sand, S., "Low Complexity High Resolution Maximum Likelihood Channel Estimation in
Spread Spectrum Navigation Systems," Vehicular Technology Conference (VTC Fall), 2011 IEEE , vol., no., pp.1,5,
5-8 Sept. 2011
13. Wei Wang; Jost, T.; Dammann, A., "Estimation and Modelling of NLoS Time-Variant Multipath for Localization
Channel Model in Mobile Radios," Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE , vol.,
no., pp.1,6, 6-10 Dec. 2010
14. Nasir, A.A.; Mehrpouyan, H.; Blostein, S.D.; Durrani, S.; Kennedy, R.A., "Timing and Carrier Synchronization With
Channel Estimation in Multi-Relay Cooperative Networks," Signal Processing, IEEE Transactions on , vol.60, no.2,
pp.793,811, Feb. 2012
15. Tsang-Yi Wang; Jyun-Wei Pu; Chih-Peng Li, "Joint Detection and Estimation for Cooperative Communications in
Cluster-Based Networks," Communications, 2009. ICC '09. IEEE International Conference on , vol., no., pp.1,5, 14-
18 June 2009
16. Xie, Yongzhe; Georghiades, C.N., "Two EM-type channel estimation algorithms for OFDM with transmitter diversity,"
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on , vol.3, no., pp.III-
2541,III-2544, 13-17 May 2002

