
Statistics 153

The EM algorithm for a single normal sample

Instructor: Prof. A.L. Yuille

An implementation in R of the single-sample EM algorithm, Example 8.1 in the book, is:
em.norm <- function(Y){
  Yobs <- Y[!is.na(Y)]
  Ymis <- Y[is.na(Y)]
  n <- length(c(Yobs, Ymis))
  r <- length(Yobs)
  # initial values
  mut <- mean(Yobs)        # (*)
  sit <- var(Yobs)*(r-1)/r # (**)
  # define the log-likelihood function
  ll <- function(y, mu, sigma2, n){
    -.5*n*log(2*pi*sigma2) - .5*sum((y-mu)^2)/sigma2
  }
  # log-likelihood at the initial values, ignoring the missing data mechanism
  lltm1 <- ll(Yobs, mut, sit, n)
  repeat{
    # E-step
    EY <- sum(Yobs) + (n-r)*mut
    EY2 <- sum(Yobs^2) + (n-r)*(mut^2 + sit)
    # M-step
    mut1 <- EY / n
    sit1 <- EY2 / n - mut1^2
    # update parameter values
    mut <- mut1
    sit <- sit1
    # log-likelihood at the current estimates, ignoring the missing data mechanism
    llt <- ll(Yobs, mut, sit, n)
    # print current parameter values and log-likelihood
    cat(mut, sit, llt, "\n")
    # stop if converged
    if (abs(lltm1 - llt) < 0.001) break
    lltm1 <- llt
  }
  # return the final estimates as a list (return() takes a single value in R)
  return(list(mut = mut, sit = sit))
}
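As a brief justification of the code above (not spelled out in the listing): under the current estimates, each missing observation is distributed as N(mu_t, sigma_t^2), so its first two conditional moments are mu_t and mu_t^2 + sigma_t^2. The E-step therefore computes the expected complete-data sufficient statistics, and the M-step plugs them into the usual complete-data ML estimates:

```latex
% E-step: expected complete-data sufficient statistics given (mu_t, sigma_t^2)
E\Big[\sum_{i=1}^{n} Y_i\Big] = \sum_{i\,\in\,\mathrm{obs}} y_i + (n-r)\,\mu_t,
\qquad
E\Big[\sum_{i=1}^{n} Y_i^2\Big] = \sum_{i\,\in\,\mathrm{obs}} y_i^2 + (n-r)\,\big(\mu_t^2 + \sigma_t^2\big).

% M-step: complete-data ML estimates with the expectations substituted
\mu_{t+1} = \frac{1}{n}\,E\Big[\sum_i Y_i\Big],
\qquad
\sigma_{t+1}^2 = \frac{1}{n}\,E\Big[\sum_i Y_i^2\Big] - \mu_{t+1}^2.
```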
To run it, we draw a sample and assign some missing values:
> x <- rnorm(20,5)
> x[16:20] <- NA
> x
[1] 6.692930 6.580815 7.733349 6.603173 5.714990 4.017255 3.383734 3.947348
[9] 3.883307 5.928382 5.121450 5.276446 4.601717 5.480934 6.740819 NA
[17] NA NA NA NA
> em.norm(x)
5.44711 1.548372 -30.25081
$mut
[1] 5.44711

$sit
[1] 1.548372

Convergence is immediate because the initial estimates are exactly the maximum likelihood estimates based on the observed data alone, ignoring the missing data mechanism, so the very first EM update leaves them unchanged.
When we use other starting values, actual iteration has to take place. To see this, change the two lines in the algorithm labeled (*) and (**) to mut <- 1 and sit <- 0.1, respectively. We get the following output for the updated iteration scheme.
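Instead of editing the function body each time, one could pass the starting values as arguments. A sketch of such a variant (the function name em.norm2 and arguments mu0, sit0 are our own, not from the book):

```r
# Hypothetical variant (not in the book): same EM iteration as em.norm,
# but the starting values are supplied as arguments.
em.norm2 <- function(Y, mu0, sit0){
  Yobs <- Y[!is.na(Y)]
  n <- length(Y)
  r <- length(Yobs)
  mut <- mu0   # user-supplied starting values replace (*) and (**)
  sit <- sit0
  ll <- function(y, mu, sigma2, n)
    -.5*n*log(2*pi*sigma2) - .5*sum((y-mu)^2)/sigma2
  lltm1 <- ll(Yobs, mut, sit, n)
  repeat{
    EY  <- sum(Yobs) + (n-r)*mut                 # E-step
    EY2 <- sum(Yobs^2) + (n-r)*(mut^2 + sit)
    mut <- EY/n                                  # M-step
    sit <- EY2/n - mut^2
    llt <- ll(Yobs, mut, sit, n)
    cat(mut, sit, llt, "\n")
    if (abs(lltm1 - llt) < 0.001) break
    lltm1 <- llt
  }
  list(mut = mut, sit = sit)
}
```

Calling em.norm2(x, 1, 0.1) then performs the same iterations as the edited em.norm.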
> em.norm(x)
4.335333 4.894426 -38.52646
5.169166 2.616645 -32.65717
5.377624 1.829925 -30.78735
5.429739 1.619665 -30.37223
5.442767 1.566252 -30.28010
5.446024 1.552845 -30.25806
5.446839 1.549490 -30.25262
5.447042 1.548651 -30.25126
5.447093 1.548442 -30.25092
$mut
[1] 5.447093

$sit
[1] 1.548442
Check both outputs against the exact results (top of page 169):
> mean(x,na.rm=T)
[1] 5.44711
> mean(x^2,na.rm=T) - mean(x,na.rm=T)^2 # ML variance (divisor n)
[1] 1.548372
> var(x, na.rm=T)* (15-1)/15 # variance another way
[1] 1.548372
> crossprod(x[!is.na(x)]-mean(x, na.rm=T))[1,1]/15 # yet another way
[1] 1.548372
