Sei sulla pagina 1di 6

EXPECTATION-MAXIMIZATION TECHNIQUE AND SPATIAL -

ADAPTATION APPLIED TO PEL-RECURSIVE MOTION ESTIMATION


Vania Vieira Estrela
Universidade Estadual do Norte Fluminense, LCMAT-CCT, Grupo de Computação Científica
Av. Alberto Lamego 2000, Campos dos Goytacazes, RJ, BRAZIL, CEP 28013-600
E-mail: vaniave@uenf.br

M. H. da Silva Bassani
Departamento de Engenharia Mecânica, DEPES,CEFET-RJ
Av. Maracanã 229, Rio de Janeiro, RJ, BRAZIL, CEP 20271-110
E-mail: marcos@cefet-rj.br

does not continuously depend on the data due to the fact that
ABSTRACT motion estimation is highly sensitive to the presence of
Pel-recursive motion estimation is a well-established approach. observation noise in video images.
However, in the presence of noise, it becomes an ill-posed In this article, we plan to use the MAP estimate to
problem that requires regularization. In this paper, motion find u, that is, the update of the motion, from our observation
vectors are estimated in an iterative fashion by means of the model by means of the Espectation-Maximization technique.
Expectation-Maximization (EM) algorithm and a Gaussian data The main advantage of the EM method is that the final
model. Our proposed algorithm also utilizes the local image algorithm deals with closed-form expressions and does not
properties of the scene to improve the motion vector estimates require the use of optimization techniques.
following a spatially adaptive approach. Numerical experiments We organized this work as follows. Section 2 provides
are presented that demonstrate the merits of our method. some necessary background on the pel-recursive motion
estimation problem. Section 3 introduces our spatially adaptive
approach. Section 4 describes the EM framework. Section 5
Keywords: expectation-maximization; EM; computer vision;
defines the metric used to evaluate our results. Section 6
motion estimation; pel-recursive, maximum likelihood.
describes some implementation aspects and the experiments
used to access the performance of our proposed algorithm.
Finally, Section 7 has some conclusions.
1. INTRODUCTION

Motion estimation is very important in multimedia video 2. PEL-RECURSIVE MOTION ESTIMATION


processing applications. For example, in video coding, the
estimated motion is used to reduce the transmission bandwidth.
The evolution of an image sequence motion field can also help The displacement of each picture element in each frame forms
other image processing tasks in multimedia applications such as the displacement vector field (DVF) and its estimation can be
analysis, recognition, tracking, restoration, collision avoidance done using at least two successive frames. The DVF is the 2D
and segmentation of objects [9]. motion resulting from the apparent motion of the image
In coding applications, a block-based approach [12] is brightness (OF). A vector is assigned to each point in the image.
often used for interpolation of lost information between key A pixel belongs to a moving area if its intensity has
frames. The fixed rectangular partitioning of the image used by changed between consecutive frames. Hence, our goal is to find
some block-based approaches often separates visually the corresponding intensity value Ik(r) of the k-th frame at
meaningful image features. If the components of an important location r = [x, y]T, and d(r) = [dx, dy]T the corresponding (true)
feature are assigned different motion vectors, then the displacement vector (DV) at the working point r in the current
interpolated image will suffer from annoying artifacts. frame. Pel-recursive algorithms minimize the DFD function in a
Pel-recursive schemes [2, 4, 5, 9] can theoretically overcome small area containing the working point assuming constant
some of the limitations associated with blocks by assigning a image intensity along the motion trajectory. The DFD is
unique motion vector to each pixel. Intermediate frames are defined by
then constructed by resampling the image at locations ∆(r; d(r)) = Ik(r)- Ik-1 (r-d(r)) (1)
determined by linear interpolation of the motion vectors. and the perfect registration of frames will result in Ik(r)=Ik-1(r-
The pel-recursive approach can also manage motion d(r)). The DFD represents the error due to the nonlinear
with sub-pixel accuracy. However, its original formulation was temporal prediction of the intensity field through the DV. The
deterministic. The update of the motion estimate was based on relationship between the DVF and the intensity field is
the minimization of the displaced frame difference (DFD) at a nonlinear. An estimate of d(r), is obtained by directly
pixel. In the absence of additional assumptions about the pixel minimizing ∆(r,d(r)) or by determining a linear relationship
motion, this estimation problem becomes “ill-posed” because of between these two variables through some model. This is
the following problems: a) occlusion; b) the solution to the 2D accomplished by using the Taylor series expansion of Ik-1(r-
motion estimation problem is not unique; and c) the solution
d(r)) about the location (r-di(r)), where di(r) represents a
prediction of d(r) in i-th step. This results in
∆(r,r-di(r))=-uT∇Ik-1(r-di(r)) + e(r,d(r)), (2) 4. THE EM APPROACH

where the displacement update vector u=[ux, uy]T = d(r) – di(r), 4.1. The Ordinary ML Estimate and Its Limitations
e(r, d(r)) represents the error resulting from the truncation of The MAP estimate of u is given by [11]
the higher order terms (linearization error) and ∇=[∂/∂x, ∂/∂y]T uˆ MAP = ΛuG T ( G ΛuG T + Λn )−1 z . (4)
represents the spatial gradient operator. Applying (2) to all The computation of ûMAP requires knowledge of the
points in a neighborhood R gives
covariance matrices of the update vector (Λu) and the
z = Gu+ n, (3) noise/linearization error term (Λn), respectively.
where the temporal gradients ∆(r, r-di(r)) have been stacked to The estimate in Eq. (4) was derived assuming that u and n have
form the N×1 observation vector z containing DFD information zero mean, and are uncorrelated. The MAP for the linear
on all the pixels in a neighborhood R, the N×2 matrix G is observation model in Eq. (3) is also the maximum a posteriori
obtained by stacking the spatial gradient operators at each (MAP) estimate, assuming a Gaussian prior on u and Gaussian
observation, and the error terms have formed the N×1 noise noise n (for more details, see [11]).
vector n which is assumed Gaussian with n~N(0, σn2I). Each In this work, we resort to the maximum likelihood
row of G has entries [gxi, gyi]T, with i = 1, …, N. The spatial (ML) estimation of the second-order statistics Λu and Λn of
gradients of Ik-1 are calculated through a bilinear interpolation the model. The calculation of the ML estimates of Λu and Λn
scheme [2]. is done iteratively by means of the Expectation-Maximization
(EM) algorithm formalized by Dempster et al [3] which has
3. SPATIAL ADAPTATION been used in a variety of applications [7, 8, 10]. The EM
algorithm was previously used for motion estimation in [6].
Aiming to improve the estimates given by the pel-recursive However, this work deals with estimate parameters of affine
algorithm, we introduced an adaptive scheme for determining models and it uses a block-matching framework.
the optimal shape of the neighborhood of pixels with the same For the model described by Eq. (3), let us assume that
DV used to generate the overdetermined system of equations the update vector u and the noise n are normally-distributed
given by (3). More specifically, the masks in Fig. 1 show the with mean zero and covariance matrices equal to Λu and Λn,
geometries of the neighborhoods used. respectively. If we consider u and n as being uncorrelated, then
Errors can be caused by the basic underlying the pdf of z is
1
assumption of uniform motion inside R (the smoothness −  1 
f z ( z ) = 2π ( G ΛuG H + Λn ) 2 × exp − z H ( G ΛuG H + Λn )−1 z  .
constraint), by not grouping pixels adequately, and by the way  2 
gradient vectors are estimated, among other things. Since it is Let us take Φ = {Λu, Λn} as the parameter set to be
known that in a noiseless image not containing pixels with estimated. The pdf of z can be considered a continuous and
constant intensity, most errors, when estimating motion, occur differentiable function of Φ. This dependency can be better
close to motion boundaries, we propose a hypothesis testing captured by the notation fz(z; Φ). Furthermore, we can assume
(HT) approach to determine the best neighborhood shape for a that the additive noise is white, with covariance matrix Λn =
given pixel. We pick up the neighborhood from the finite set of σn2I, where I is an N × N identity matrix.
templates shown in Fig. 1, according to the smallest DFD The ML estimation of Φ is the ΦML that maximizes
criterion, in an attempt to adapt the model to local features
the logarithm of the likelihood function of fz(z;Φ), which is
associated to motion boundaries.
given by

Φ ML = arg{ max f z ( z ;Φ )} = arg{ maxlog f z ( z ;Φ )}. (5)


Φ Φ

Combining Eqs. (4) and (5), we conclude that the


maximization of the log-likelihood function is equivalent to
minimizing the function Lo(Φ) where

Lo ( Φ ) = log G ΛuG H + Λn + z H ( G ΛuG H + Λn )−1 z.

Lo(Φ) is a nonlinear and non-unimodal function with respect to


Φ. Analytical solutions for the previous equation are out of
question. Thus, we have to resort to optimization techniques.
Nevertheless, since Lo(Φ) is not unimodal, convergence and
local extrema of iterative optimization methods are serious
problems. The EM algorithm arises as an alternative for
X Current pixel O Neighboring pixel
estimating Φ, since its convergence to a local maximum is
guaranteed.
Figure 1: Neighborhood geometries.
The EM algorithm is a general numerical technique expectation at iteration p, given the observation z. Since our
which can be used to determine the maximum likelihood observation model is linear and Gaussian statistics are assumed,
estimate (MLE) of a set of parameters. It can be employed for we have
identification of model and distribution parameters 
u MMSE = 
u LMMSE = 
u MAP .
simultaneously. The parameters can be estimated iteratively
even in situations where some variables cannot be observed.
This estimate is obtained as a byproduct of the estimation of Φ.
4.2. Problem Formulation According to the EM Framework The statistical assumptions about u and n result in Φ = {σ12,
We assume that the update vector u normally-distributed with σ22, σn2}. The two steps of the resulting EM algorithm are
mean zero and covariance matrix equal to Λu, and n ~ N(0,
E-Step:
σ n2 I ) is the additive Gaussian noise.
If we choose x = [ u n] as the complete data , where
T
F( Φ ;Φ ( p ) ) = log( σ 12 ) + log( σ 22 )
( N + 2 )× 1
x∈\ , then x will be normally distributed with diagonal ( p)
a11 + c12( p ) ( p)
a22 + c22( p )
+ m[ log( σ n2 )] + +
covariance matrix Λx equal to σ 12 σ 22
Λ 0   Λu 0  1
Λx =  u 2 
= , + ( p)
[b11 + " + bmm
( p)
+ e12( p ) + " + em2( p ) ],
 0 σ nI  0 Λn  σ n2
with its corresponding inverse where
 Λ −1 0  a ( p ) a ( p ) 
Λx−1 =  u −1 
. A ( p ) =  11( p ) 12( p )  = Λ u( p ) − Λ u( p )G H (GΛ u( p )G H + Λ (np ) ) −1 GΛ u( p )  ,
 0 Λn  a21 a22 
Eq. (3) can be re-written as
B( p ) = Λn( p ) − Λn( p ) (G Λu( p )G H + Λn( p ) ) Λn( p ) 
−1

 u  
z = [G ; I ]   = Hx .
 n b11( p) ( p)
b12 ( p)
" b1m 
 ( p) ( p) ( p )
The foundation of the EM algorithm is the b b22 " b2m 
=  21 ,
maximization of the expectation of log{fx(x;Φ)} given the  # # % # 
incomplete observed data z and the current estimate of the  ( p) ( p) ( p)

parameter set Φ. fx(x;Φ) is the pdf of the complete data and it  m1
b bm2 " bmm 
is given by  c( p ) 
c( p ) = µ u|z
( p)
=  1( p )  =  Λu( p )G H ( G Λu( p )G H + Λn( p ) )−1 z  , (7)

1
 1   c2 
f x ( x ;Φ ) = 2πΛx 2 exp − x H Λx−1 x  . (6)
 2   e1( p ) 
Taking the logarithm of both sides of Eq. (11) leads to  
and e( p ) = µ n|z
( p)
=  #  =  Λn( p ) ( G Λu( p )G H + Λn( p ) )−1 z  .
1 1
log {f x ( x ;Φ )} = − log 2πΛx − x H Λx−1 x  em( p ) 
2 2  
1
= K − [Λx + x H Λ ]. M-Step:
2
where Φ = Λx and K is a constant independent of Φ. The E-step
requires the computation of the expectation In order to obtain this step, we need to find the minimum of
F(Φ;Φ(p)). Differentiating F(Φ;Φ(p)) with respect to σn2 and
making it equal to zero yields
Q( Φ ;Φ ( p ) ) = E[log{ f x ( x ;Φ )} | z ;Φ ( p ) ] ,
1
m
{ 2
σ n2( p + 1 ) = Tr  B( p )  + e( p ) . (8) }
where Φ(p)=[Λu(p);Λn(p)] is the estimate of the parameter set
Applying similar procedure for σ12 and σ22, gives
Φ={Λu;Λn} at the p-th iteration. The M-step
maximizes Q( Φ ;Φ ( p ) ) :
σ 12( p + 1 ) = a11
( p)
+ c12( p ) , and (9)
Φ ( p +1 )
= arg{ max Q( Φ ;Φ ( p)
)} . σ =a +c2( p + 1 )
2 . ( p)
22
2( p )
2 (10)
Φ
Now, we are ready to state the resulting EM algorithm using
The E-step can be re-written in terms of the parameter set Φ by multiple masks.
means of the relationships developed in terms of the complete
data x and the pdf f(x|z;Φ(p)) as
4.3. The EM-Based Motion Estimation Descrptive
F( Φ ;Φ ( p ) ) = log Λu + log Λn + tr( Λu−1Λu|z
( p)
) Algorithm
. For each pixel in the current frame k, located at r, do the
+ tr( Λn−1Λn|z
( p)
) + µ u|z
( p )H
Λu−1µu|z
( p)
+ µ n|z
( p )H
Λn−1µ n|z
( p)
following:
1) Initialize the system: d 0 ( r ) , σ n2( 0 ) , σ 12( 0 ) , and σ 22( 0 ) ,
Our goal is to estimate the update vector u. The MAP/MMSE
( p)
m ← 0 (m=mask counter), and p ← 0 (p= iteration
estimate of u is µu|z which corresponds to its conditional
counter).
2) If DFD < T , then stop. T= threshold for DFD .
3) Calculate Gi and zi for the current mask and current initial 6. EXPERIMENTS
estimate.
4) Calculate the current update vector by means of Eq. (7) , This section presents results illustrating the effectiveness of the
(byproduct of the E-Step). EM algorithm and compare it to the Wiener filter described by
5) Perform the M-Step and update the parameter estimates 
u Wiener = 
u LMMSE = (G T G + µ I ) −1 G T z ,
using Eqs. (8), (9), and (10).
6) Calculate the new displacement vector: with µ=50 and constant for the entire frame [1]. The algorithms
d p +1( r ) = d p ( r ) + u p . (11) were tested using two real video sequences: the "Foreman" and
7) For the current mask m: the "Mother and Daughter" (MD). The sequences are 144 x
If Φ p + 1 − Φ p ≤ ξ (convergence test for the EM 176, 8-bit (QCIF). The algorithm called “EM” uses one
mask , while algorithm “EMm” employs multiple masks.
algorithm), d p + 1 ( r ) − d p ( r ) ≤ ε and DFD < T , then stop.
Otherwise, if p < ( I − 1 ) , where I is the maximum number
Mother and Daughter Sequence - Noiseless Case
of iterations allowed, then go to step 3 with p ← p + 1 . 10.5
If p=I-1, try another neighborhood geometry
(m←m+1),and reset variables: p ← 0 , d 0 ( r ) , σ n2( 0 ) , σ 12( 0 ) , 10

Improvement in Motion Compensation (dB)


and σ 2( 0 )
2 . Go to step 2. 9.5
If all masks where used and no displacement vector was
found, then set d p + 1 ( r ) = 0 . 9

8.5
o---o EMm
5. METRIC
+---+ EM
8
*---* Wiener
This work assesses the motion field quality through the use of
the fowlling metric [2, 4, 5]: 7.5

7
5.1 Improvement in Motion Compensation 31 32 33 34 35 36 37 38 39 40
Frame number, k
The IMC( dB ) between two consecutive frames is given by
 [ I k ( r ) − I k −1( r )] 

2
(a)
 
IMC k ( dB ) = 10 log10  r ∈S
2
, where S is
 ∑  I k ( r ) − I k −1 ( r − d ( r ))  
 r ∈S 
Mother and Daughter Sequence - SNR=20 dB
the frame being currently analyzed. It shows the ratio in 9.5
2
decibel (dB) between the mean-squared frame difference ( FD ) 9
defined by
Improvement in Motion Compensation (dB)

2
∑ [ I k ( r ) − I k −1( r )] 2 8.5

FD = r ∈S
RC 8
2 o---o EMm
and the DFD between frames k and (k-1) . +---+ EM
7.5
As far as the use of the this metric goes, we chose to *---* Wiener
apply it to a sequence of K frames, resulting in the following 7
equation for the average improvement in motion compensation:
6.5
 K

∑∑ [ I k ( r ) − I k −1( r )]
2
  6
IMC( dB ) = 10 log10  K k = 2 r ∈S 
31 32 33 34 35 36
Frame number, k
37 38 39 40

 ∑∑  I k ( r ) − I k −1 ( r − d ( r ))  2 
 k = 2 r∈S   
(b)
When it comes to motion estimation, we seek
Figure2: IMC( dB ) for the noiseless (a) and noisy (b)
algorithms that have high values of IMC (dB) . If we could
detect motion without any error, then the denominator of the
cases for the “MD” sequence.
previous expression would be zero (perfect registration of
motion) and we would have IMC( dB ) =∞.
Foreman Sequence - Noiseless Case
12.5

12

Improvement in Motion Compensation (dB)


11.5

11

10.5 o---o EMm


+---+ EM
*---* Wiener
10

9.5

8.5
11 12 13 14 15 16 17 18 19 20
Frame number, k

(a) (a)

Foreman Sequence - SNR=20dB


11

10.5

Improvement in Motion Compensation (dB)


10

9.5

o---o EMm
9 +---+ EM
*---* Wiener

8.5

7.5
11 12 13 14 15 16 17 18 19 20
Frame number, k

(b) (b)

Figure 3: Motion-compensated errors for frame 32 of the Figure 4: IMC ( dB) for frames 11-20 of the noiseless (a)
“Mother and Daughter” sequence with SNR=20dB: the Wiener and noisy cases with SNR=20dB (b) for the “Foreman”
(a) and the EMA (b) algorithms. sequence.
has great potential for applications such as video coding
and image segmentation.

8. REFERENCES

[1] Biemond, J., Looijenga, L., Boekee, D.E., and Plompen, R.


H. J. M., "A Pel-Recursive Wiener-Based Displacement
Estimation Algorithm,'' Signal Processing, 13, pp. 399-412,
December 1987.
[2] Brailean, J.C., and Katsaggelos, A. K.,
“Simultaneous Recursive Displacement Estimation and
Restoration of Noisy-Blurred Image Sequences”, IEEE Trans.
Im. Proc., Vol. 4, No. 9, pp. 1236-1268, September 1995.
[3] Dempster, A.P., Laird, N.M., and Rubbin, D.B.,
“Maximum Likelihood From Incomplete Data via the EM
Algorithm”, Ann. Roy. Statist. Soc., Vol. 39, pp. 1-38, Dec.
1977.
(a) [4] Estrela, V. V., Rivera, L.A., Beggio, P.C., and Lopes, R.T.,
“Regularized Pel-Recursive Motion Estimation Using
Generalized Cross-Validation and Spatial Adaptation”, Proc.
of SIBGRAPI2003 XVI Brazilian Symp. Comp. Grap. and
Image Processing, pp. 331-338, Oct. 2003.
[5] Estrela, Vania Vieira, Rivera, L.A. and Bassani, M.H.S.,
“Pel-Recursive Motion Estimation Using the Expectation-
Maximation Technique and Spatial Adaptation”, Journal Of
WSCG, Pilzen, Czech Republic, Vol. I, p. 219-226, Feb.
2004.
[6] Fan, C.M., Namazi, N.M., and Penafiel, P.B., “A New
Image Motion Estimation Algorithm Based on the EM
Technique”, IEEE Trans. on P.A.M.I., Vol. 18, No. 3, pp.
348-352, 1996.
[7] Galatsanos, N.P., and Katsaggelos, A.K., "Methods for
Choosing the Regularization Parameter and Estimating the
Noise Variance in Image Restoration and Their Relation,''
IEEE Trans. Image Processing, Vol. 1, No. 3, pp. 322-336,
July 1992.
[8] Galatsanos, N.P., Mesarovic, V.Z., and Molina, R., “Two
(b) Bayesian Image Restoration Algorithms from Partially
Figure 5: Motion-compensated errors for frame 16 of the Known Blur”, Proc. of IEEE International Conf on Image
“Foreman” sequence with SNR=20dB: the Wiener (a) and the Processing, vol. 2, 93-97, Chicago, Illinois (USA), 1998.
EMA (b) algorithms. [9] Jain, A.K., Fundamentals of Digital Image Processing,
Prentice-Hall, New Jersey, 1989.
[10] Katsaggelos, A.K., Image Restoration, Springer- Verlag,
7. CONCLUSIONS Berlin, Heidelberg, 1991.
[11] Kay, S., Fundamentals of Statistical Signal Processing:
Estimation Theory, Prentice Hall, 1993.
For all the sequences, the EM algorithm performed better than [12] Tekalp, M., Digital Video Processing, Prentice Hal 1995.
the Wiener filter for both the one mask and multi-mask cases,
regardless of the presence of noise.
The EM algorithm showed some sensitivity to the
choice of initial estimates. We used more than one initial
parameter set Φ 0 = [σ n2( 0 ) ,σ 12( 0 ) ,σ 22( 0 ) ]T to improve the rate of
convergence , but even with this extra feature, the resulting
algorithms using one mask and multiple masks where faster
than the corresponding Wiener filter counterparts.
For the EM method, we have a simple algorithm that
is guaranteed to converge and it does not equire numerical
optimization. Given that there are multiple iterations at every
pixel location, the speed advantage gained by means of the EM
algorithm is considerable. The authors think this framework

Potrebbero piacerti anche