
Image and Vision Computing 22 (2004) 291–306

www.elsevier.com/locate/imavis

Fast stitching algorithm for moving object detection and mosaic construction

Jun-Wei Hsieh*
Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan, ROC
Received 20 March 2003; received in revised form 19 September 2003; accepted 30 September 2003

Abstract

This paper proposes a novel edge-based stitching method to detect moving objects and construct mosaics from images. The method is a coarse-to-fine scheme which first estimates a good initialization of the camera parameters with two complementary methods and then refines the solution through an optimization process. The two complementary methods are the edge alignment and correspondence-based approaches. The edge alignment method estimates the desired image translations by checking the consistency of edge positions between images; it is better able to overcome large displacements and lighting variations between images. The correspondence-based approach estimates the desired parameters from a set of correspondences by using a new feature extraction scheme and a new correspondence building method; it can handle more general camera motions than the edge alignment method. Since these two methods are complementary to each other, the desired initial estimate can be obtained more robustly. A Monte-Carlo style method is then proposed for integrating the two methods, in which a grid partition scheme increases the accuracy of each try at finding the correct parameters. Finally, an optimization process is applied to refine the initial parameters. Unlike other optimization methods, which minimize errors over the whole image, the proposed scheme minimizes errors only on the positions of feature points. Since the found initialization is very close to the exact solution and only errors on feature positions are considered, the optimization converges very quickly. Experimental results are provided to verify the superiority of the proposed method.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Image registration; Image-based rendering; Mosaics; Moving object detection; Video retrieval

1. Introduction

Image stitching is the process of recovering the camera motions existing between images and then compositing the images together. This technique has been successfully applied to many different applications such as video compression [1], video indexing [2,3], object tracking [9], and the creation of virtual environments [5-8,12]. For example, Shum and Szeliski [7,8] proposed methods to stitch a set of images together to form a panorama. In addition, Irani and Anandan [2] used this technique to represent and index different video contents. Moreover, Jin et al. [9] used it to compensate unwanted camera motions when extracting desired objects from video sequences. Most methods in this field use an affine camera model to approximate the possible motions between two consecutive frames. The parameters of this model can then be recovered from a pair of images by two common methods, i.e. the correlation-based approach and the optimization-based one. For the first approach, the 'correlation' measure can be calculated in the frequency domain or the spatial domain. For example, Kuglin and Hines [4] presented a phase-correlation method to estimate the displacement between two adjacent images in the frequency domain. Among spatial-domain approaches, Sawhney and Ayer [1] proposed a feature matching approach to estimate the dominant and multiple camera motions. In addition, Zoghlami et al. [11] proposed a corner-based approach to build a set of correspondences for computing the transformation parameters from a pair of images. However, establishing good correspondences is challenging when images have nonlinear intensity changes [14]. To avoid this problem, some researchers [6,13] treated the stitching problem as a global optimization problem. For example, Szeliski [6] proposed a nonlinear minimization algorithm for automatically

* Tel.: +886-3-463-8800; fax: +886-3-463-9355.
E-mail address: shieh@saturn.yzu.edu.tw (J.-W. Hsieh).
0262-8856/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2003.09.018
registering images by minimizing the intensity discrepancy between images. In addition, Davis [13] proposed an optimization scheme to obtain a least-squares solution by globally optimizing all pair-wise registrations. In comparison with the correlation-based method, the global optimization approach performs more robustly but will be trapped in a local minimum if the starting point is not properly initialized.

In this paper, we present an edge-based stitching technique to detect moving objects and construct mosaics from consecutive images. In general, the transformations between consecutive images can be described by a planar perspective motion model. Since the transformation is nonlinear, the paper uses a coarse-to-fine approach to robustly and accurately recover the desired model parameters. First, at the coarse stage, two complementary methods, i.e. the edge alignment and correspondence-based approaches, are proposed to obtain respective initial estimates from the images. Then, at the fine stage, the found initial estimate is further refined through an optimization process. The edge alignment method finds possible image translations by checking the consistency of edge positions between different images. It is simple and efficient, involving neither an optimization process nor the building of correspondences. In addition, the method is better able to overcome large displacements and lighting variations between images. On the other hand, the correspondence-based method obtains the desired model parameters from a set of correspondences by using a new feature extraction scheme and a new correspondence building method. In particular, when building correspondences, a new measure is defined to assess the goodness of each match so that false correspondences between features can be eliminated as far as possible. Compared with the edge alignment method, the correspondence-based approach can solve a more general camera motion model but fails when images have large lighting changes. Therefore, owing to the complementary property of the two methods, we can obtain the desired initial estimate more robustly. After that, a Monte-Carlo style method with grid partition is proposed to integrate these methods. The grid partition scheme greatly enhances the accuracy of each try at deriving the correct parameters. The found parameters are then further refined through an optimization process. Since the minimization is applied only to the positions of matching pairs, the optimization can be performed very efficiently. Experimental results show that the proposed method indeed achieves great improvements in stitching accuracy, robustness, and stability.

The rest of the paper is organized as follows. In Section 2, we present an affine model to approximate the motions of a video camera. Details of the proposed method for recovering the parameters of this model are described in Section 3. Section 4 reports the experimental results. Finally, conclusions are presented in Section 5.

2. Camera motion model

Assume that the input images are captured by a video camera viewing a static scene. Then the relationship between two adjacent images can be described by a planar perspective motion model [6-12] as follows:

$x' = \frac{m_0 x + m_1 y + m_2}{m_6 x + m_7 y + 1} \quad \text{and} \quad y' = \frac{m_3 x + m_4 y + m_5}{m_6 x + m_7 y + 1},$   (1)

where $(x, y)$ is the coordinate of a pixel in the current frame $I_0$, $(x', y')$ the coordinate of its corresponding point in the next frame $I_1$, and $M = (m_0, m_1, \ldots, m_7)$ the parameters associated with the focal length, rotation angle, and scaling of the camera. Clearly, Eq. (1) is a nonlinear transformation. In the past, the parameters of this model were obtained by minimizing the error function $E(M)$ as follows [6]:

$E(M) = \sum_i [I_1(x'_i, y'_i) - I_0(x_i, y_i)]^2 = \sum_i e_i^2,$   (2)

where $e_i = I_1(x'_i, y'_i) - I_0(x_i, y_i)$, the intensity difference of a pixel between $I_0$ and $I_1$. Let

$b_k = \sum_i e_i \frac{\partial e_i}{\partial m_k} \quad \text{and} \quad a_{kn} = \sum_i \frac{\partial e_i}{\partial m_k} \frac{\partial e_i}{\partial m_n}.$

Then $M$ can be obtained by the iterative form

$M^T \leftarrow M^T + \Delta M^T,$   (3)

where $\Delta M^T = (A + \lambda I)^{-1} B$, $T$ denotes the transpose, $A = [a_{kn}]$, $B = [b_k]$, and $\lambda$ is a coefficient obtained by the Levenberg-Marquardt method [15]. The method works well if the initial value is close enough to the correct $M$. However, it suffers from slow convergence and gets trapped in a local minimum if the initialization is not proper, especially when images have large displacements. Notice that Eq. (3) tries to find the desired solution by minimizing the intensity errors of all pixels between images. As the number of iterations increases, the calculation of intensity errors becomes very time-consuming. Therefore, in what follows, we propose a fast edge-based algorithm for tackling all the above problems.

3. Fast algorithm for camera compensation and mosaic construction

As described before, due to the nonlinearity of Eq. (1), the best method to estimate the motion model is through an optimization approach [6-8]. In this paper, a coarse-to-fine approach is proposed to guide the optimization process. At the coarse stage, an edge-based approach is proposed to find a good initial estimate, which will be further refined through an optimization process. The initial estimate is obtained from two
complementary methods, i.e. the edge alignment and the correspondence-based approaches. Since the two methods are complementary to each other, the robustness of the whole parameter estimation process is much enhanced. After that, a Monte-Carlo style method is used to integrate the two solutions. For accuracy, at the fine stage, the found parameters are further refined with an optimization process, which minimizes errors only on the coordinates of feature points. Since the number of feature points is much smaller than the number of pixels in the whole image, the optimization process can be performed extremely efficiently. The overall flowchart of the proposed approach is shown in Fig. 1. In what follows, details of each proposed algorithm are described. For analyzing the efficiency of each algorithm, details of the complexity analysis are also given in Section 3.6.

Fig. 1. Flowchart of the proposed method.

3.1. Translation estimation using edge alignment

As described in Fig. 1, for the purpose of robustness, we propose two strategies to find different initializations to feed into an optimization process for deriving the correct model parameters. In this section, the edge alignment method is first proposed to estimate the desired translations from a pair of images. Let $g_x(p)$ denote the gradient of a pixel $p$ in the $x$ direction of an image $I$, i.e.

$g_x(p(i,j)) = |I(p(i+1,j)) - I(p(i-1,j))|,$

where $I(p)$ is the intensity of $p$. In addition, let $S_g(i)$ be the height-normalized sum of $g_x(p)$ accumulated over all pixels along the $i$th column, i.e.

$S_g(i) = \frac{1}{H} \sum_j |I(p(i+1,j)) - I(p(i-1,j))|,$

where $H$ is the height of $I$. If $S_g(i)$ is larger than a threshold, i.e. 15, the $i$th column is considered to contain a vertical edge. After checking all pixels of the input images column by column, a set of positions of vertical edges can be found.

Assume $I_a$ and $I_b$ are two images to be stitched, shown in Fig. 2(a) and (b), respectively. Through the above vertical edge detector, the positions of vertical edges in $I_a$ and $I_b$ are obtained as $P_a^v = (100, 115, 180, 200, 310, 325, 360, 390, 470)$ and $P_b^v = (20, 35, 100, 120, 230, 245, 280, 310, 390)$, respectively. If the images $I_a$ and $I_b$ come from the same static scene, there should exist an offset $d_x$ such that $P_a^v(i) = P_b^v(j) + d_x$ and the corresponding relation between $i$ and $j$ is one-to-one. Then the offset $d_x$ is the desired translation between $I_a$ and $I_b$ in the $x$ direction, i.e. $d_x = 80$. Based on this idea, in what follows, a novel method is proposed to estimate the desired translation parameters from images without building any correspondences or involving any optimization process.

Fig. 2. Edge results of two images.

Before describing the proposed method, we should note that in practice, due to noise, some edges will be lost or undetected, so the relations between $P_a^v$ and $P_b^v$ are no longer one-to-one. To handle this problem, this paper defines a distance function $d_v(i,k)$ measuring the distance of a position $P_a^v(i)$ to the translation solution $k$ as

$d_v(i,k) = \min_{1 \le j \le N_b^v} |P_a^v(i) - k - P_b^v(j)|,$   (4)
where $N_b^v$ is the number of elements in $P_b^v$. Let $T_d$ denote a threshold, set to 4. Given a number $k$, we determine the number $N_p^v$ of elements in $P_a^v$ whose $d_v(i,k)$ is less than $T_d$. In addition, we denote the average value of $d_v(i,k)$ over these $N_p^v$ elements as $E_k^v$, which can be used as an index of the goodness of $k$, i.e. whether it is a suitable translation solution. If $E_k^v$ is small enough and $N_p^v$ is large enough, the position $k$ can be considered a good horizontal translation. More precisely, if $E_k^v \le T_e$ and $N_p^v \ge T_p$, then $k$ is collected as an element of the set $S_x$ of possible horizontal translations, where the two thresholds $T_p$ and $T_e$ are set to 5 and 2, respectively. Let $W_b$ be the width of the input image $I_b$. By examining every $k$ with $|k| < W_b$, the set $S_x$ can be obtained.

On the other hand, let $P_a^h$ and $P_b^h$ be the sets of horizontal edge positions in $I_a$ and $I_b$, respectively. With $P_a^h$ and $P_b^h$, we can define a distance function $d_h$ as follows:

$d_h(i,k) = \min_{1 \le j \le N_b^h} |P_a^h(i) - k - P_b^h(j)|,$   (5)

where $N_b^h$ is the number of elements in $P_b^h$. Let $H_b$ denote the height of the input image $I_b$. According to $d_h$, with a method similar to that used to obtain $S_x$, by examining every $k$ with $|k| < H_b$, the set $S_y$ of possible vertical translations can be obtained. With $S_x$ and $S_y$, the set $S_{xy}$ of possible translations is obtained as $S_{xy} = \{(x,y) \mid x \in S_x,\ y \in S_y\}$.

Once $S_{xy}$ is obtained, we determine the best translation from $S_{xy}$ through a correlation technique. In this technique, two commonly used measures are the sum of intensity differences and the normalized cross-correlation, respectively defined as

$D(p,q) = \sum_{x,y=-K}^{K} |I_a(x+p_x, y+p_y) - \mu_a - I_b(x+q_x, y+q_y) + \mu_b|,$   (6)

and

$C(p,q) = \frac{1}{\sigma_a \sigma_b (2K+1)^2} \sum_{x,y=-K}^{K} [I_a(x+p_x, y+p_y) - \mu_a][I_b(x+q_x, y+q_y) - \mu_b],$   (7)

where $\mu_i$ and $\sigma_i$ are the local mean and standard deviation of $I_i$, respectively, and $(2K+1)^2$ is the area of the matching window. For efficiency, the measure $D(p,q)$ is preferred and adopted in this paper. It is well known that the computation of the sum of differences is time-consuming. To alleviate this, the paper uses a branch-and-bound (or pruning) technique to speed up the calculation of $D(p,q)$. First, a matrix is built by recording all the intermediate values accumulated while computing the best sum of differences found so far. This matrix is then consulted while a new sum of differences is accumulated: if the current partial sum already exceeds the corresponding threshold stored in the matrix, further accumulation toward the final sum is unnecessary. Since the set $S_{xy}$ is small and many unnecessary accumulations are avoided, the best translation can be obtained quickly. In addition, since many impossible translations have been filtered out in advance by edge alignment, the proposed method is better able to overcome image lighting changes. Fig. 3 shows the block diagram of this edge-based translation estimation algorithm. The whole algorithm is summarized as follows.

3.1.1. Edge-based translation estimation algorithm
Input: $I_a$ and $I_b$, two adjacent images to be stitched.

Step 1. Apply a vertical edge detector to find the sets $P_a^v$ and $P_b^v$ of vertical edge positions in $I_a$ and $I_b$, respectively.
Step 2. Determine the set $S_x$ of possible horizontal translations from $P_a^v$ and $P_b^v$ based on $d_v(i,k)$ (see Eq. (4)).
Step 3. Apply a horizontal edge detector to find the sets $P_a^h$ and $P_b^h$ of horizontal edge positions in $I_a$ and $I_b$, respectively.
Step 4. Determine the set $S_y$ of possible vertical translations from $P_a^h$ and $P_b^h$ based on $d_h(i,k)$ (see Eq. (5)).
Step 5. Let $S_{xy}$ denote the set of possible translations, i.e. $S_{xy} = \{(x,y) \mid x \in S_x,\ y \in S_y\}$.
Step 6. Determine the best solution $(t_x, t_y)$ from $S_{xy}$ through a correlation technique and the branch-and-bound method.

Fig. 3. Flowchart of our edge alignment method.
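A minimal sketch of Steps 1-2 (the vertical-edge profile $S_g(i)$ of the column gradients, and the candidate-offset test of Eq. (4)) might look as follows. It assumes grayscale images stored as NumPy arrays; the function names and the exhaustive scan over $k$ are illustrative, not the paper's implementation:

```python
import numpy as np

def vertical_edge_positions(img, grad_thresh=15.0):
    """Columns whose height-normalized gradient sum S_g(i) exceeds a
    threshold are taken as vertical-edge positions (Section 3.1)."""
    img = img.astype(np.float64)
    # |I(i+1, j) - I(i-1, j)| via shifted copies (wrap-around columns dropped)
    grad = np.abs(np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1))
    s_g = grad[:, 1:-1].mean(axis=0)           # average over the height H
    return np.where(s_g > grad_thresh)[0] + 1  # column indices

def horizontal_offset_candidates(pos_a, pos_b, width, t_d=4, t_p=5, t_e=2.0):
    """Collect offsets k for which the aligned edge sets agree: at least
    T_p positions of pos_a land within T_d of some shifted pos_b position,
    with average residual E_k at most T_e (Eq. (4))."""
    candidates = []
    for k in range(-width + 1, width):
        # d_v(i, k): distance of each pos_a entry to its nearest shifted pos_b entry
        d = np.abs(pos_a[:, None] - k - pos_b[None, :]).min(axis=1)
        close = d < t_d
        if close.sum() >= t_p and d[close].mean() <= t_e:
            candidates.append(k)
    return candidates
```

Running `horizontal_offset_candidates` on the example edge sets $P_a^v$ and $P_b^v$ given above yields a candidate list containing the desired offset $d_x = 80$, since every element of $P_a^v$ aligns exactly with a shifted element of $P_b^v$.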


When the translation $(t_x, t_y)$ is found, the $M$ of Eq. (1) can be set as $m_0 = 1$, $m_1 = 0$, $m_2 = -t_x$, $m_3 = 0$, $m_4 = 1$, $m_5 = -t_y$, $m_6 = 0$, and $m_7 = 0$.

3.2. Motion parameter estimation by feature matching

As described in Fig. 1, two strategies are used to find respective initial estimates of the camera parameters for the subsequent optimization process. In this section, details of the correspondence-based method are described. In Section 3.2.1, we propose a new method to extract a set of useful feature points from images based on edges. Details of building correspondences between features are described in Section 3.2.2. However, due to noise, many false matches will also be generated; in Section 3.2.3, a new scheme is proposed to eliminate impossible false matches.

3.2.1. Feature extraction
In this section, we use several edge operators to extract a set of useful feature points as keys to derive the desired registration parameters. First of all, let $G^\sigma(x,y)$ denote the 2D Gaussian smoothing function

$G^\sigma(x,y) = \exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right),$

where $\sigma$ is the standard deviation of the associated probability distribution. Let $G_x^\sigma(x,y)$ and $G_y^\sigma(x,y)$ denote the first partial derivatives of $G^\sigma$ in the $x$ and $y$ directions, respectively, i.e.

$G_x^\sigma(x,y) = -\frac{x}{\sigma^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right) \quad \text{and} \quad G_y^\sigma(x,y) = -\frac{y}{\sigma^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right).$

The gradients of an image $I(x,y)$ smoothed by $G^\sigma(x,y)$ at scale $\sigma$ in the $x$ and $y$ directions are then defined, respectively, as

$I_x^\sigma(x,y) = I * G_x^\sigma(x,y) \quad \text{and} \quad I_y^\sigma(x,y) = I * G_y^\sigma(x,y),$

where $*$ denotes convolution. Then the modulus of the gradient vector of $I(x,y)$ is

$|\nabla I^\sigma(x,y)| = \sqrt{|I_x^\sigma(x,y)|^2 + |I_y^\sigma(x,y)|^2}.$

If all local maxima of $|\nabla I^\sigma(x,y)|$ are located and thresholded with a preset value, all edge points of $I(x,y)$ at scale $\sigma$ can be detected. Since we are interested in specific feature points for image stitching, additional constraints have to be introduced. Basically, this paper defines a feature point as one whose edge response is the strongest within a local area. In addition, to suppress the effect of noise, $\sigma$ is set to 2. The two conditions adopted here for judging whether a point $P(x,y)$ is a feature point are as follows:

Condition 1. $P(x,y)$ must be an edge point of the image $I(x,y)$; that is, $P(x,y)$ is a local maximum of $|\nabla I^\sigma(x,y)|_{\sigma=2}$ and $|\nabla I^\sigma(x,y)|_{\sigma=2}$ exceeds a threshold, i.e. 20.
Condition 2. $|\nabla I^\sigma(x,y)|_{\sigma=2} = \max_{(x',y') \in N_p} \{|\nabla I^\sigma(x',y')|_{\sigma=2}\}$, where $N_p$ is a neighborhood of $P(x,y)$ within a $27 \times 27$ window.

3.2.2. Correspondence establishment
Section 3.2.1 described how the feature points of $I_a(x,y)$ and $I_b(x,y)$ are derived. Now we are ready to find the matching pairs between $I_a$ and $I_b$. Let $FP_{I_a} = \{p_i = (p_x^i, p_y^i)\}$ and $FP_{I_b} = \{q_i = (q_x^i, q_y^i)\}$ be the two sets of feature points extracted from $I_a$ and $I_b$, respectively, and let $N_f^a$ and $N_f^b$ be the numbers of elements in $FP_{I_a}$ and $FP_{I_b}$. The similarity between two feature points $p$ and $q$ is measured by their normalized cross-correlation. For each point $p_i$ in $FP_{I_a}$, the maximum peak of the similarity measure identifies its best matching point $q_i$ in the other image $I_b$. A pair $\{p_i \leftrightarrow q_i\}$ is qualified as a matching pair if two conditions are satisfied:

$C_{I_a I_b}(p_i, q_i) = \max_{q_k \in FP_{I_b}} C_{I_a I_b}(p_i, q_k) \quad \text{and} \quad C_{I_a I_b}(p_i, q_i) \ge T_c,$   (8)

where $T_c = 0.75$. The first condition finds the feature point $q_k \in FP_{I_b}$ that maximizes the measure $C_{I_a I_b}$; the second forces the value $C_{I_a I_b}$ of a matching pair to be larger than a threshold (0.75 in this case).

3.2.3. Eliminating false matches
In the previous section, a set of matching pairs was extracted through matching. However, if the relative geometries of the features are considered, the matching results can be refined further. Therefore, in this section, we define a matching goodness for refining the matching results. Let $MP_{I_a,I_b} = \{p_i \leftrightarrow q_i\}_{i=1,2,\ldots}$ be the set of matching pairs, where $p_i$ is an element of $FP_{I_a}$ and $q_i$ an element of $FP_{I_b}$. Let $Ne_{I_a}(p_i)$ and $Ne_{I_b}(q_i)$ denote the neighbors of $p_i$ and $q_i$ within a disc of radius $R$, respectively, where $R$ is set to 200 in this paper. Assume $NP_{p_i q_j} = \{n_k^1 \leftrightarrow n_k^2\}_{k=1,2,\ldots}$ is the set of matching pairs with $n_k^1 \in Ne_{I_a}(p_i)$, $n_k^2 \in Ne_{I_b}(q_j)$, and all elements of $NP_{p_i q_j}$ belonging to $MP_{I_a,I_b}$. The proposed method is based on the concept that if $\{p_i \leftrightarrow q_i\}$ and $\{p_j \leftrightarrow q_j\}$ are two good matches, the relation between $p_i$ and $p_j$ should be similar to that between $q_i$ and $q_j$. Based on this assumption, we can measure the goodness of a matching pair $\{p_i \leftrightarrow q_i\}$ according to how many matches $\{n_k^1 \leftrightarrow n_k^2\}$ in $NP_{p_i q_i}$ have a distance $d(p_i, n_k^1)$ similar to the distance $d(q_i, n_k^2)$, where $d(u_i, u_j) = \|u_i - u_j\|$ is the Euclidean distance between two points $u_i$ and $u_j$. With this concept, the measure of goodness for a match $\{p_i \leftrightarrow q_i\}$ can be defined as

$G_{I_a I_b}(i) = \sum_{\{n_k^1 \leftrightarrow n_k^2\} \in NP_{p_i q_i}} \frac{C(n_k^1, n_k^2)\, r(i,k)}{1 + \mathrm{dist}(i,k)},$
where $\mathrm{dist}(i,k) = [d(p_i, n_k^1) + d(q_i, n_k^2)]/2$, $C(n_k^1, n_k^2)$ is the correlation measure between $n_k^1$ and $n_k^2$,

$r(i,k) = \begin{cases} e^{-u(i,k)/T_1}, & \text{if } u(i,k) < T_2 \\ 0, & \text{otherwise} \end{cases}$

with the two predefined thresholds $T_1$ and $T_2$, and

$u(i,k) = \frac{|d(p_i, n_k^1) - d(q_i, n_k^2)|}{\mathrm{dist}(i,k)}.$

The contribution of a pair $\{n_k^1 \leftrightarrow n_k^2\}$ in $NP_{p_i q_i}$ decreases monotonically with the value of $\mathrm{dist}(i,k)$; moreover, if the value of $u(i,k)$ is larger than the threshold $T_2$, the contribution of $\{n_k^1 \leftrightarrow n_k^2\}$ is set to zero.

After calculating the goodness of each pair $\{p_i \leftrightarrow q_i\}$ in $MP_{I_a,I_b}$, we obtain their relative goodness $G_{I_a I_b}(i)$ for eliminating false matches. Let $\bar{G}$ be the average value of $G_{I_a I_b}(i)$ over all matching pairs. If the value of $G_{I_a I_b}(i)$ is less than $0.75\,\bar{G}$, the matching pair $\{p_i \leftrightarrow q_i\}$ is eliminated.

After eliminating the impossible false matches, a set $MP_r$ of remaining pairs is obtained from $MP_{I_a,I_b}$. Clearly, if four correct matching pairs can be selected from $MP_r$, the desired solution $M$ can be found by solving the equation

$A M^T = b,$   (9)

where, for the four selected pairs $\{(x_k, y_k) \leftrightarrow (x'_k, y'_k)\}_{k=1,\ldots,4}$,

$A = \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x'_1 x_1 & -x'_1 y_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x'_2 x_2 & -x'_2 y_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x'_4 x_4 & -x'_4 y_4 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y'_1 x_1 & -y'_1 y_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & x_4 & y_4 & 1 & -y'_4 x_4 & -y'_4 y_4 \end{bmatrix}, \qquad b = \begin{bmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_4 \\ y'_1 \\ \vdots \\ y'_4 \end{bmatrix}.$

Eq. (9) can be solved by using the Householder transform [15]. However, since $MP_r$ still has some false matching pairs, it is not guaranteed that four correct pairs will always be selected. In what follows, a Monte-Carlo style method is proposed to find a good initialization of the camera parameters through a series of tries and tests.

3.3. Motion parameter estimation using the Monte Carlo method

In Sections 3.1 and 3.2, two different strategies were proposed to obtain motion parameter estimates from different views. In this section, a Monte-Carlo style method is proposed to integrate these methods for the subsequent optimization process.

The spirit of the Monte Carlo method is to use many tries to find (or hit) the wanted correct solution. Assume each try generates a solution and the probability of finding a correct solution in one try is $r$. After $k$ tries, the probability of continuous failure to find a correct solution is $s = (1-r)^k$. Clearly, even if $r$ is very small, after hundreds or thousands of tries $s$ tends very closely to zero. In other words, if we define a try as a random selection of four matching pairs, each try generates a solution by solving Eq. (9), and a correct solution $M$ can be expected after hundreds or thousands of tries.

For each try, four matching pairs are selected to obtain one possible solution of Eq. (1). If $MP_r$ has $N_r$ elements of which $N_c$ are correct, the probability of selecting four correct pairs in one try is

$\frac{N_c(N_c-1)(N_c-2)(N_c-3)}{N_r(N_r-1)(N_r-2)(N_r-3)}.$

In what follows, a method is proposed to improve the probability of finding a correct solution in each try by partitioning the images into grids. Assume all the correct and false matching pairs are distributed randomly. Then, if the input images are segmented into several grids, in each grid the probability of selecting a correct matching pair is still $N_c/N_r$. Therefore, we can first select four different grids and then pick one matching pair from each grid. With this method, the probability of selecting four correct matching pairs becomes $N_c^4/N_r^4$. Clearly,

$\frac{N_c(N_c-1)(N_c-2)(N_c-3)}{N_r(N_r-1)(N_r-2)(N_r-3)} < \frac{N_c^4}{N_r^4}$

if $N_c < N_r$. Thus, the suggested method enhances the hit rate of finding four correct matching pairs for deriving the desired parameters.

On the other hand, since the Monte Carlo method uses many tries to find the final solution, a verification process is needed to determine which try is the best.
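As a concrete illustration, Eq. (9) for one try can be set up and solved as below. The paper solves the 8×8 system with a Householder transformation; here NumPy's least-squares routine is used as a stand-in (it factorizes the same system), and the function names are illustrative:

```python
import numpy as np

def solve_homography_params(pairs):
    """Build and solve Eq. (9), A m = b, for M = (m0, ..., m7) from four
    point correspondences {(x, y) <-> (x', y')} under the planar
    perspective model of Eq. (1)."""
    A, b = [], []
    for (x, y), (xp, yp) in pairs:
        # x'*(m6*x + m7*y + 1) = m0*x + m1*y + m2  ->  one row per coordinate
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    m, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return m

def project(m, x, y):
    """Apply the planar perspective model of Eq. (1) to a point (x, y)."""
    w = m[6] * x + m[7] * y + 1.0
    return (m[0] * x + m[1] * y + m[2]) / w, (m[3] * x + m[4] * y + m[5]) / w
```

With four correspondences in general position (no three points collinear), the 8×8 system has a unique solution, so the parameters of a known transformation are recovered exactly up to floating-point error.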

Fig. 4. Intensity adjustment: (a) original images $I_a$ and $I_b$; (b) after adjustment, the intensities of $I_a$ and $I_b$ are closer.

Assume $M^i = (m_0^i, m_1^i, \ldots, m_7^i)$ is the solution obtained from the $i$th try. The verification is achieved by counting how many matching pairs in $MP_r$ are consistent with $M^i$. Let $\{p \leftrightarrow q\}$ be a matching pair and let the consistency error $e(p, q; M^i)$ of this pair with respect to $M^i$ be defined as

$e(p,q;M^i) = \sqrt{\left(q_x - \frac{m_0^i p_x + m_1^i p_y + m_2^i}{m_6^i p_x + m_7^i p_y + 1}\right)^2 + \left(q_y - \frac{m_3^i p_x + m_4^i p_y + m_5^i}{m_6^i p_x + m_7^i p_y + 1}\right)^2}.$   (10)

For each matching pair $\{p_k \leftrightarrow q_k\}$ in $MP_r$, if $e(p_k, q_k; M^i) < T_e$, the pair $\{p_k \leftrightarrow q_k\}$ is said to be consistent with $M^i$, where $T_e$ is a threshold set to 6 for the consistency check. Based on Eq. (10), a counter $c(M^i)$ records how many matching pairs in $MP_r$ are consistent with $M^i$. After several tries, the best solution $\bar{M}$ is obtained as

$\bar{M} = \arg\max_{M^i} c(M^i).$   (11)

At initialization ($i = 0$), $M^0$ is obtained from the edge alignment approach (see Section 3.1). Based on the above descriptions, the proposed method is summarized as follows.

3.3.1. Integrated parameter estimation algorithm
MaxIterations: maximum number of iterations, set to 300 here. $L$: grid dimension, set to 8 here. $\bar{M}$: the desired solution.

Step 1. Use the edge alignment approach (see Section 3.1) to get the initial solution $M^0$. Calculate $c(M^0)$, the number of elements in $MP_r$ consistent with $M^0$.
Step 2. Divide the input image into $L \times L$ grids. From the $L^2$ grids, collect those containing at least one matching pair as the set $S_g$.
Step 3. For each grid $k$ in $S_g$, denote by $N_{\mathrm{Grid}}(k)$ the number of matching pairs in this grid. Calculate the probability $p_{M_r}(k)$ that an element of $MP_r$ appears in the $k$th grid, i.e. $p_{M_r}(k) = N_{\mathrm{Grid}}(k)/N_r$. Obtain the cumulative distribution $\Pr(k) = \sum_{j=0}^{k-1} p_{M_r}(j)$.
Step 4. Let $i = 1$, $C = c(M^0)$, and $\bar{M} = M^0$. Repeat the following steps:
Step 4.1. Randomly generate four different real numbers $\{a_m\}_{m=1,\ldots,4}$ such that $0 \le a_m < 1$.
Step 4.2. Determine four different integers $\{b_m\}_{m=1,\ldots,4}$ satisfying $\Pr(b_m) \le a_m < \Pr(b_m + 1)$. If the set $\{b_m\}_{m=1,\ldots,4}$ cannot be found, go to Step 4.1.
Step 4.3. Obtain the set $S_P^i$ of four matching pairs by selecting one matching pair from the $b_m$th grid for $m = 1, 2, \ldots, 4$.
Step 4.4. Get the solution $M^i$ from $S_P^i$ by solving Eq. (9).
Step 4.5. Calculate $c(M^i)$, the number of matching pairs in $MP_r$ consistent with $M^i$.
Step 4.6. If $c(M^i) > C$, then $C = c(M^i)$ and $\bar{M} = M^i$.
Step 4.7. $i = i + 1$; if $i <$ MaxIterations, go to Step 4.1.

Fig. 5. Example to explain the blending technique.

3.4. Parameter refinement through optimization

With the Monte Carlo method, the best estimate $\bar{M}$ can be found from $MP_r$. However, if an optimization process is applied, $\bar{M}$ can be further refined. In Section 2, a method

Table 1
Two sets of synthetic camera motions used to generate the synthetic images of Figs. 6 and 7, respectively

Images                                  $m_0$     $m_1$     $m_2$    $m_3$     $m_4$    $m_5$    $m_6$      $m_7$
Synthetic image pair 1   Real values    1.0       0.1       -42.0    0.1       1.0      -20.0    0.0        0.0
                         Estimated      1.00003   0.09960   -41.35   0.09954   1.00034  -20.587  0.0000018  0.000001
Synthetic image pair 2   Real values    1.0       0.1       -40      -0.1      1.05     -90      0.0        0.0
                         Estimated      1.00005   0.0993    -39.01   -0.0998   1.05006  -89.5    0.0000078  0.000004

The estimated model parameters are listed in the rows labelled 'Estimated'.

for deriving the desired parameters was described that minimizes the discrepancy in intensities of all pixels between two images. In this section, instead of minimizing over the whole image, we describe a method that finds the desired parameters by minimizing errors only on the positions of feature points.

Fig. 6. Stitching result of two synthetic temple images generated with the camera parameters $m_0 = 1.0$, $m_1 = 0.1$, $m_2 = -42$, $m_3 = 0.1$, $m_4 = 1.0$, $m_5 = -20$, $m_6 = 0$, and $m_7 = 0$; (a) and (b) are the pair of synthetic images and (c) is the stitching result.

Fig. 7. Stitching result of the synthetic 'White House' images generated with the camera parameters $m_0 = 1.0$, $m_1 = 0.1$, $m_2 = -40$, $m_3 = -0.1$, $m_4 = 1.05$, $m_5 = -90$, $m_6 = 0$, and $m_7 = 0$; (a) and (b) are the synthetic images and (c) is the stitching result.

In Section 3.2, two sets of feature points, i.e. $FP_{I_a}$ and $FP_{I_b}$, were extracted from the two images $I_a$ and $I_b$, respectively. For each point $p_i$ in $FP_{I_a}$ and $q_j$ in $FP_{I_b}$, according to Eq. (10) and $\bar{M}$, if $e(p_i, q_j; \bar{M}) < T_e$, we denote $\{p_i \leftrightarrow q_j\}$ as a new match. After checking all elements of $FP_{I_a}$ and $FP_{I_b}$, a new set $MP_M$ of matching pairs is obtained as

$MP_M = \{p_k \leftrightarrow q_k;\ k = 1, 2, \ldots, N_M\},$

where $p_k \in FP_{I_a}$, $q_k \in FP_{I_b}$, and $e(p_k, q_k; \bar{M}) < T_e$. We can then define an error function

$F(\bar{M}) = \sum_{k=1}^{N_M} e(p_k, q_k; \bar{M}),$   (12)

where $\{p_k \leftrightarrow q_k\}$ is an element of $MP_M$. By calculating the gradient and Hessian matrix of $F$, $\bar{M}$ can be updated with the iterative form

$\bar{M}_{t+1}^T = \bar{M}_t^T + (A + \lambda I)^{-1} B,$   (13)

where $t$ is the iteration number,

$[A]_{ij} = \sum_{k=1}^{N_M} \frac{\partial e_k}{\partial \bar{m}_i} \frac{\partial e_k}{\partial \bar{m}_j}, \qquad [B]_i = \sum_{k=1}^{N_M} e_k \frac{\partial e_k}{\partial \bar{m}_i},$

and $\lambda$ is a coefficient obtained by the Levenberg-Marquardt method [15]. The above minimization process quickly
300 J.-W. Hsieh / Image and Vision Computing 22 (2004) 291–306

converges since only the coordinates of feature positions are where lAl is the overlapping area of Ia and Ib ; pi a pixel in
 is
considered into minimization and the initial estimate of M Ia ; and qi its corresponding pixel in Ib : Assume Wa and
very close to the final solution. Wb are the widths of Ia and Ib ; respectively. According to
DI; Wa and Wb ; the intensities of Ia and Ib will be
3.5. Blending technique for mosaic construction adjusted as:

In general, when stitching two adjacent images DI


Ia ðpðx; yÞÞ ¼ Ia ðpðx; yÞÞ þ x and Ib ðqðx; yÞÞ
together, some unwanted discontinuities of intensity will 2Wa
exist between their common areas. In what follows, a two-
DI
stage scheme will be proposed to smooth such intensity ¼ Ib ðqðx; yÞÞ þ ðx 2 wb Þ : ð15Þ
discontinuities. First, an intensity adjustment method is 2Wb
proposed for adjusting two adjacent images to have
similar intensities. Then, a blending technique is used to After adjusting, the intensities of Ia and Ib in Fig. 4(a)
construct a sealess mosaic according to a distance will be gradually changed into the intensity line EF in
weighting function. Like Fig. 4, let DI be the average Fig. 4(b). Furthermore, at the second stage, in order to make
intensity difference between the overlapping areas of Ia the overlapping area between Ia and Ib more smoothly, a
and Ib calculated by ray-casting method is then used to blend the intensities of
different pixels together. Like Fig. 5, pi is a pixel in Ia ; qi is
its corresponding pixel in Ib ; and la and lb are two boundary
1 X
DI ¼ ðI ðq Þ 2 Ia ðpi ÞÞ; ð14Þ lines of Ia and Ib ; respectively. With pi and qi ; the intensity
lAl i[A b i of the corresponding pixel ri in the composite image Ic can

Fig. 8. Stitching results when different feature masks are used. (a) and (b) Pairs of results of feature extraction and matching; the mask sizes used are 15 × 15, 23 × 23, 35 × 35, and 51 × 51, respectively. (c) Stitching results obtained by stitching the corresponding pairs of images in (a) and (b).

Table 2
Estimation results of camera parameters obtained from pairs of images shown in Fig. 8 when different feature masks are used

Mask size       m0       m1       m2       m3       m4       m5       m6         m7

True values     1.0      0.1      242.0    0.1      1.0      220.0    0.0        0.0
15 × 15 mask    1.00001  0.09999  241.55   0.09997  1.00024  220.224  0.0000019  0.000002
23 × 23 mask    1.00002  0.09989  241.15   0.09955  1.00025  220.157  0.0000017  0.000001
27 × 27 mask    1.00003  0.09960  241.35   0.09954  1.00034  220.587  0.0000018  0.000001
35 × 35 mask    0.99995  0.09998  242.01   0.09994  1.00023  219.998  0.0000016  0.000001
43 × 43 mask    0.99991  0.09999  241.45   0.09996  1.00034  220.614  0.0000012  0.000004
51 × 51 mask    0.99991  0.09999  241.45   0.09996  1.00034  220.614  0.0000012  0.000004

be obtained by

Ic(ri) = (db^e Ia(pi) + da^e Ib(qi)) / (da^e + db^e),   (16)

where da is the distance between pi and la, db is the distance between qi and lb, and e is an exponential order for weighting. With Eq. (16), the intensities of Ia are gradually changed to those of Ib.

3.6. Complexity analysis

In order to understand the efficiency of the proposed method, in what follows, details of the complexity analysis of
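The two blending stages can be sketched directly from Eqs. (14) and (16); the function names are illustrative:

```python
def average_intensity_difference(overlap_pairs):
    """Eq. (14): mean of Ib(qi) - Ia(pi) over the overlapping area,
    given as a list of (Ia(pi), Ib(qi)) intensity pairs."""
    return sum(b - a for a, b in overlap_pairs) / len(overlap_pairs)

def blend_pixel(Ia_val, Ib_val, da, db, e=1.0):
    """Eq. (16): distance-weighted blend of one pixel. da is the distance
    of pi from Ia's boundary line la, db the distance of qi from Ib's
    boundary line lb, and e the exponential order of the weighting."""
    wa, wb = db ** e, da ** e  # Ia is weighted by db^e, Ib by da^e
    return (wa * Ia_val + wb * Ib_val) / (wa + wb)
```

Note how a pixel lying on Ia's boundary (da = 0) keeps Ia's intensity exactly, and one equidistant from both boundaries takes the average, which produces the seamless transition the text describes.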

Fig. 9. Stitching result of a series of panoramic images. (a) Series of panoramic images. (b) Stitching result.

each proposed algorithm will be given. Assume all input images have the same dimension N_I × N_I. Then, the complexity of extracting edge features is O(N_I²), and thus the proposed edge alignment method has complexity O(N_I²). As to the feature matching method, the time complexity of feature extraction is O(N_I²), and the number of feature points would increase with the order O(N_I²). However, since the used feature points are extracted along edges and constrained by a window mask, the number of feature points can be properly controlled to increase only with the order O(N_I). Thus, the complexity of correlation matching is O(K²N_I²), where K² is the mask size for correlation calculation (Eq. (7)). As to the algorithm for eliminating false matches, since the number of matching pairs increases with the order O(N_I²), the complexity of eliminating false matches is O(R²N_I²), where R is the radius of the disc used to calculate the goodness of a feature point. However, the number of feature points located within the radius R of a feature point is less than K². Therefore, the complexity of eliminating false matches is O(K²N_I²). As to the Monte Carlo method, its complexity depends on the number of iterations, i.e. O(MaxIterations). Since K²N_I² ≫ MaxIterations, the complexity of finding an initial solution of M (see Eq. (1)) through feature matching is still O(K²N_I²). Then, the scheme that combines the edge alignment and feature matching methods to obtain a good initial solution of M is still O(K²N_I²).

As to the final optimization method (Eq. (13)), each iteration to refine the desired camera parameters touches only a fraction of the image, i.e. rN_I² pixels, where r ≪ 1. Thus, the used optimization method has complexity O(t_max rN_I²), where t_max is the maximum number of iterations. Therefore, the total complexity of finding the desired camera parameters is O((t_max r + K²)N_I²). Since the optimization is focused on feature points, the term t_max r is less than K². In addition, the proposed blending technique has time complexity O(N_I²). Therefore, the proposed algorithm for mosaic construction and object detection has total time complexity O(K²N_I²).

4. Experimental results

In order to analyze the performance of the proposed method, a series of synthetic and real images were adopted as test images. The synthetic images are used to verify whether the proposed method is accurate and robust. All the synthetic images are of size 512 × 512. For each synthetic image pair, one image is used to generate the other with a synthetic affine camera motion. All the synthetic models are shown in rows 3 and 5 of Table 1, respectively. After applying the proposed method to these synthetic images, the results of parameter estimation are obtained and shown in rows 4 and 6, respectively. The accuracy of the proposed algorithm can easily be verified by comparing the differences between the estimated camera parameters and the true ones.

Fig. 6 shows the stitching result of the synthetic temple images. The red symbol '+' indicates the positions of located feature points. Symbols in (a) and (b) with the same index form a matching pair. From (c), clearly, even though the displacement between (a) and (b) is large, the proposed method still works well to stitch them together. Fig. 7 is another result on synthetic images, i.e. the White House. Although the overlapping area between these images is small, the proposed method still successfully stitches them together. On the other hand, in order to examine the robustness and sensitivity of the proposed feature extraction and matching method, we used several masks of different sizes to locate different features for matching (see Section 3.2.1). If the number of feature points is too large or too small, considerable errors in feature matching will be produced. In this experiment, ten masks with different sizes are used, i.e. 15 × 15, 19 × 19,

Fig. 10. Stitching result of two images with larger lighting changes. (a and b) Two adjacent images. (c) Stitching result.
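The accuracy check described above amounts to comparing each estimated parameter with its ground-truth value; a small sketch using the pair-1 figures from Table 1 (the helper name is illustrative):

```python
def max_parameter_error(true_m, est_m):
    """Largest absolute difference between true and estimated motion
    parameters, as used to judge accuracy on the synthetic pairs."""
    return max(abs(t - e) for t, e in zip(true_m, est_m))

# Table 1, synthetic pair 1: true vs. estimated parameter values.
true_m = (1.0, 0.1, 242.0, 0.1, 1.0, 220.0, 0.0, 0.0)
est_m = (1.00003, 0.09960, 241.35, 0.09954, 1.00034, 220.587,
         0.0000018, 0.000001)
```

For this pair, the largest deviation is 0.65 pixels (on the translation term m2), with the remaining parameters matching to three or more decimal places.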

Fig. 11. Stitching result when images have moving objects. (a and b) Two adjacent images with moving objects. (c) Stitching result.

Fig. 12. Stitching result when the camera has some rotation change. (a and b) Two adjacent images. (c) Stitching result.

23 × 23, …, and 51 × 51. For this examination, the edge alignment method (see Section 3.1) is not used. Fig. 8(a) and (b) show the results of feature extraction and matching for the mask sizes 15 × 15, 23 × 23, 35 × 35, and 51 × 51, respectively. Table 2 shows details of the camera parameters estimated when these masks are used. Clearly, even though different masks are used, proper matching pairs can still be found, and thus the desired camera parameters can be estimated very accurately. The proposed method failed to stitch images when a 55 × 55 mask was used, since too few features were extracted for matching.

Fig. 9 shows the result of mosaic construction when a series of panoramic images is used. In this case, before stitching, all the images are projected onto a cylindrical map [5]. Then, only the translation parameters need to be estimated. Fig. 10 shows the case when images have larger intensity differences: (a) and (b) are the original images and (c) is the stitching result. Such large lighting changes lead to instability of feature matching in traditional matching techniques like block matching or phase correlation. However, in this paper, the proposed edge alignment algorithm tries to find all possible translations by checking the consistency of edge positions instead of comparing the intensity similarity of the images. Therefore, even though the images have larger lighting changes, the proposed method still works well to find all the desired camera parameters for stitching.

Fig. 11 shows the case when the images contain moving objects. A moving object disturbs the work of image stitching; however, the proposed method still successfully stitches the images together. Fig. 12 shows the result when the images exhibit some rotation and skewing effects. In this case, the proposed Monte Carlo method still works well to find the correct camera parameters.

The proposed method can also be used for camera compensation when extracting moving objects from a video sequence. Fig. 13 shows two frames taken from a movie. In order to detect the moving object, a static background should be constructed. With the proposed method, the camera motion between Fig. 13(a) and (b) can be found and compensated. Fig. 13(c) is the mosaic of Fig. 13(a) and (b). Then, the moving object can be detected by image differencing, as in Fig. 13(d). The detection result is useful for various applications such as intelligent transportation systems, video indexing, and video surveillance. Fig. 14 is another case, in which a moving car appears in the video sequence. From the experimental results, it is obvious that the proposed method is an efficient, robust, and accurate method for image stitching.
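The differencing step of Figs. 13(d) and 14(d) can be sketched as follows; the threshold value is an assumption, since the paper does not specify one:

```python
def detect_moving_pixels(frame_a, frame_b_compensated, threshold=30):
    """Object detection by image differencing: after the camera motion
    has been compensated (frame b warped into frame a's coordinates),
    pixels whose absolute intensity difference exceeds `threshold` are
    flagged as belonging to a moving object."""
    h, w = len(frame_a), len(frame_a[0])
    return [[abs(frame_a[y][x] - frame_b_compensated[y][x]) > threshold
             for x in range(w)] for y in range(h)]
```

In practice the resulting binary mask would be cleaned up with morphological filtering before being used by applications such as surveillance or video indexing.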

Fig. 13. Mosaic construction and object detection when images have a moving object: (a and b) are two adjacent images; (c) is the mosaic result of (a) and (b);
(d) is the object detection result by image differencing.

Fig. 14. Mosaics and object detection when images have a moving object: (a and b) are two adjacent images; (c) is the mosaic result of (a) and (b); (d) is the
object detection result by image differencing.

5. Conclusions

In this paper, we have proposed an edge-based method for stitching series of images from a video camera. In this approach, for robustness, the initial estimate is obtained from two different schemes, i.e. the edge alignment approach and the correspondence-based one. Since the two methods are complementary to each other, much robustness is gained during the parameter estimation process. To integrate these two methods, a Monte-Carlo style method is proposed to find the best motion parameters. Then, the solution is refined through an optimization process. The contributions of this paper can be summarized as follows:

(a) This paper proposed an edge alignment scheme for estimating translation parameters using edges. The method has better capabilities to overcome the problems of large displacements and lighting changes between images.
(b) A new feature extraction scheme was proposed to extract a set of useful features. Experimental results have proved the scheme works very well under different conditions.
(c) When building correspondences, a new scheme was proposed to eliminate many false matches by judging the goodness of a matching pair. Through this judgment, a set of desired correspondences can be obtained more reliably.
(d) A grid partition scheme was proposed to enhance the hit rate of obtaining four correct matching pairs. Then, the correct parameters can be found with fewer tries.
(e) An efficient optimization process was proposed for refining the estimated parameters more accurately. Since only the errors on feature positions are considered, the minimization process can be performed extremely efficiently.

Experimental results have shown that our method is superior in terms of stitching accuracy, robustness, and stability.

References

[1] H. Sawhney, S. Ayer, Compact representation of video through dominant and multiple motion estimation, IEEE Trans. Pattern Anal. Machine Intell. 18 (Aug) (1997) 814–830.
[2] M. Irani, P. Anandan, Video indexing based on mosaic representation, Proc. IEEE 86 (May) (1998) 905–921.
[3] M. Bonnet, Mosaic representation for video shot description, Proc. MPEG-7 Eval. Ad Hoc Meeting 636 (Feb) (1999).
[4] C. Kuglin, D. Hines, The phase correlation image alignment method, Proc. IEEE Int. Conf. Cybernet. Soc. (1975) 163–165.
[5] S. Chen, QuickTime VR—an image-based approach to virtual environment navigation, Proc. SIGGRAPH'95 (1995) 29–38.
[6] R. Szeliski, Video mosaics for virtual environments, IEEE Comput. Graph. Appl. 16 (March) (1996) 22–30.
[7] H.Y. Shum, R. Szeliski, Systems and experiment paper: construction of panoramic image mosaics with global and local alignment, Int. J. Comput. Vis. 36 (2) (2000) 101–130.
[8] R. Szeliski, H.Y. Shum, Creating full view panoramic image mosaics and environment maps, Proc. Comput. Graphics Annu. Conf. Ser. (1997) 251–259.
[9] J.S. Jin, Z. Zhu, G. Xu, A stable vision system for moving vehicles, IEEE Trans. Intell. Transport. Syst. 1 (1) (2000) 32–39.
[10] H. Nicolas, New methods for dynamic mosaicking, IEEE Trans. Image Processing 10 (8) (2001) 1239–1251.
[11] I. Zoghlami, O. Faugeras, R. Deriche, Using geometric corners to build a 2D mosaic from a set of images, Proc. Conf. Comput. Vis. Pattern Recognit., Puerto Rico (1997) 420–425.
[12] C.T. Hsu, Feature-based video mosaic, Proc. ICIP, Vancouver, Canada 2 (Sep) (2000) 887–890.
[13] J. Davis, Mosaics of scenes with moving objects, IEEE Proc. CVPR (1998).
[14] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, Chapman and Hall, London, UK, 1993.
[15] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C: the Art of Scientific Computing, Cambridge University Press, Cambridge, January 1993.

Further reading

C. Guestrin, F. Cozman, E. Krotkov, Fast software image stabilization with color registration, Proc. Intell. Robots Syst. Conf., Victoria, Canada, October (1998) 19–24.
J.W. Hsieh, H.Y. Mark Liao, K.C. Fan, M.T. Ko, Y.P. Hung, Image registration using a new edge-based approach, Comput. Vis. Image Understand. 67 (1997) 112–130.
S. Mann, R.W. Picard, Virtual bellows: constructing high-quality images for video, Proc. ICIP (1994) 363–367.
