
Pattern Recognition 42 (2009) 313-321


Front-view vehicle detection by Markov chain Monte Carlo method


Yangqing Jia*, Changshui Zhang
Department of Automation, Tsinghua University, FIT 3-120, Beijing 100084, China

ARTICLE INFO

Article history:
Received 29 January 2007
Received in revised form 25 June 2008
Accepted 29 July 2008

Keywords:
Vehicle detection
Bayesian method
Maximum a posteriori (MAP)
Markov chain Monte Carlo

ABSTRACT

In this paper, we propose a new vehicle detection approach based on Markov chain Monte Carlo (MCMC). We mainly discuss the detection of vehicles in front-view static images with frequent occlusions. Models of roads and vehicles based on edge information are presented, the Bayesian formulation of the problem is constructed, and a Markov chain is designed to sample proposals for detecting vehicles. Using the Monte Carlo technique, we detect vehicles sequentially by maximizing a posterior probability (MAP), performing vehicle segmentation in the meantime. Our method does not require complex preprocessing steps such as background extraction or shadow elimination, which many existing methods need. Experimental results show that the method has a high detection rate, performs successful segmentation, and reduces the influence of vehicle occlusion.
© 2008 Elsevier Ltd. All rights reserved.

1. Introduction
Collecting and generating vehicle information from either videos or static images is a basic task of intelligent transportation systems. In recent years, computer vision-based methods have been widely applied to this problem. Compared with traditional methods such as electromagnetic coils embedded underground, these methods have the advantages of lower cost, more accurate detection, and higher flexibility. The cameras can be installed on poles, bridges, or buildings near the road, and once installed and calibrated they can operate for long periods with low upkeep.
Vehicle detection has received much attention in recent years. Beymer et al. [1] divided vehicle detection methods into four classes: 3D model-based, region-based, active contour-based, and feature-based, each with its own advantages and disadvantages, as described in Ref. [1]. Achler et al. [2], Gupte et al. [3], and Koller et al. [4] used an estimated background to extract moving objects and updated the background to follow luminance changes in real time. However, this approach requires an initial background with no foreground objects, and updating the background information occupies considerable time in the whole detection process. Haag et al. [5] used edge information and set up an optical flow model to detect vehicles. Handmann et al. [6] performed detection with a neural network, based on local-oriented coding and entropy analysis.

* Corresponding author. Tel.: +86 10 62796872.
E-mail addresses: jiayq06@mails.tsinghua.edu.cn, jiayq84@gmail.com (Y. Jia), zcs@mail.tsinghua.edu.cn (C. Zhang).
doi:10.1016/j.patcog.2008.07.015

However, there still exist a number of open problems, one of which is congestion. In this case, heavy traffic flow makes inter-vehicle occlusion a common scenario, which in turn makes detecting vehicles more difficult. None of the methods referred to above explicitly considers occlusions, so they are not applicable under such conditions. For segmenting vehicles under occlusion, Kamijo et al. [7] utilized a Markov random field (MRF) model to segment and track vehicles, assuming that every vehicle first appears unoccluded in a certain starting region of the area of interest. Kanhere et al. [8] detected and tracked features based on several motion-related cues to segment vehicles in videos taken from a low-angle camera. However, these methods all use information such as background and motion, and must be performed on videos or at least image sequences.
Despite the large amount of literature on vehicle detection using videos, relatively little work has been done on detection in static images, due to the lack of the motion information that many current methods require. However, under heavy traffic (which is common in urban areas), the traffic slows down or even halts for minutes, and vehicles become occluded. Even with videos, it is then difficult to extract motion information or update the background for detection; in such cases, we effectively face the problem of detecting vehicles in static images. Moreover, we believe that a static image already contains enough information for detection, while being less redundant than video and thus lightening the computational load. Detection in static images is also a more fundamental task, and can be considered a step or subtask whose information can be integrated during video-based detection.
Some attempts have been made to detect vehicles in static images. The methods are generally model-based, in order to describe what a vehicle in the image is, and follow general object recognition ideas originating from computer vision. Detection has been performed as in the work of Sullivan [9] and Koller et al. [10], both of whom used vehicle models and a ground-plane constraint for orientation. Zhou et al. [11] cast vehicle detection as a pattern recognition problem and trained a PCA-and-wavelet-based classifier to detect vehicles in static images. Rojas et al. [12] and Tsai et al. [13] used a similar method, searching for salient vehicle colors, but imposed relatively strict requirements on image quality, such as good luminance and little shadow. The methods described in Ref. [14] used 2D shape models to detect vehicles in aerial images. The object recognition-based idea can further be seen in recent works such as Refs. [15,16]. All of the methods referred to above mainly focus on detecting vehicles or reasoning about their categories, and do not treat segmentation as the main issue. Thus they either can be applied only to unoccluded conditions, or do not explicitly address inter-vehicle occlusions. As for occlusions, Song et al. [17] used a Bayesian method to segment vehicle blobs based on 2D black-and-white images projected from 3D vehicle shape models. Oberti et al. [18] modeled the shapes of vehicles by corners. Pang et al. [19] used a decomposition method based on cubic models to segment occluded vehicles. Although these methods can be performed on static images and obtain promising results, they all require certain initial information, most commonly an extracted background, which is hard to obtain from static images alone. Thus these methods cannot operate on static images by themselves, and require other prior information (usually obtained from videos) to initialize their algorithms.

In this paper, we focus on the problem of detecting and segmenting vehicles in static images. The images are retrieved by a camera fixed on a pole beside the road or on a bridge over the road, a few meters above the ground, as is common in many surveillance applications, and multiple lanes can be covered. We model the images under the following assumptions: the road in the image is viewed front-on, i.e., vehicles appear at the top and disappear at the bottom of the image; the road may contain multiple lanes; occlusion is frequent, and vehicles may partially cover or be covered by others. An example scene is shown in Fig. 1. These are common conditions for currently deployed traffic surveillance cameras.

Fig. 1. A sample input image containing multiple lanes and vehicles (the lanes are sketched out by trapeziums manually).

We cast the vehicle detection and segmentation problem as searching for vehicles from front to rear sequentially, detecting one vehicle at a time and then the vehicle behind it. The determination of each vehicle is cast as finding the parameters that maximize a posterior probability (MAP). Note that apart from the assumptions made above, our method needs no other information such as background or lighting conditions.

In the following sections, we present a detailed description of our approach. We propose a generative vehicle model based on edges, and construct a Markov chain Monte Carlo (MCMC) method with respect to prior and posterior probabilities. The approach we propose consists of three main stages:

(1) Model construction. Our method is based on image edges, so the image is first preprocessed to retrieve its edge features. We also define the models of roads and vehicles accordingly.
(2) Bayesian formulation. Since the vehicle detection and segmentation problem is cast as the Bayesian problem of finding a MAP solution, we define the corresponding formulations: the prior probability and likelihood of vehicle proposals, from which the form of the posterior probability is derived to evaluate different proposals.
(3) Vehicle detection using MCMC. We construct a Markov chain to sample proposals in the parameter space, and use the Monte Carlo method with simulated annealing to search for the position and other parameters that best fit actual vehicles.

The following sections describe each of these stages in detail. We then show the experimental results and draw the conclusions of the paper.

2. Model construction

The first step in establishing our method is to construct models describing the road and the vehicles. Various features can be used to detect vehicles. In our method, we choose the edge information in the image, based on the fact that vehicle zones contain many edges while road zones contain few, and that edge information is less affected by environmental changes than other features such as vehicle color and image gray values. We construct mathematical definitions of the road lanes and the vehicle models.
2.1. Road model

The road is formed of lanes that are processed separately in our algorithm. We assume that there is little distortion in the image, or that it can be eliminated by camera calibration in a preprocessing step. Thus every lane in the image can be modeled as a trapezium, as shown in Fig. 1. Assuming that the lane is planar, we can map the image point (u, v) to the actual position (x, y) = Map(u, v) on the road with a simplified perspective transform:

$(tu, tv, t)^{\mathrm{T}} = A_{3\times 3}\,(x, y, 1)^{\mathrm{T}}$   (1)

where the 3×3 transform matrix A is calculated from the mapping relationship between the four corner points of the trapezoidal image zone and those of the actual rectangular lane zone. This matrix is determined by calibration during the installation of the camera. Once the system is installed and the camera is fixed, the value of A is fixed and can be reused from then on. We can then calculate the actual distances along the x and y directions between the image points (u_1, v_1) and (u_2, v_2) as

$(\Delta x, \Delta y) = \mathrm{Map}(u_1, v_1) - \mathrm{Map}(u_2, v_2)$   (2)

In the front-view condition, Δx is the distance across the road, indicating the extent to which the vehicle is off the center of the lane. Δy is the distance along the road, such as the distance between one vehicle and the following one. These distances are further used to calculate the prior probability in Section 3.
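To make the calibration concrete, here is a minimal sketch that computes A from four point correspondences and inverts it to realize Map(u, v). It assumes OpenCV and NumPy; the corner coordinates and lane dimensions are hypothetical placeholders, not values from the paper.

```python
import cv2
import numpy as np

# Four corners of the trapezoidal lane zone in the image (pixels) and the
# corresponding corners of the actual rectangular lane zone (meters).
# These coordinates are hypothetical; real values come from calibration.
image_corners = np.float32([[320, 100], [480, 100], [700, 600], [100, 600]])
road_corners = np.float32([[0, 80], [3.5, 80], [3.5, 0], [0, 0]])  # 3.5 m x 80 m

# A maps road coordinates to homogeneous image coordinates as in Eq. (1);
# its inverse realizes Map(u, v), taking an image point to the road plane.
A = cv2.getPerspectiveTransform(road_corners, image_corners)
A_inv = np.linalg.inv(A)

def road_map(u: float, v: float) -> tuple[float, float]:
    """Map(u, v): image point -> position (x, y) on the road plane."""
    p = A_inv @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Eq. (2): actual distance between two image points.
x1, y1 = road_map(400, 200)
x2, y2 = road_map(410, 350)
dx, dy = x1 - x2, y1 - y2
print(f"across-road dx = {dx:.2f} m, along-road dy = {dy:.2f} m")
```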

Y. Jia, C. Zhang / Pattern Recognition 42 (2009) 313 -- 321

315

Fig. 2. The edge extraction and edge-distance transform. (a) The original image; (b) the extracted edges; (c) the edge-distance values, with the gray value of each point (black = 0, white = 255) proportional to its edge distance.

Fig. 3. Four sample vehicle models. The 1st and 4th columns show the original images of the vehicle models; the 2nd and 5th columns show the vehicle edges; the 3rd and 6th columns show the vehicle zones. All images are normalized to a width of 50 pixels.

2.2. Image preprocessing

We choose the Canny algorithm [20], which has proved adaptable to various environments. The Canny algorithm is originally defined on grayscale images; for an RGB color image I, we define the edge set E(I) as the union over the three color channels:

$E(I) = \bigcup_{i \in \{R,G,B\}} E(I_i)$   (3)

where $E(I_i)$ is the edge image calculated by the Canny algorithm on the single channel i. After the edge extraction, we perform an edge-distance transform to calculate the distance from every point X = (u, v) to the nearest edge point in the edge set E:

$\mathrm{Dist}(X) = \min_{X' \in E} D(X, X')$   (4)

where D(X, X') is the Euclidean distance. The edge extraction and edge-distance transformation procedures are illustrated in Fig. 2.
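A minimal sketch of this preprocessing stage, assuming OpenCV and NumPy; the Canny thresholds are illustrative choices, not values from the paper:

```python
import cv2
import numpy as np

def edge_distance_map(bgr_image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel Canny union (Eq. 3) followed by the edge-distance
    transform (Eq. 4). The thresholds 50/150 are illustrative."""
    edges = np.zeros(bgr_image.shape[:2], dtype=np.uint8)
    for ch in cv2.split(bgr_image):   # union of the edges of every channel
        edges |= cv2.Canny(ch, 50, 150)
    # distanceTransform measures distance to the nearest ZERO pixel,
    # so invert: edge pixels become 0, background becomes 255.
    dist = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 5)
    return edges, dist

# Usage: edges, dist = edge_distance_map(cv2.imread("frame.png"))
# dist[v, u] is Dist(X) for the image point X = (u, v).
```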
2.3. Vehicle model

The motivation for constructing the vehicle model is that although vehicles differ in appearance details such as color, from the front view their appearances look similar to each other. For example, most minivans look like the one in the second row of Fig. 3. Thus, we construct a vehicle model library containing typical vehicles to describe the characteristics of different vehicle types. These vehicles are extracted manually from actual images.

In practice, we extract typical vehicles of different types (such as cars and SUVs) and with different front-view angles as vehicle models. Each vehicle model represents a corresponding type and angle of real-world vehicles. Specifically, we extract about 10 angles for each type of vehicle. Fig. 3 shows four examples from the library.

A vehicle model in the library includes the following information:

(1) Model number. It is used as an index.
(2) Vehicle edge. This is the edge image of the vehicle. To keep conformity, it is calculated from the image of the vehicle using the Canny edge extraction procedure described above.
(3) Vehicle zone. This defines the image zone that belongs to the vehicle; the third column of Fig. 3 shows some examples. After a vehicle fitting the model is found, the corresponding zone can be determined by mapping the white points of the zone image to the actual image. The zone of each model is marked manually.

The vehicle edge and vehicle zone are both represented as binary matrices, and the vehicle model is equivalent to a rectangle bounding the vehicle.

Additionally, we expect the resolution of the image to be such that vehicles are at least 50 pixels wide, to ensure that there are enough pixels to calculate the probability function. In practice, an 800×600 image meeting this requirement can contain about four lanes and cover a length of about 80 m, a reasonable size that meets most practical requirements. We therefore normalize the width of every vehicle model to 50 pixels; the height varies with the vehicle type.

2.4. Candidate parameters

Given an image I, a vehicle in I can be described by the following information: (1) the position of the vehicle, (2) the size of the vehicle, and (3) the type of the vehicle. We define a rectangular zone that denotes a probable vehicle (called a candidate S) using the following parameters:

$S = \{x, y, m, w, h\}$   (5)

Every candidate S denotes a rectangular zone that might be a vehicle: (x, y) is the middle-bottom coordinate of the candidate zone; m is the index of the vehicle model used by the candidate, taking an integer value from {1, 2, ..., N}, where N is the number of models in the library; and (w, h) are the width and height of the zone in the image.
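For illustration, the library entries and candidates could be represented as plain records along these lines (an assumed data layout, not code from the paper):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VehicleModel:
    """One entry of the model library (Section 2.3)."""
    number: int          # model index m, used as the library key
    edge: np.ndarray     # binary edge matrix, width normalized to 50 px
    zone: np.ndarray     # binary vehicle-zone matrix (same shape as edge)

@dataclass
class Candidate:
    """A candidate S = {x, y, m, w, h} from Eq. (5)."""
    x: float   # middle-bottom image coordinate, across the lane
    y: float   # middle-bottom image coordinate, along the lane
    m: int     # vehicle model index, 1..N
    w: float   # width of the candidate zone in pixels
    h: float   # height of the candidate zone in pixels
```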
3. Bayesian formulation

Based on the definitions in Section 2, the vehicle detection problem can be abstracted as follows: given an image I (without loss of generality, we suppose that the image contains only one lane), we need to find the number of vehicles k and the parameters of every vehicle. Assuming that k is determined, the search for parameters amounts to maximizing the posterior probability P(S_1, S_2, ..., S_k | I). Although it is possible to search for the vehicles and determine their number simultaneously, that optimization problem has too high a dimensionality to be solved directly. Actually, the vehicles on a road are not all correlated. In practice, vehicles appear sequentially on a lane and largely follow the Markov property that the position of a vehicle depends only on the one in front of it; this is also justified by the well-studied car-following model in intelligent transportation systems. Thus it is possible to detect vehicles one by one and stop when the algorithm reaches the far end of the image. In other words, given vehicle S_i, we make the Markov assumption that the states of the vehicles S_{i+1}, ..., S_k following S_i are independent of the vehicles S_1, ..., S_{i-1} in front of S_i. The relationship of the vehicles can therefore be modeled by the following Bayesian network:

$S_1 \rightarrow S_2 \rightarrow \cdots \rightarrow S_{i-1} \rightarrow S_i \rightarrow \cdots \rightarrow S_k$

So we have

$P(S_1, \ldots, S_k \mid I) = \prod_{i=1}^{k} P(S_i \mid I, S_{i-1})$   (6)

This indicates that we can detect vehicles sequentially from front to rear instead of bearing the burden of detecting them all at once. For simplicity (and with a slight abuse of notation), we write P(S_i | I, S_{i-1}) as P(S|I) in the rest of the article when there is no ambiguity.

A remaining problem is how to decide the proper vehicle number k for a given image. This is the problem of finding the maximum value of the best probability estimate over all possible k:

$\max_k P(S_1, \ldots, S_k \mid I)$   (7)

We use a counting method, laying the determination of k aside and searching for vehicles first. After the search is finished, i.e., the algorithm reaches the rear of the road, we count the number of detected vehicles as k. Empirically, this sequential searching method works well and finds the correct number of vehicles satisfactorily.

Next we consider detecting the vehicles sequentially. The problem is again abstracted as finding the maximum posterior probability of a vehicle candidate. Given the current image I and the previously detected vehicles, the posterior probability of a certain candidate S, written as P(S|I), represents the extent to which the candidate is a real vehicle in the image. According to the Bayes rule, we have

$P(S \mid I) \propto P(S)\,P(I \mid S)$   (8)

We will discuss the calculation of the two functions P(S) and P(I|S), respectively, in the following parts.

3.1. Prior probability P(S)

The prior probability P(S) of a certain candidate S = {x, y, m, w, h} can be decomposed as

$P(S) = p(x, y, m)\,p(w, h \mid x, y, m) = p_x(x)\,p_y(y)\,p_m(m)\,p_w(w \mid x, y, m)\,p_h(h \mid x, y, m, w)$   (9)

where p(x, y, m) = p_x(x) p_y(y) p_m(m) because the parameters x, y, and m are considered independent. We discuss the calculations one by one.

The prior probability of the parameter y, given the preceding vehicle's position y_0, depends on the inter-vehicle distance Δy = |y_0 − y|. We adopt a modified Erlang distribution with an offset to calculate the probability:

$p_y(\Delta y) = \lambda e^{-\lambda(\Delta y - L)}\,\dfrac{(\lambda(\Delta y - L))^{c-1}}{(c-1)!}, \quad \Delta y \ge L, \; c = 1, 2, \ldots$   (10)

Fig. 4. The distribution of probability along the y direction after the first vehicle (framed) is detected. The left graph is the typical Erlang distribution without offset; the right graph shows our data-driven method that automatically calculates the offset to improve the evaluation of the probability.

The parameters λ and c are determined by experience and by the traffic conditions; when congestion is frequent, a larger c leads to better results. The parameter L is an offset representing the minimum distance between two vehicles. We adopt the Erlang distribution because it is widely used and has experimentally proved reasonable in traffic modeling [21]. The typical Erlang distribution does not contain the offset parameter L; in our method, a data-driven procedure determines L by inspecting the amount of edges along the y direction. The basic motivation is that, when we aim to detect the vehicle following the last detected one, we can eliminate the gap area between the two where there is no actual vehicle. As shown in Fig. 4, our method avoids sampling in the gray part of the left distribution, which speeds up the search for the MAP solution: it reduces the time spent searching in zones with too few edges to support a vehicle, while keeping the Erlang distribution's applicability.
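A direct transcription of the reconstructed Eq. (10); the example parameter values are assumptions, since the paper sets λ, c, and L by experience and a data-driven offset:

```python
import math

def p_y(dy: float, lam: float, c: int, L: float) -> float:
    """Offset Erlang prior of Eq. (10) for the inter-vehicle distance
    dy = |y0 - y|. lam, c, and L are chosen by experience."""
    if dy < L:          # closer than the minimum inter-vehicle distance
        return 0.0
    z = lam * (dy - L)
    return lam * math.exp(-z) * z ** (c - 1) / math.factorial(c - 1)

# Example: with lam = 0.2 per meter, c = 3, and offset L = 4 m, the prior
# peaks at dy = L + (c - 1)/lam = 14 m behind the last detected vehicle.
```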
Because vehicles appear more frequently at the middle of the lane, the prior probability of the parameter x can be calculated from the distance between x and the center x_0 of the lane. We define it as a clipped Gaussian distribution with variance σ_x²:

$p_x(x) \propto \begin{cases} \dfrac{1}{\sigma_x}\exp\left\{-\dfrac{\Delta x^2}{2\sigma_x^2}\right\} & |\Delta x| < W/2 \\ 0 & \text{otherwise} \end{cases}$   (11)

where Δx = |x_0 − x| is the distance and W is the width of the road.


Fig. 5. The mapping procedure of the vehicle model. The left graph shows the case of a single vehicle; note that it represents one step during the search and has not yet converged to the MAP position. The middle graph shows the case of detecting an occluded vehicle when the preceding vehicle (marked dark) has been detected. The mapped edge point set M can then be divided into the uncovered part M' and the covered part M \ M', shown in the right graph. There the model has been placed at its MAP position, i.e., after the search phase it is the detection result.

The prior probability of the vehicle model m can be determined according to the frequency of each kind of vehicle model. In the experiments of this article, we simply set every model's prior probability to be the same, i.e., p_m(m) = 1/N.

The width and height of the candidate in the image depend on the candidate's position, due to the projection effect: vehicles shown at the front of the image generally have larger width and height values, while those at the rear have smaller values. Assuming that the width of the lane at the candidate's position is W, we define the expectation of the vehicle width as K_w W, where K_w is a parameter determined by prior knowledge, the candidate's position (x, y), and the model m. Similar to x, we define the prior probability of w as a clipped Gaussian distribution with variance σ_w²:

$p_w(w \mid x, y, m) \propto \begin{cases} \dfrac{1}{\sigma_w}\exp\left\{-\dfrac{(w - K_w W)^2}{2\sigma_w^2}\right\} & K_l W \le w \le K_h W \\ 0 & \text{otherwise} \end{cases}$   (12)

where K_l and K_h are the lower- and higher-bound ratios, respectively. Once the width and the vehicle model of a candidate are fixed, the height of the candidate can be calculated from the model's height-width ratio. However, to improve robustness, we also allow the height to vary within a small range; the prior probability of the height, p_h(h | x, y, m, w), is thus defined similarly to Eq. (12).
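Both priors share the same clipped-Gaussian form, so one helper can serve Eqs. (11) and (12); all parameter values (σ_x, σ_w, K_w, K_l, K_h) are caller-supplied assumptions:

```python
import math

def clipped_gaussian(delta: float, sigma: float,
                     lo: float, hi: float, value: float) -> float:
    """Unnormalized clipped Gaussian used for p_x (Eq. 11) and p_w (Eq. 12):
    proportional to exp(-delta^2 / (2 sigma^2)) / sigma inside [lo, hi]."""
    if not (lo <= value <= hi):
        return 0.0
    return math.exp(-delta * delta / (2.0 * sigma * sigma)) / sigma

# p_x: delta = |x0 - x|, feasible band of width W around the lane center x0.
def p_x(x: float, x0: float, W: float, sigma_x: float) -> float:
    return clipped_gaussian(abs(x0 - x), sigma_x, x0 - W / 2, x0 + W / 2, x)

# p_w: expected width Kw*W, feasible range [Kl*W, Kh*W].
def p_w(w: float, W: float, Kw: float, Kl: float, Kh: float,
        sigma_w: float) -> float:
    return clipped_gaussian(w - Kw * W, sigma_w, Kl * W, Kh * W, w)
```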
3.2. Likelihood P(I|S)

The likelihood evaluates how well a candidate matches an actual vehicle. We design the likelihood function based on two criteria: (1) the function reaches its maximum when the parameters correspond to the position of an actual vehicle; (2) the peak of the function is as sharp as possible. At the same time, the function should account for and overcome the difficulty brought about by occlusions.

We calculate the likelihood by first mapping the selected vehicle model of the candidate onto the corresponding image area according to the given parameters, as shown in Fig. 5. The original standard-sized model is resized to the width and height of the candidate and placed at the candidate's coordinate. We define the mapped edge points of the model (as shown in Fig. 5) as the set M.

We use four criteria to form the likelihood function: the arithmetic mean (l1) and the standard deviation (l2) of the edge distance, the unassigned edges in the image (l3), and the ratio of the candidate not covered by preceding vehicles (l4). The four parts are discussed in the following paragraphs.

The arithmetic mean of the edge distance is the average of the edge-distance values (calculated as in Section 2.2) over the set M:

$l_1 = \dfrac{1}{|M|} \sum_{(x,y) \in M} \mathrm{Dist}(x, y)$   (13)

where |M| is the number of points in M and Dist(·) denotes the edge-distance value. The arithmetic mean of the edge distance may lose its accuracy in describing the likelihood when part of the mapped area contains dense edges. Thus, we consider the standard deviation of the distance values as another criterion, which reflects the conformity between the image edges and those of the vehicle candidate:

$l_2 = \sqrt{\dfrac{1}{|M|} \sum_{(x,y) \in M} (\mathrm{Dist}(x, y) - l_1)^2}$   (14)

Smaller mean and deviation values indicate that the candidate's edges fit the actual image well, reflecting a higher probability that the candidate is an actual vehicle.
Up to now we have not considered occlusions. When occlusion happens, so that part of the candidate is covered by preceding vehicles, we restrict the calculation of l1 and l2 to the subset M' of M that is not covered.

Since we detect vehicles sequentially, and in front-view images a vehicle is never occluded by the vehicles following it, we can always determine M and M' from the knowledge gained during the process. When a vehicle is detected, we mark the vehicle area using the vehicle zone of its model (see Fig. 3); this marking determines the occluded and unoccluded zones for the subsequent detections, so we can evaluate the likelihood without the influence of preceding vehicles. Fig. 5 shows the occlusion condition and illustrates M and M'.

Under occlusion conditions, especially when two sequential undetected vehicles are near each other, l1 and l2 will have multiple minima that differ only slightly; see Fig. 6(b). In order to ensure that vehicles are detected from front to rear, we count the unassigned edge points in front of the candidate's position (x, y). An unassigned edge point is defined as follows: (1) it is an edge point in the edge image E, and (2) it is not covered by the vehicle zone of any currently detected vehicle, i.e., it has not been assigned to any detected vehicle. If there are too many unassigned edge points in front of the candidate, it is likely that there is still an undetected vehicle there. This prevents our method from jumping over vehicles when detecting sequentially.


Fig. 6. Graphs of the likelihood function with respect to the parameters x and y under different conditions. (a) A single vehicle without occlusions; (b) three sequential vehicles with occlusions. The graphs mainly show the change along the y axis.

Mathematically, the term is defined as

$l_3 \propto \epsilon + \sum_{(x', y') \in E,\; y' > y} I_u(x', y')$   (15)

where I_u(x', y') = 1 if the edge point (x', y') is not assigned to any detected vehicle (and 0 otherwise), and ε is an offset that prevents l3 from taking too small a value.
We assume that the occlusion of a vehicle is limited to a certain extent, i.e., a candidate with most of its zone occluded by preceding vehicles is unlikely to be an actual vehicle. Thus, we define a term based on the unoccluded ratio of the candidate:

$l_4 \propto \sqrt{1 - \dfrac{|M'|}{|M|}}$   (16)

The square root is adopted in order to allow slight occlusions while placing a comparatively high punishment on candidates with only a small part unoccluded.
Note that for a good candidate, l1 to l4 should all be small. Thus, we define the likelihood as

$P(I \mid S) \propto \dfrac{1}{\sum_{i=1}^{4} l_i}$   (17)

Because the standard definition of a likelihood requires its integral over the candidate space to be 1, we use ∝ instead of = in the equations above. However, in Section 4 we only use likelihood ratios, so it makes no difference whether the function is normalized or not.
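Putting Eqs. (13)-(17) together, a sketch of the likelihood evaluation might look as follows. It assumes the arrays from the earlier preprocessing sketch, and since the relative scaling of the four terms is left implicit here, a practical implementation would likely weight them:

```python
import numpy as np

def likelihood(dist, edges, assigned, M, covered, y, eps=1.0):
    """Sketch of Eqs. (13)-(17). dist: edge-distance map; edges, assigned:
    boolean maps over the image (assigned marks edge points already claimed
    by detected vehicles); M: (n, 2) array of mapped model edge points
    (u, v); covered: boolean mask over M for points occluded by already-
    detected vehicles; y: candidate row, with y' > y meaning 'in front'."""
    Mp = M[~covered]                         # uncovered subset M' of M
    d = dist[Mp[:, 1], Mp[:, 0]]             # Dist() sampled at the points of M'
    l1 = d.mean()                            # Eq. (13): mean edge distance
    l2 = d.std()                             # Eq. (14): its standard deviation
    unassigned = edges & ~assigned
    l3 = eps + unassigned[y + 1:, :].sum()   # Eq. (15): unassigned edges in front
    l4 = np.sqrt(1.0 - len(Mp) / len(M))     # Eq. (16): occlusion penalty
    return 1.0 / (l1 + l2 + l3 + l4)         # Eq. (17): small terms, high value
```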
Fig. 6 shows two graphs of the likelihood function with respect to the parameters x and y. Fig. 6(a) shows a single vehicle without occlusions: the likelihood function has a sharp peak representing the position of the actual vehicle. Fig. 6(b) shows the likelihood for three sequential vehicles with occlusions: the likelihood has several local maxima, and the three peaks marked in the figure represent the three sequential vehicles, respectively. Note that peaks 2 and 3 are suppressed by the l3 term so that vehicle 1 is detected first. Due to the existence of these local maxima, simple algorithms such as greedy search and Newton's method cannot guarantee finding the global maximum. In the next section, we design a Markov chain to accomplish this goal.
4. Vehicle detection using MCMC

MCMC is a simulation technique that provides solutions within reasonable time and adapts to many problems [22-25]. It is used here to find the MAP solution to our problem. The basic procedure of MCMC is to build a Markov chain that samples proposals from a given distribution, in our case the posterior probability P(S|I) ∝ P(S)P(I|S), and to find the MAP solution from those samples. This is done as follows [22]: at the t-th iteration, a new proposal S' is sampled from the current candidate S_t through a proposal distribution q(S'|S_t); the new proposal is then accepted according to a transition probability A(S_t, S'). We discuss the proposal distribution q and the transition probability A, respectively, in the following paragraphs.
Given the current candidate S_t = {x_t, y_t, m_t, w_t, h_t}, we adopt a sampling method similar to Gibbs sampling to raise efficiency: it updates only one of the five parameters of the proposal at a time, instead of changing all parameters simultaneously. While Gibbs sampling updates the parameters in turn, we assign a different weight to each parameter and select the one to update at every step according to the weights. Generally, the positions x and y have higher weights, so they are chosen more often. The motivation is clear: the width and height of the vehicle depend largely on the position and the prior knowledge and converge quickly, while the position of the vehicle is much less determined and needs more sampling steps to find. This is achieved by assigning smaller weights to the width and height and larger weights to the position parameters. These weights are fixed according to prior knowledge and thus do not affect the stochastic characteristics of the stationary distribution of the Markov chain.

For the model number, we simply sample the new value from a uniform distribution, q(m' | m_t) = 1/N. For the parameters x, y, w, and h, we draw the new value from a Gaussian distribution (taking x as an example):
$q(x' \mid x_t) = \dfrac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left\{-\dfrac{(x' - x_t)^2}{2\sigma_x^2}\right\} I_D(x')$   (18)


where the standard deviation σ_x is a constant fixed according to experience, and I_D(x') is an indicator function ensuring that x' lies in the domain of the prior probability (11), i.e., that p_x(x') > 0; this constrains the new proposal to be feasible. As the initial candidate of the MCMC procedure, we use the candidate with the largest prior probability as the initial proposal of the Markov chain in its search for the MAP solution. This initial proposal is easy to compute, since it relies only on the preceding vehicle and the analytical prior probability measure.
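A sketch of the weighted single-parameter proposal described above; the specific weights and step sizes are illustrative assumptions:

```python
import random

# Position parameters get larger weights, so they are updated more often;
# these weights and step sizes are illustrative, not values from the paper.
WEIGHTS = {"x": 4.0, "y": 4.0, "w": 1.0, "h": 1.0, "m": 1.0}
SIGMAS = {"x": 5.0, "y": 10.0, "w": 2.0, "h": 2.0}   # pixels

def propose(S: dict, N: int, feasible) -> dict:
    """Draw S' from q(S'|S): pick one parameter by weight, then perturb it.
    m is resampled uniformly from {1..N}; x, y, w, h take a Gaussian step
    (Eq. 18), redrawn until the indicator I_D accepts the new value."""
    name = random.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    S_new = dict(S)
    if name == "m":
        S_new["m"] = random.randint(1, N)
    else:
        while True:
            v = random.gauss(S[name], SIGMAS[name])
            if feasible(name, v):     # I_D: check the prior support, p(v) > 0
                S_new[name] = v
                break
    return S_new
```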
Then the Markov chain decides whether to accept the new state, based on the Metropolis-Hastings method, according to the transition probability

$A(S_t, S') = \min\left\{1,\; \dfrac{P(S')\,P(I \mid S')}{P(S_t)\,P(I \mid S_t)} \cdot \dfrac{q(S_t \mid S')}{q(S' \mid S_t)}\right\}$   (19)

where P(S) and P(I|S) are the prior probability and likelihood discussed in Section 3. In other words, drawing a random value η ∼ U[0, 1], the new state S_{t+1} is determined as

$S_{t+1} = \begin{cases} S' & \text{if } \eta \le A(S_t, S') \\ S_t & \text{if } \eta > A(S_t, S') \end{cases}$   (20)
Next we discuss the convergence of the MCMC. The dynamics we use to draw a new proposal S' can be divided into shift processes (changing the model of the vehicle) and diffusion processes (moving the position or scaling the size relative to the current value). All these operations are stochastic, so the Markov chain designed in this way is ergodic and aperiodic, i.e., it can travel between any two states S_i and S_j in finitely many steps. This ensures that the chain can reach the maximum point in the proposal space. The Markov chain also has a stationary distribution, which equals the posterior probability P(S|I).
In order to maximize the posterior probability, and out of computational time considerations, we use the Metropolis-Hastings algorithm with simulated annealing to make the proposals converge to the actual parameters of the existing vehicle. To do so, we change the function A(S_t, S') to

$A(S_t, S') = \min\left\{1,\; \left(\dfrac{P(S')\,P(I \mid S')}{P(S_t)\,P(I \mid S_t)}\right)^{1/T} \dfrac{q(S_t \mid S')}{q(S' \mid S_t)}\right\}$   (21)

In this way, the Markov chain becomes a non-homogeneous one whose invariant distribution at iteration t equals P^{1/T_t}(S|I). The convergence of the simulated annealing algorithm is proved in Ref. [22]. We change the temperature at iteration t according to

$T_t = \dfrac{T_0}{1 + \alpha \lg t}$   (22)

where α is a fixed constant. In our experiments, setting T_0 = 2 and α = 1 works well in most cases, although further parameter tuning may enhance performance. The algorithm starts at a high temperature T_0 to retain the ability to jump out of local maxima. When (1) the temperature falls below a threshold T_thres, or (2) the algorithm rejects n_thres proposals consecutively, we stop and take the current proposal as the detected vehicle. In our experiments the second stopping criterion is met more often, especially when detecting single vehicles.
After a proposal is finally determined to be a vehicle, our algorithm steps forward to detect the following vehicles. The step-forward procedure consists of two parts. First, the detected vehicle is marked in the target image: according to the vehicle zone information of the model, the corresponding image zone is marked as detected, which is used in the evaluation of subsequent likelihood functions. Second, the algorithm calculates a step-forward distance to jump ahead along the y coordinate. This distance is determined by the length of the vehicle and a safe driving distance; inter-vehicle distances on the road are supposed to be larger than the jumped distance. This accelerates the convergence of our sampling procedure.

To better illustrate the procedure, we give an algorithmic description of our method in Table 1.

Table 1
An algorithmic description of our method

1. Preprocess the image; start searching from the bottom of the image.
2. Initialize the vehicle candidate S_1 for MCMC; let t = 1.
3. Perform MCMC to search for the vehicle. At each step t:
   3.1. choose a parameter to update (x, for example);
   3.2. sample a new proposal S' using q(x' | x_t);
   3.3. calculate the prior probabilities and likelihoods of S_t and S';
   3.4. accept or reject the new proposal using Eq. (21);
   3.5. go to step 4 if a stop criterion is met; otherwise let t = t + 1 and go to 3.1.
4. Mark the detected vehicle and step forward.
5. If the algorithm has reached the rear of the road, stop; otherwise go to step 2.
6. Output the final result.
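The search loop of Table 1, step 3, can be sketched as follows under the reconstructed Eqs. (18)-(22); it assumes a symmetric proposal so that the q-ratio cancels (the clipped Gaussian of Eq. (18) is only approximately symmetric), and the stopping thresholds are illustrative:

```python
import math
import random

def detect_one_vehicle(posterior, propose, S0, T0=2.0, alpha=1.0,
                       T_thres=0.05, n_thres=200):
    """Annealed Metropolis-Hastings search for one vehicle (Table 1, step 3).
    posterior(S) returns the unnormalized P(S)P(I|S), assumed positive;
    propose(S) draws S' from q(S'|S); S0 is the maximum-prior candidate."""
    S, p = S0, posterior(S0)
    rejects, t = 0, 1
    while True:
        T = T0 / (1.0 + alpha * math.log10(t))       # cooling schedule, Eq. (22)
        S_new = propose(S)
        p_new = posterior(S_new)
        A = min(1.0, (p_new / p) ** (1.0 / T))       # Eq. (21) with symmetric q
        if random.random() <= A:                     # accept/reject, Eq. (20)
            S, p, rejects = S_new, p_new, 0
        else:
            rejects += 1
        if T < T_thres or rejects >= n_thres:        # the two stopping criteria
            return S
        t += 1
```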
5. Experimental results

We applied our vehicle detection method to a number of images covering different situations. The test images are divided into three sets. The set ring4 contains images taken from a four-lane road under mild weather conditions with little shadow. The set shadow contains images taken from a narrow three-lane road under strong shadows. Traffic in both of these sets shows mid-level occlusion. The set misc contains miscellaneous images taken at various places, mainly showing high-level vehicle occlusion. Camera calibration was performed in an independent procedure. To evaluate the performance accurately, the vehicle models were built from images that are not in any of the test sets; these images were taken at different places but with similar camera angles. Thirty models were extracted, representing cars, minivans, and SUVs at different angles and with different appearances. The same vehicle models were then used on all three data sets.
We define a correct detection as one in which the final proposal corresponds to an actual vehicle and the vehicle is detected only once (any repeated detection is counted as a false detection). The difference between the position of the proposal and that of the vehicle must not exceed 20% of the vehicle size; otherwise the detection is considered a false positive (FP). Figs. 7-9 show experimental results with both successful detections and failures.
The images presented in Fig. 7 are drawn from the set ring4. They show that our method can detect single vehicles successfully and can handle the low- and mid-level occlusions that occur frequently on the road. The number of vehicles merged in one blob does not affect the detection result much; for example, the five vehicles that appear one after another in the middle lane of Fig. 7(b) are all segmented and found successfully. Road markings, such as the white inter-lane lines, do not affect the detection even though we perform no background extraction: the markings generate only a small number of edges, which cannot change the value of the likelihood function much, since we use an average-based measurement to evaluate the likelihood. Most of the vehicles are detected successfully. An exception is that in Fig. 7(a) the last car in the right lane (marked by an asterisk at the top right), behind the minivan, is not detected. This is partly because most of the vehicle is covered by the minivan, leaving too few features to distinguish it.
Fig. 7. Experimental results from the ring4 set.

The image from the set shadow shows that the influence of vehicle shadows is successfully eliminated, since shadows create only a small number of edges, even though they might form a large fake foreground in many region-based methods. Similarly, the inter-lane foreground merge caused by shadows is not a problem for our method either.


Images from the set misc show detection under heavy occlusions. In these cases the final proposals' parameters may differ slightly from the exact ones, but this does not affect the overall detection result. Some of the vehicles in these images are covered by preceding vehicles by about a half, yet can still be recognized. Generally, our method can successfully detect vehicles when approximately 50% of the vehicle remains visible. In Fig. 9(b), the method fails to detect the jeep at the back, marked by an asterisk. The reason is that the prior probability penalizes vehicles that are off the center of the lane, and the taxi following not far behind creates a relatively high likelihood peak. Such problems may be solved by further adjusting the probability evaluation.

For a typical 800×600 image containing occlusions, our method's speed is about 20 vehicles per second, a reasonable time for detection. Empirically, our method tolerates about 15° of difference between the angle of the model and the actual angle of the vehicle. By modifying the road model to general quadrilaterals and adding corresponding vehicle models for other views such as side views (and, further, free views), we can extend our method to multiple views under the same framework proposed in this paper.

A potential problem is that such an extension would enlarge the solution space of the proposal S and reduce the speed of the algorithm; this may be a shortcoming of our method, and more efficient ways of speeding up the algorithm may be considered in future work. Also, our experiments mainly focused on three types of vehicles, namely cars, minivans, and SUVs, which are common in urban areas and have comparatively small variance in size and appearance. There are also other types, such as trucks and mechanical vehicles, with much larger sizes and more varied appearances. Taking more vehicle types into consideration would enlarge the solution space, making it more difficult for the MCMC to find the vehicles. We plan to explore and handle these conditions in future work.
Fig. 8. Experimental result from the shadow set. Note that shadows do not affect the detection.

We also present numerical results for our method. As in most pattern classification applications, we define the vehicles that have been successfully detected as true positives (TP), the vehicles that have not been detected as false negatives (FN), and claimed detections that do not correspond to any vehicle as false positives (FP). We then define the sensitivity as the proportion of vehicles that have been detected, i.e., TP/(TP + FN), and the false discovery rate (FDR) as the proportion of detections that are not actual vehicles, i.e., FP/(TP + FP). The overall sensitivity of our method is 92.46% and the FDR is 4.27%. Detailed results for each set are listed in Table 2. The three data sets show that detection under heavy occlusion (as in misc) has a slightly lower rate than conditions with more single vehicles, but both reach a satisfactory level of detection.
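As a quick check, the two metrics reduce to simple ratios; this snippet reproduces the ring4 row of Table 2:

```python
def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)   # proportion of actual vehicles that were detected

def fdr(tp: int, fp: int) -> float:
    return fp / (tp + fp)   # proportion of detections that are spurious

# ring4 row of Table 2: 163 vehicles, 154 TP, 5 FP, so FN = 163 - 154 = 9.
print(f"{sensitivity(154, 9):.2%}, {fdr(154, 5):.2%}")  # -> 94.48%, 3.14%
```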
6. Discussion and conclusion

In this paper, we presented a novel model-based computer vision method for vehicle detection in static images under front-view conditions. The method is flexible and can easily be extended to other views, such as side or rear views, by modifying the road model to general quadrilaterals and augmenting the vehicle models with the common appearance of vehicles at those view angles, while the algorithm remains essentially the same as the one we implemented. Besides, our method uses comparatively little initial information compared with other methods: preprocessing steps such as background extraction and shadow elimination are not required. The method is robust under different lighting conditions, except for extreme ones in which the image is too bright or too dark to show any detectable edges, and it can deal with inter-vehicle occlusions to a satisfactory extent, as demonstrated by the experimental results on real-world data.

The method can be further incorporated into other tasks such as vehicle classification, since it outputs vehicle types and other information, reasoning about the occluded parts of vehicles. It can also be used to help other methods initialize vehicle information even when vehicles are occluded. Finally, successful detection on single images lowers the complexity of integrating the detection results to perform tracking on image sequences or videos.


Fig. 9. Experimental results from the misc set.

Table 2
Detection results on the different sets

Set     | Vehicles | Detections | TP  | FP | Sensitivity (%) | FDR (%)
ring4   | 163      | 159        | 154 | 5  | 94.48           | 3.14
shadow  | 39       | 38         | 37  | 1  | 94.87           | 2.63
misc    | 501      | 482        | 459 | 23 | 91.62           | 4.77
Total   | 703      | 679        | 650 | 29 | 92.46           | 4.27

References
[1] D. Beymer, P. McLauchlan, B. Coifman, J. Malik, A real-time computer vision system for measuring traffic parameters, Comput. Vision Pattern Recognition (1997) 495-501.
[2] O. Achler, M. Trivedi, Camera based vehicle detection, tracking, and wheel baseline estimation approach, in: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, 2004, pp. 743-748.
[3] S. Gupte, O. Masoud, R.F.K. Martin, N. Papanikolopoulos, Detection and classification of vehicles, IEEE Trans. Intell. Transp. Syst. 3 (1) (2002) 37-47.
[4] D. Koller, J. Weber, J. Malik, Robust multiple car tracking with occlusion reasoning, Eur. Conf. Comput. Vision 1 (1994) 189-196.
[5] M. Haag, H. Nagel, Combination of edge element and optical flow estimates for 3D-model-based vehicle tracking in traffic image sequences, Int. J. Comput. Vision 35 (3) (1999) 295-319.
[6] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, W. Seelen, An image processing system for driver assistance, Image Vision Comput. 18 (5) (2000) 367-376.
[7] S. Kamijo, Y. Matsushita, K. Ikeuchi, M. Sakauchi, Occlusion robust tracking utilizing spatio-temporal Markov random field model, in: Proceedings of the International Conference on Pattern Recognition, 2000.
[8] N. Kanhere, S. Pundlik, S. Birchfield, Vehicle segmentation and tracking from a low-angle off-axis camera, in: IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 1152-1157.
[9] G. Sullivan, Model-based vision for traffic scenes using the ground-plane constraint, Real-time Computer Vision, 1994.
[10] D. Koller, K. Daniilidis, H. Nagel, Model-based object tracking in monocular image sequences of road traffic scenes, Int. J. Comput. Vision 10 (3) (1993) 257-281.
[11] J. Wu, X. Zhang, J. Zhou, Vehicle detection in static road images with PCA-and-wavelet-based classifier, in: Proceedings of the Intelligent Transportation Systems, 2001, pp. 740-744.
[12] J. Rojas, J. Crisman, Vehicle detection in color images, in: IEEE Conference on Intelligent Transportation System, ITSC '97, 1997, pp. 403-408.
[13] L. Tsai, J. Hsieh, K. Fan, Vehicle detection using normalized color and edge map, in: IEEE International Conference on Image Processing, ICIP 2005, vol. 2, 2005.
[14] T. Zhao, R. Nevatia, Car detection in low resolution aerial images, Image Vision Comput. 21 (8) (2003) 693-703.
[15] M. Koch, K. Malone, A sequential vehicle classifier for infrared video using multinomial pattern matching, in: Proceedings of the Conference on Computer Vision and Pattern Recognition Workshop, June 2006.
[16] O. Ozcanli, A. Tamrakar, B. Kimia, J. Mundy, Augmenting shape with appearance in vehicle category recognition, in: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.
[17] X. Song, R. Nevatia, A model-based vehicle segmentation method for tracking, in: International Conference on Computer Vision (ICCV'05), pp. 1124-1131.
[18] F. Oberti, S. Calcagno, M. Zara, C. Regazzoni, Robust tracking of humans and vehicles in cluttered scenes with occlusions, in: Proceedings of the International Conference on Image Processing, vol. 3, 2002.
[19] C. Pang, W. Lam, N. Yung, A novel method for resolving vehicle occlusion in a monocular traffic-image sequence, IEEE Trans. Intell. Transp. Syst. 5 (3) (2004) 129-141.
[20] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679-698.
[21] D. Gerlough, M. Huber, Traffic flow theory, TRB Special Report No. 165, Transportation Research Board, Washington, DC.
[22] C. Andrieu, N. de Freitas, A. Doucet, M. Jordan, An introduction to MCMC for machine learning, Mach. Learn. 50 (1) (2003) 5-43.
[23] Z. Tu, S. Zhu, Image segmentation by data-driven Markov chain Monte Carlo, IEEE Trans. Pattern Anal. Mach. Intell. (2002) 657-673.
[24] C. Wu, C. Liu, H. Shum, Y. Xu, Z. Zhang, Automatic eyeglasses removal from face images, IEEE Trans. Pattern Anal. Mach. Intell. (2004) 322-336.
[25] M. Lee, I. Cohen, A model-based approach for estimating human 3D poses in static images, IEEE Trans. Pattern Anal. Mach. Intell. (2006) 905-916.

About the Author: YANGQING JIA received the B.S. degree in automation from Tsinghua University, Beijing, China, in 2006. He is currently an M.S. candidate at the State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University. His research interests include machine learning, pattern recognition, and related applications.

About the Author: CHANGSHUI ZHANG received the B.S. degree in mathematics from Beijing University, Beijing, China, in 1986 and the Ph.D. degree in automation from Tsinghua University, Beijing, in 1992. Since July 1992 he has been teaching in the Department of Automation, Tsinghua University, where he is currently a professor. His research interests include machine learning, pattern recognition, artificial intelligence, image processing, evolutionary computation, etc.
