Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
ARTICLE INFO
Article history:
Received 29 January 2007
Received in revised form 25 June 2008
Accepted 29 July 2008
Keywords:
Vehicle detection
Bayesian method
Maximum a posteriori
Markov chain Monte Carlo
ABSTRACT
In this paper, we propose a new vehicle detection approach based on Markov chain Monte Carlo (MCMC).
We mainly discuss the detection of vehicles in front-view static images with frequent occlusions. Models
of roads and vehicles based on edge information are presented, the Bayesian problem's formulations are
constructed, and a Markov chain is designed to sample proposals to detect vehicles. Using the Monte
Carlo technique, we detect vehicles sequentially based on the idea of maximizing a posterior probability (MAP), performing vehicle segmentation in the meantime. Our method does not require complex
preprocessing steps such as background extraction or shadow elimination, which are required in many
existing methods. Experimental results show that the method achieves a high vehicle detection rate, performs successful segmentation, and reduces the influence caused by vehicle occlusion.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Collecting and generating vehicle information from either videos
or static images is a basic task of intelligent transportation systems. In recent years, computer vision-based methods have been widely applied to this problem. Compared with traditional methods such as electromagnetic coils embedded underground, these new methods offer lower cost, more accurate detection, and higher flexibility. The cameras can be installed on poles, bridges, or buildings near the road, and once installed and calibrated they can run surveillance for long periods with low upkeep.
Vehicle detection has received much attention in recent years. Beymer et al. [1] divided vehicle detection methods into four classes: 3D model-based, region-based, active contour-based, and feature-based, each with its own advantages and disadvantages as described in Ref. [1]. Achler et al. [2], Gupte et al. [3], and
Koller et al. [4] used an estimated background to extract moving objects and updated the background for real-time luminance changes.
However, this method requires an initial background with no foreground objects, and updating the background information occupies
a considerable time in the whole process of detection. Haag et al. [5]
used edge information and set up an optical flow model to detect
vehicles. Handmann et al. [6] performed the detection using a neural
network, based on local-oriented coding and entropy analysis.
Fig. 1. A sample input image containing multiple lanes and vehicles (the lanes are
sketched out by trapeziums manually).
2. Model construction
The first step in establishing our method is to construct models
to describe the road and vehicles. There are various features that can
be used to detect vehicles. In our method, we choose the edge information in the image, based on the fact that vehicle zones have many edges while road zones have few, and that edge information is less affected by environment changes than other features such as vehicle color and image gray values. We construct mathematical definitions of the road lanes and vehicle models.
2.1. Road model
The road is formed of lanes that are processed separately in our
algorithm. We assume that there is little distortion in the image, or that it can be eliminated by camera calibration in a preprocessing step. Thus every lane in the image can be modeled as a trapezium
as shown in Fig. 1. Assuming that the lane is planar, we can map the
image point (u, v) to the actual position on the road (x, y) = Map(u, v)
with a simplified perspective transform:
(tu, tv, t)^T = A_{3×3} (x, y, 1)^T    (1)

(x, y) = Map(u, v)    (2)
In the front-view condition, x is the distance across the road, indicating how far the vehicle is off the center of the lane; y is the distance along the road, such as the distance between one vehicle and the following one. This distance is further used to calculate the prior probability in Section 3.
Fig. 2. The edge extraction and edge-distance transform. (a) is the original image; (b) is the edge extracted; (c) represents edge distance value. The gray value of the points
(black = 0 and white = 255) is proportional to the edge distance value.
Fig. 3. Four sample vehicle models. The 1st and 4th columns show the original image of the vehicle model; the 2nd and 5th columns show the vehicle edge; the 3rd and 6th columns show the vehicle zone. All the images are normalized to a width of 50 pixels.
10 angles for each type of vehicle. Fig. 3 shows four examples from
the library.
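The edge-distance map of Fig. 2 assigns every pixel its distance to the nearest edge point. As a sketch (the paper does not give the exact metric; a multi-source BFS with 4-connectivity is our assumption), it can be computed in pure Python:

```python
from collections import deque

def edge_distance(edges):
    """Multi-source BFS distance transform: for every pixel, the
    4-connected distance to the nearest edge pixel. `edges` is a
    2-D list of 0/1 values (the binary Canny edge map)."""
    h, w = len(edges), len(edges[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    # Seed the queue with all edge pixels at distance 0.
    for i in range(h):
        for j in range(w):
            if edges[i][j]:
                dist[i][j] = 0
                q.append((i, j))
    # Breadth-first expansion grows distances outward from the edges.
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                q.append((ni, nj))
    return dist
```

Fig. 2(c)'s gray coding (black = 0, white = 255) would simply be a clipped rescaling of this map.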
A vehicle model in the library includes the following information:
(1) Model number. It is used as an index.
(2) Vehicle edge. This is the edge of the vehicle. To keep conformity,
it is calculated from the image of the vehicle using the Canny edge extraction algorithm described above.
(3) Vehicle zone. It defines the image zone that belongs to the vehicle. The third column of Fig. 3 shows some examples. After finding a certain vehicle fitting the model, the corresponding zone
can be determined by mapping the white points of the zone image to the actual image. The zone of the model is defined by
manual marking.
(4)
The vehicle edge and vehicle zone features are both represented as binary matrices, and the vehicle model is equivalent to a rectangle bounding the vehicle.
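One library entry can be sketched as a small record; the field names and sample values below are our own illustration, not the paper's code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VehicleModel:
    """One entry of the vehicle model library (a sketch)."""
    model_number: int        # (1) index into the library
    edge: List[List[int]]    # (2) binary matrix: Canny edge of the model image
    zone: List[List[int]]    # (3) binary matrix: pixels belonging to the vehicle
    width: int = 50          # width normalized to 50 px
    height: int = 40         # height varies with the vehicle type

# A tiny toy entry; real edge/zone matrices would be 50 pixels wide.
car = VehicleModel(model_number=0,
                   edge=[[0, 1], [1, 0]],
                   zone=[[1, 1], [1, 1]])
```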
Additionally, we expect the resolution of the image to be such that the vehicles are at least 50 pixels wide. This is to ensure
that we have enough pixels to calculate the probability function. In
practice, an 800 × 600 image meeting the requirements above can contain about four lanes and can cover a length of about 80 m. This is a reasonable size that meets most practical requirements. Thus, we normalize the width of every vehicle model to 50 pixels, and the
height varies with different vehicle types.
parameters:

S = {x, y, m, w, h}    (5)

We detect the vehicles sequentially, denoting the candidates by S_1, S_2, . . . , S_k, where each S_i may depend on the previously detected S_1, . . . , S_{i−1}. So we have

P(S_1, . . . , S_k | I) = ∏_{i=1}^{k} P(S_i | I, S_1, . . . , S_{i−1})    (6)

P(S_i | I, S_1, . . . , S_{i−1}) ∝ P(S_i) P(I | S_i, S_1, . . . , S_{i−1})    (7)

S_i* = arg max_{S_i} P(S_i) P(I | S_i, S_1, . . . , S_{i−1})    (8)
We will discuss the calculation of the two functions P(S) and P(I|S),
respectively, in the following part.
The prior probability along the y direction follows an Erlang distribution with offset L:

p_y(y) = λ (λ(y − L))^(c−1) e^(−λ(y − L)) / (c − 1)!,   y ≥ L    (9)

where c = 1, 2, . . .
Fig. 4. The distribution of probability along the y direction after the first vehicle (framed) is detected. The left graph is the typical Erlang distribution without offset, and the right graph shows our data-driven method to automatically calculate the offset to improve the evaluation of probability.
The prior probability P(S) factorizes over the parameters:

P(S) = p_x(x) p_y(y) p_m(m) p_w(w | x, y, m) p_h(h | x, y, m, w)    (10)

The prior probability of x is a clipped Gaussian with variance σ_x²:

p_x(x) ∝ { (1/(√(2π) σ_x)) exp(−x² / (2σ_x²)),   if x lies within the lane    (11)
         { 0,   otherwise
Fig. 5. The mapping procedure of the vehicle model. The left one shows the case of a single vehicle. Note that it represents one step during the search progress and has not yet converged to the MAP position. The middle one shows the case of detecting an occluded vehicle with the preceding vehicle (the one marked dark) detected. The mapped edge point set M can then be divided into the uncovered part M′ and the covered part M − M′, shown in the right graph. The model has been placed at its MAP position, i.e., after the search phase, it is just the detection result.
The prior probability of vehicle model m can be determined according to the frequency of every kind of vehicle model. In the experiment of this article, we simply set every model's prior probability
to be the same, i.e., pm (m) = 1/N.
The width and the height of the candidate in the image are dependent on the candidate's position, due to the projection effect.
Generally, in the image, vehicles at the front (nearer the camera) have larger width and height values, while those at the rear have smaller values. Assuming that the width of the lane at the candidate's position is W, we define the expectation of the vehicle width as K_w W, where K_w is a parameter determined by prior knowledge, the candidate's position (x, y), and the model m. Similar to x, we define the prior probability of w as a clipped Gaussian distribution with variance σ_w²:

p_w(w | x, y, m) ∝ { (1/(√(2π) σ_w)) exp(−(w − K_w W)² / (2σ_w²)),   K_l W ≤ w ≤ K_h W    (12)
                   { 0,   otherwise
where K_l and K_h are the lower- and upper-bound ratios, respectively. When the width and the vehicle model of the candidate are determined, we can calculate the height of the candidate from the model's height-to-width ratio. However, to improve robustness, we allow the height to vary within a small range, too. Thus the prior probability of the height, p_h(h | x, y, m, w), can be defined similarly to Eq. (12).
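Eq. (12) can be sketched directly; the constants K_w, K_l, K_h, and σ_w below are illustrative values, since the paper determines them from prior knowledge:

```python
import math

def prior_width(w, W, k_w=0.9, k_l=0.6, k_h=1.2, sigma_w=5.0):
    """Unnormalized clipped-Gaussian prior on the candidate width w (Eq. (12)).
    W is the lane width at the candidate's position; k_w, k_l, k_h, sigma_w
    are illustrative defaults, not the paper's values."""
    if not (k_l * W <= w <= k_h * W):
        return 0.0                      # clipped outside [K_l W, K_h W]
    mu = k_w * W                        # expected vehicle width
    return math.exp(-(w - mu) ** 2 / (2 * sigma_w ** 2)) \
        / (math.sqrt(2 * math.pi) * sigma_w)
```

The prior peaks at w = K_w W and vanishes outside the clipping interval, which is exactly the shape Eq. (12) describes.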
3.2. Likelihood P(I|S)
Likelihood evaluates the matching of a candidate and actual vehicles. We design the likelihood function based on two criteria: (1) the function reaches its maximum at the parameters corresponding to the position of an actual vehicle; (2) the peak of the function is as sharp as possible. In the meantime, the function should mitigate the difficulty caused by occlusions.
We start calculating the likelihood by first mapping the selected
vehicle model of the candidate to the corresponding image area
according to the given parameters as shown in Fig. 5. The width
and height of the original standard-sized model are resized, and the model is placed according to the coordinates of the candidate. We define the mapped edge points of the model (shown in Fig. 5) as the set M.
We use four criteria to form the likelihood function: the arithmetic mean (l1) and the standard deviation (l2) of the edge distance, the uncounted edges in the image (l3), and the ratio of the candidate not being covered by preceding vehicles (l4). The four parts are discussed in the following paragraphs. The arithmetic mean is

l1 = (1/|M|) Σ_{(x,y)∈M} Dist(x, y)    (13)
where |M| is the number of points in M and Dist(·) denotes the edge-distance value. The arithmetic mean of the edge distance might lose its accuracy in describing likelihood when part of the mapped area
contains dense edges. Thus, we consider the standard deviation of
the distance value as another criterion, which reflects the conformity between the image edge and that of the vehicle candidate. The
standard deviation is defined as follows:
l2 = sqrt( (1/|M|) Σ_{(x,y)∈M} (Dist(x, y) − l1)² )    (14)
Smaller mean and deviation values indicate that the candidate's edge
fits the actual image well, reflecting a higher probability that the
candidate is an actual vehicle.
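Given the edge-distance values sampled at the mapped edge points M, Eqs. (13) and (14) reduce to a mean and a standard deviation. A minimal sketch (the function name and list-based representation are our own):

```python
import math

def edge_distance_stats(dists):
    """l1 (arithmetic mean) and l2 (standard deviation) of the
    edge-distance values sampled at the mapped model edge points M,
    as in Eqs. (13)-(14). `dists` is the list [Dist(x, y) for (x, y) in M]."""
    n = len(dists)
    l1 = sum(dists) / n
    l2 = math.sqrt(sum((d - l1) ** 2 for d in dists) / n)
    return l1, l2
```

Small l1 and l2 together mean the model edge lies uniformly close to image edges, matching the criterion stated above.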
Up to now, we have not considered occlusions. When occlusion happens, so that part of the candidate is covered by preceding vehicles, we limit the calculation of l1 and l2 to the subset M′ of M that is not covered.
Since we detect vehicles sequentially, and in front-view images a
vehicle will never be occluded by the following ones, we can always
determine M′ and M − M′ from the knowledge gained during the process.
When a vehicle is detected, we can mark the vehicle area using the
vehicle zone of the model (see Fig. 3), which is used to determine the
occluded and unoccluded zones for further detection; thus we are able to evaluate the likelihood without the preceding vehicles' influence.
Fig. 5 shows the occlusion condition and illustrates M′ and M − M′.
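The bookkeeping for occlusion can be sketched as follows, assuming the zones of already-detected vehicles are kept as a set of occupied pixels (the data structures are our own choice, not the paper's):

```python
def split_mapped_edges(M, occupied):
    """Split the mapped edge point set M into the uncovered part M'
    and the covered part M - M'. `occupied` is the set of pixels
    claimed by the zones of previously detected vehicles."""
    uncovered = [p for p in M if p not in occupied]   # M'
    covered = [p for p in M if p in occupied]         # M - M'
    return uncovered, covered
```

l1 and l2 are then evaluated only over the uncovered part, so an occluding vehicle that was already detected cannot distort the score of the one behind it.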
Under occlusion conditions, especially when two sequential undetected vehicles are near each other, l1 and l2 will have multiple minima that differ only slightly; see Fig. 6(b). In order to ensure that vehicles are
detected from front to rear, we count the unassigned edge points
in front of the candidate's position (x, y). An unassigned edge point
is defined as follows: (1) it is an edge point in the edge image E,
and (2) it is not covered by any vehicle zone of currently detected
vehicles, i.e., we have not assigned the edge point to any detected
vehicles. If there are too many unassigned edge points in front of the
candidate, it is likely that there may be an undetected vehicle. This
prevents our method from jumping over vehicles when detecting
Fig. 6. Graphs of the likelihood function with respect to parameters x and y in different situations. (a) a single vehicle without occlusions; (b) three sequential vehicles with occlusions. The graph mainly shows the change along the y axis.
The coverage ratio is

l4 = |M′| / |M|    (16)

and, combining the four criteria,

P(I|S) ∝ ∏_{i=1}^{4} l_i    (17)
Because the standard definition of likelihood requires the integral over the candidate space to be 1, we use ∝ instead of = in the equations above. However, in Section 4 we will only use the likelihood ratio, so it makes no difference whether the function is normalized or not.
Fig. 6 shows two graphs of the likelihood function with respect
to the parameters x and y. Fig. 6(a) shows a single vehicle without occlusions, and the likelihood function has a sharp peak corresponding to the position of the actual vehicle. Fig. 6(b) shows the likelihood in the case of three sequential
vehicles with occlusions. The likelihood has several local maxima as
shown. Actually, the three peaks marked in the figure represent the
three sequential vehicles, respectively. Note that peaks 2 and 3 are
suppressed by the l3 term so that vehicle 1 can be detected first.
Due to the existence of the local maxima, simple algorithms such as greedy search and Newton's method cannot guarantee finding the global maximum. We therefore sample new proposals with a Gaussian random-walk proposal distribution restricted to the feasible domain D:

q(x′ | x_t) ∝ (1/(√(2π) σ_x)) exp(−(x′ − x_t)² / (2σ_x²)) I_D(x′)    (18)

where I_D(·) is the indicator function of D.
The annealing temperature at step t is

T_t = T_0 / (1 + lg t)    (22)
Table 1
An algorithmic description of our method
1. Preprocess the image; start searching from the bottom of the image.
2. Initialize vehicle candidate S_1 for MCMC; let t = 1.
3. Perform MCMC to search for the vehicle. For each step t, do the following:
3.1. choose a parameter to update (x, for example);
3.2. sample a new proposal S′ using q(x′ | x_t);
3.3. calculate the prior probability and likelihood of S_t and S′;
3.4. accept or reject the new proposal using (21);
3.5. go to 4 if the stop criterion is met; else let t = t + 1 and go to 3.1.
4. Mark the detected vehicle and step forward.
5. If the algorithm reaches the rear of the road, stop; else go to 2.
6. Output the final result.
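The search loop of Table 1 can be sketched for a single parameter as a Metropolis random walk with the cooling schedule of Eq. (22). This is a standard stand-in under our assumptions; the paper's exact acceptance rule (21) and its multi-parameter update are not reproduced here:

```python
import math
import random

def mcmc_search(posterior, x0, sigma=2.0, T0=1.0, steps=500, seed=0):
    """One-parameter sketch of the Table 1 loop: Gaussian random-walk
    proposals, Metropolis acceptance sharpened by the annealing
    temperature T_t = T0 / (1 + lg t), best sample kept as the MAP estimate."""
    rng = random.Random(seed)
    x, best = x0, x0
    for t in range(1, steps + 1):
        T = T0 / (1 + math.log10(t))       # Eq. (22) cooling schedule
        x_new = x + rng.gauss(0.0, sigma)  # proposal from q(x' | x_t)
        p_old, p_new = posterior(x), posterior(x_new)
        # Metropolis acceptance; the temperature sharpens the ratio over time.
        if p_old <= 0 or rng.random() < min(1.0, (p_new / p_old) ** (1.0 / T)):
            x = x_new
        if posterior(x) > posterior(best):
            best = x
    return best

# Toy posterior with a single peak at x = 3 for demonstration.
peak = lambda x: math.exp(-(x - 3.0) ** 2)
```

On the toy posterior the chain wanders from x0 toward the mode, and the best sample tracks the MAP estimate.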
Fig. 8. Experiment result from the shadow set. Note that shadows do not affect
detection.
slightly lower rate than that under the condition with more single
vehicles, but both of them can reach a satisfying level of detection.
6. Discussion and conclusion
In this paper, we presented a novel model-based computer vision method for vehicle detection in static images under front-view conditions. The method is flexible and can easily be extended to other views such as side or rear views by modifying the road model to general quadrilaterals and augmenting the vehicle models with the common appearance of vehicles at that view angle, while the algorithm remains largely the same as the one we implemented. In addition, our method uses comparatively little initial information; many preprocessing steps such as background extraction and shadow elimination are not required. The method is robust under different lighting conditions, except for extreme ones in which the image is too bright or too dark to show any detectable edges, and it can deal with inter-vehicle occlusions to a satisfying extent, as demonstrated by the experimental results on real-world data.
The method we designed can be further incorporated into other tasks such as vehicle classification, since it outputs vehicle types and other information, including reasoning about the occluded parts of vehicles. It can also be used to help other methods initialize vehicle information even when vehicles are occluded. Successful detection
Table 2
Detection results on different sets

Set      Vehicles  Detections  TP    FP   Sensitivity (%)  FDR (%)
ring4    163       159         154   5    94.48            3.14
shadow   39        38          37    1    94.87            2.63
misc     501       482         459   23   91.62            4.77
Total    703       679         650   29   92.46            4.27
[9] G. Sullivan, Model-based vision for traffic scenes using the ground-plane constraint, Real-time Computer Vision, 1994.
[10] D. Roller, K. Daniilidis, H. Nagel, Model-based object tracking in monocular image sequences of road traffic scenes, Int. J. Comput. Vision 10 (3) (1993) 257-281.
[11] J. Wu, X. Zhang, J. Zhou, Vehicle detection in static road images with PCA-and-wavelet-based classifier, in: Proceedings of the Intelligent Transportation Systems, 2001, pp. 740-744.
[12] J. Rojas, J. Crisman, Vehicle detection in color images, in: IEEE Conference on Intelligent Transportation System, ITSC '97, 1997, pp. 403-408.
[13] L. Tsai, J. Hsieh, K. Fan, Vehicle detection using normalized color and edge map, in: IEEE International Conference on Image Processing, ICIP 2005, vol. 2, 2005.
[14] T. Zhao, R. Nevatia, Car detection in low resolution aerial images, Image Vision Comput. 21 (8) (2003) 693-703.
[15] M. Koch, K. Malone, A sequential vehicle classifier for infrared video using multinomial pattern matching, in: Proceedings of the Conference on Computer Vision and Pattern Recognition Workshop, June 2006.
[16] O. Ozcanli, A. Tamrakar, B. Kimia, J. Mundy, Augmenting shape with appearance in vehicle category recognition, in: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.
[17] X. Song, R. Nevatia, A model-based vehicle segmentation method for tracking, in: International Conference on Computer Vision (ICCV'05), pp. 1124-1131.
[18] F. Oberti, S. Calcagno, M. Zara, C. Regazzoni, Robust tracking of humans and vehicles in cluttered scenes with occlusions, in: Proceedings of the International Conference on Image Processing, vol. 3, 2002.
[19] C. Pang, W. Lam, N. Yung, A novel method for resolving vehicle occlusion in a monocular traffic-image sequence, IEEE Transactions on Intelligent Transportation Systems 5 (3) (2004) 129-141.
[20] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679-698.
[21] D. Gerlough, M. Huber, Traffic flow theory, TRB Special Report No. 165, Transportation Research Board, Washington, DC.
[22] C. Andrieu, N. de Freitas, A. Doucet, M. Jordan, An introduction to MCMC for machine learning, Mach. Learn. 50 (1) (2003) 5-43.
[23] Z. Tu, S. Zhu, Image segmentation by data-driven Markov chain Monte Carlo, IEEE Trans. Pattern Anal. Mach. Intell. (2002) 657-673.
[24] C. Wu, C. Liu, H. Shum, Y. Xu, Z. Zhang, Automatic eyeglasses removal from face images, IEEE Trans. Pattern Anal. Mach. Intell. (2004) 322-336.
[25] M. Lee, I. Cohen, A model-based approach for estimating human 3D poses in static images, IEEE Trans. Pattern Anal. Mach. Intell. (2006) 905-916.
About the Author: YANGQING JIA received the B.S. degree in automation from Tsinghua University, Beijing, China, in 2006. He is currently an M.S. candidate at the State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University. His research interests include machine learning, pattern recognition, and related applications.
About the Author: CHANGSHUI ZHANG received the B.S. degree in mathematics from Beijing University, Beijing, China, in 1986 and the Ph.D. degree in automation from Tsinghua University, Beijing, in 1992. Since July 1992, he has been teaching at the Department of Automation, Tsinghua University, where he is currently a professor. His research interests include machine learning, pattern recognition, artificial intelligence, image processing, evolutionary computation, etc.