a,* Fondazione Bruno Kessler-irst, Via Sommarive 18, I-38050 Povo, Trento, Italy
b Università degli Studi di Trento, Via Mesiano 77, Trento, Italy
* Corresponding author. Tel.: +39 0461 314 592; fax: +39 0461 314 591. E-mail address: messelod@itc.it (S. Messelodi).
Received 1 August 2005; received in revised form 2 February 2007
Available online 13 May 2007
Communicated by H.H.S. Ip
Abstract
We present a feature-based classifier that distinguishes bicycles from motorcycles in real-world traffic scenes. The algorithm extracts visual features focusing on the wheel regions of the vehicles. It splits the problem into two sub-cases depending on the computed motion direction. The classification is performed by non-linear Support Vector Machines. Tests lead to a successful vehicle classification rate of 96.7% on video sequences taken from different road junctions in an urban environment.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Traffic monitoring; Feature extraction; Support Vector Machine; Vehicle classification; Image analysis
1. Introduction
Image analysis techniques have been shown to be effective and cost competitive in various traffic control applications (Kastrinaki et al., 2003; Foresti et al., 2003; Hu et al., 2004). In spite of some drawbacks, mainly related to a dependence on scene illumination, vision-based systems offer several advantages over traditional traffic control techniques: low impact on the road infrastructure, low maintenance costs and the possibility for a remote operator to receive images. Furthermore, a vision-based system can be adapted to detect and classify particular kinds of vehicles on the basis of visual features. This is the case when discriminating between bicycles and motorcycles. This capability provides important information to traffic managers in order to evaluate the need to build bicycle lanes or to establish correlations between traffic and air or acoustic pollution.
When necessary, bicycle counting is usually performed
manually by transportation personnel, or automatically
by means of special purpose-built equipment. For temporary sessions of data acquisition, pneumatic rubber tube detectors placed across the road are often used. For continuous monitoring, permanent detectors are used, i.e. devices such as loop detectors and infrared or video detection systems. A comparison of different bicycle detection technologies is included in (SRF Consulting Group, 2003).
Few vision-based algorithms devoted to bicycle counting have been proposed in the literature (Dukesherer and Smith, 2001; Rogers and Papanikolopoulos, 2000). The algorithm proposed by Rogers and Papanikolopoulos (2000) detects objects moving through the scene by means of a background differencing technique. The estimation of the movement direction enables their system to localize the wheels by searching for ellipses using the generalized Hough transform in the edge map. They claim to be able to count the number of bicycles on a trail with an accuracy up to 70%, for a variety of weather conditions. Furthermore, the authors refer to a previous method (Rogers and Papanikolopoulos, 1999) where bicycles were detected in the image by a template-matching technique, concluding that the Hough-based method is a better alternative, mainly for computational reasons. Dukesherer and
frames, i.e. regions containing edges or corners. The correspondences among these regions between two successive images provide information about the object movement. The result of this module is a set of objects, each one represented by a data structure that stores information about the different views of the object in the scene and its displacements δ between consecutive frames.
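As a rough illustration of this data structure (the paper does not give an implementation, and all field names here are hypothetical), a Python sketch could be:

```python
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    """One moving object as produced by the localization/tracking module.

    Hypothetical layout: the text only states that each object stores its
    successive views and the displacements between consecutive frames.
    """
    object_id: int
    views: list = field(default_factory=list)          # per-frame view data (see the descriptor in Section 3)
    displacements: list = field(default_factory=list)  # delta = (dx, dy) pixel shift between consecutive frames
```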
2.2. Parameter extraction
The parameter extraction module analyzes the output of the previous module in order to provide a description in terms of class, speed and path for each detected and tracked object. For the purpose of this paper only the classification step needs a brief description. It works in two stages: a model-based classification step, followed by a feature-based one, when needed.
The model-based classifier makes use of a set of 3D models which provide a rough description of the shapes of different vehicle categories. Eight models are adopted: a single model (called CYCLE) represents motorcycles and bicycles, three models represent cars, two represent vans, and one represents lorries and buses; an eighth model is defined to represent pedestrians, but it is used only to detect false alarms.
The 3D model classifier considers, for each view of the moving object, the best match with each 3D model placed at different positions and along different directions on the ground plane. The match score is computed as the overlap between the support set Bj of the jth view of the object and the projection of the ith model onto the image plane. The object is then assigned to the 3D model having the highest average score computed over the set of its views. Focusing on the best 3D model, a list is associated to each object view containing the following data: the estimated position and orientation on the ground plane, the overlap score, and a number in the interval (0, 1] (inside factor) that specifies what fraction of the projected model is visible in the image.
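The exact overlap formula is not given in this excerpt; a minimal sketch of one plausible choice (intersection over union between the view's support mask and the projected model mask, which is our assumption) is:

```python
import numpy as np

def overlap_score(B_j: np.ndarray, model_proj: np.ndarray) -> float:
    """Overlap between the support set B_j of the j-th view and the
    projection of a 3D model onto the image plane.

    Both inputs are boolean masks of the same size. Intersection over
    union is an assumption; the paper only calls the score an 'overlap'.
    """
    inter = np.logical_and(B_j, model_proj).sum()
    union = np.logical_or(B_j, model_proj).sum()
    return float(inter) / union if union else 0.0
```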
If the best model corresponds to a single vehicle category, then the classification terminates straightforwardly. Otherwise, specialized classifiers are applied in order to determine the correct class among the vehicle categories associated to that 3D model. These classifiers use specific features extracted from the views of the vehicle.
3. Feature selection
At the end of the model-based classification phase, an object descriptor stores the following information for each view of the vehicle:
B: the support set of the unknown vehicle, i.e. a binary mask corresponding to the convex hull of the vehicle in the image;
I: the subimage of the input image corresponding to the bounding box of B;
D: the absolute difference map between I and the background image in the same location;
δ: the displacement vector in the image plane of the vehicle blob with respect to the previous frame;
(x0, y0, θ): the estimated position and direction of the vehicle on the road;
the score of the model-based classification step;
the inside factor, i.e. the fraction of the real-world positioned model that is visible in the image.
An example of this information is reported in Fig. 1.
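In code, such a descriptor could be sketched as follows (field names are ours, mirroring the list above):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ViewDescriptor:
    """Per-view information produced by the model-based stage
    (hypothetical field names)."""
    B: np.ndarray         # binary support mask (convex hull of the vehicle)
    I: np.ndarray         # input subimage over the bounding box of B
    D: np.ndarray         # |I - background| absolute difference map
    delta: tuple          # displacement vector in the image plane
    pose: tuple           # (x0, y0, theta): position and direction on the road
    score: float          # model-based classification score
    inside_factor: float  # visible fraction of the projected model, in (0, 1]
```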
The world coordinate system is chosen by having the (X, Y) plane in correspondence with the road plane, the origin at the vertical projection of the camera optical center onto (X, Y), and the Y-axis directed as the projection of the optical axis onto the road plane.
In order to deal with the great variability of the visual appearance of a motorcycle or bicycle, mainly due to the different perspectives under which it can be observed by the camera, we choose to distinguish two different contexts according to the moving direction of the vehicle, θ, with respect to the Y-axis. If θ is close to the Y-axis direction, i.e. the angle between them is less than a fixed value Tθ, a front or rear view of the moving vehicle appears in the image (Fig. 2a and b). Otherwise, the image depicts a side view of the vehicle (Fig. 2c and d). The selection of the threshold values will be discussed at the end of this section.
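The decision rule can be sketched as follows (angles in degrees; the modular-distance formulation is ours):

```python
def view_context(theta_deg: float, t_theta_deg: float) -> str:
    """Select the classification context from the vehicle direction theta,
    measured on the road plane with respect to the Y-axis."""
    # Distance of theta from the Y-axis direction (0 or 180 degrees), so
    # that both front views (theta near 180) and rear views (theta near
    # 0/360) fall in the same context.
    d = min(theta_deg % 180.0, 180.0 - (theta_deg % 180.0))
    return "front_rear" if d < t_theta_deg else "side"
```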
Side view. In this case, the underlying idea of the algorithm is that the luminance of the region inside the wheels of a bicycle is more similar to the background than the same region of a motorcycle. The purpose is to localize the wheel regions in the image and compute the average value of D over those regions. The feature extraction proceeds as follows (a code sketch is given below, after the footnote):
(1) Let ω be the direction of the displacement vector δ in the image plane.
(2) Compute the direction ω0 of the minimum bounding rectangle¹ (MBR) of the support set B, with direction constrained in a centered neighborhood (±5°) of ω.
(3) Estimate the location of the regions R1, R2 corresponding to the wheels:
- compute the projections p0 and p1 of the subimage of D in B, respectively along the direction ω0 and its normal;
- in order to reduce boundary noise (mainly due to shadow) in both directions, in particular in the wheel area, consider the portion p′0 of p0 between the 3rd and 100th percentile and the portion p′1 of p1 between the 1st and 99th percentile;
- the wheel regions are approximated by two rectangular regions R1 and R2 (Fig. 3), obtained from the intersection of the backprojection of the first
¹ MBR(ω,d)(B) is the rectangle of minimum area which contains all the points of the set B and has a side with slope in the range (ω − d, ω + d).
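A sketch of the projection and trimming steps above, under two assumptions of ours: the projections are accumulated by binning pixel coordinates along the two directions, and the percentile rule is applied to the cumulative mass of each profile:

```python
import numpy as np

def trimmed_projections(D: np.ndarray, B: np.ndarray, omega0_deg: float):
    """Projections p0, p1 of the difference map D restricted to the support
    set B, along the MBR direction omega0 and its normal, followed by the
    percentile trimming described in step (3)."""
    ys, xs = np.nonzero(B)
    w = D[ys, xs].astype(float)
    t = np.deg2rad(omega0_deg)
    u = xs * np.cos(t) + ys * np.sin(t)    # pixel coordinate along omega0
    v = -xs * np.sin(t) + ys * np.cos(t)   # pixel coordinate along the normal
    # p0: projection along omega0 (profile indexed by the normal coordinate);
    # p1: projection along the normal (profile indexed by the omega0 coordinate).
    p0 = np.bincount(np.round(v - v.min()).astype(int), weights=w)
    p1 = np.bincount(np.round(u - u.min()).astype(int), weights=w)

    def trim(p: np.ndarray, lo_pct: float, hi_pct: float) -> np.ndarray:
        # Keep the part of the profile between the given percentiles of its
        # cumulative mass, suppressing boundary noise such as shadows.
        c = np.cumsum(p) / max(p.sum(), 1e-9)
        lo = int(np.searchsorted(c, lo_pct / 100.0))
        hi = int(np.searchsorted(c, hi_pct / 100.0))
        return p[lo:hi + 1]

    return trim(p0, 3, 100), trim(p1, 1, 99)
```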
Fig. 1. Information associated to a view of a vehicle labeled as CYCLE by the model-based classifier: region I of the input image, background difference map D, support set B (here its boundary overlapped to I). Other information: δ = (12, 23), (x0, y0, θ) = (1504, 11741, 50°), model-based classification score = 0.85, inside factor = 1.0.
Fig. 2. (a) Front view of a bicycle with estimated motion direction θ = 200° in the real world. (b) Rear view of a motorcycle: θ = 20°. (c) Side view of a bicycle: θ = 90°. (d) Side view of a motorcycle: θ = 100°.
bicycles, and this fact can be detected by analyzing a profile obtained from the image portion that contains a wheel of the vehicle (actually the wheel which is closest to the camera). The feature extraction algorithm works as follows:
(1) Taking advantage of the information about the position and direction of the vehicle on the road plane, and the expected displacement of the wheels with respect to the vehicle middle point, the real-world location of the wheel closest to the camera is estimated. To focus the analysis on the bottom part of the wheel, a vertical segment of fixed height (40 cm in the experiments) is virtually placed at that location and its back projection onto the image plane is computed. Let Hw be its length, in pixels, on the image plane.
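A sketch of this back projection, assuming a calibrated 3 × 4 world-to-image projection matrix P (in the real system this comes from the monitoring system's calibration):

```python
import numpy as np

def wheel_segment_height_px(P: np.ndarray, x_w: float, y_w: float,
                            seg_height_m: float = 0.40) -> float:
    """Back-project a vertical segment of fixed height placed at the
    estimated wheel location (x_w, y_w, 0) on the road plane and return
    its length Hw in pixels on the image plane."""
    def project(X_world: np.ndarray) -> np.ndarray:
        u = P @ np.append(X_world, 1.0)   # homogeneous projection
        return u[:2] / u[2]
    bottom = project(np.array([x_w, y_w, 0.0]))
    top = project(np.array([x_w, y_w, seg_height_m]))
    return float(np.linalg.norm(top - bottom))
```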
Fig. 3. Side views. The projections p0 and p1 of the difference map, along the direction ω0 and its normal, used to determine the wheel regions R1 and R2 (boxes). Left (bicycle): ω0 = 85.3°; the extracted features lead to S1 = 24.3 and S2 = 37.4. Right (motorcycle): ω0 = 89.7°; the extracted features lead to S1 = 79.8 and S2 = 75.6.
The values of the thresholds involved in this first classification algorithm (both for side and front/rear views) have been set during a parameter estimation stage, on a training set of labeled vehicle views. Only the views having score and inside factor greater than prefixed values (0.3 and 0.95, respectively) are considered. Ranges of reasonable values have been assigned to Tθ, Ts, Tp, Twp, and the configuration that maximized the classification rate on the training set has been selected. The percentile values, used to remove noise at the tails of the projections, have been estimated by comparing, for a small set of images, the projections of the automatically extracted blobs with those of manually labeled blobs.
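This parameter estimation amounts to a grid search; a sketch, where `classify` and the per-view fields (`score`, `inside_factor`, `label`) are assumptions mirroring the description above:

```python
from itertools import product

def fit_thresholds(training_views, classify, grids):
    """Pick the threshold combination (e.g. T_theta, T_s, T_p, T_wp) that
    maximizes the classification rate on labeled training views, keeping
    only views with score > 0.3 and inside factor > 0.95 as stated above.

    grids maps each threshold name to its range of candidate values;
    classify(view, params) returns 'bicycle' or 'motorcycle'.
    """
    views = [v for v in training_views
             if v.score > 0.3 and v.inside_factor > 0.95]
    best_params, best_rate = None, -1.0
    for combo in product(*grids.values()):
        params = dict(zip(grids.keys(), combo))
        rate = sum(classify(v, params) == v.label for v in views) / len(views)
        if rate > best_rate:
            best_params, best_rate = params, rate
    return best_params, best_rate
```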
The classification performance of this algorithm, presented in Section 5, suggests that the considered visual cues have a sufficiently good discrimination power. Therefore they are adopted in the selection of the features of the SVM classifiers described in the following section.
4. The SVM bicycle/motorcycle classifier
Support Vector Machines are based on the learning
theory developed by Vapnik (1995). They are a method
(P0 and P1). Their computation is different in the two contexts, and consists in the following steps (refer to Figs. 6 and 7; a code sketch follows their captions):
(1) let ω be the direction of the displacement vector δ in the image plane;
(2) let ω0 be the slope of MBR(ω,5°)(B); let V0, V1, V2, V3 be the vertexes of the MBR, counterclockwise, where the side V0V1 is the lower side in image coordinates, having slope ω0 for side views, and slope orthogonal to ω0 for front/rear views;
(3) extract two rectangular zones Zi, i = 1, 2 from the MBR, whose vertexes V0, V1, V′2, V′3 are defined parametrically with respect to factors fi, as V′2 = V1 + fi (V2 − V1) and V′3 = V0 + fi (V3 − V0);
(4) compute the projection p0 by projecting the region of D enclosed in Z1 along the direction ω0, and p1 by projecting the region of D enclosed in Z2 along the direction normal to ω0;
(5) P0 and P1 are then obtained by the quantization of p0 and p1 into fixed dimensions D0 and D1.
Fig. 6. Computation of the projections p0 and p1 for a side view of a motorcycle. The MBR and the zones Z1 and Z2 are highlighted.
Fig. 7. Computation of the projections p0 and p1 for a rear view of a bicycle. The MBR and the zones Z1 and Z2 are highlighted.
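Step (5) can be sketched as a simple resampling; the bin counts D0 and D1 below are placeholders (their values are not reported in this excerpt), and the final normalization is an assumption of ours:

```python
import numpy as np

def quantize_profile(p: np.ndarray, n_bins: int) -> np.ndarray:
    """Resample a projection profile to a fixed number of bins, so that
    profiles of vehicles with different image sizes become comparable."""
    idx = np.linspace(0.0, len(p) - 1.0, n_bins)
    return np.interp(idx, np.arange(len(p)), p)

def svm_input(p0: np.ndarray, p1: np.ndarray, d0: int = 16, d1: int = 16) -> np.ndarray:
    """Build the fixed-length feature vector (P0, P1) fed to the SVM."""
    P0 = quantize_profile(p0, d0)
    P1 = quantize_profile(p1, d1)
    # Normalize each profile to unit mass (assumed; not specified in the text).
    P0 /= max(P0.sum(), 1e-9)
    P1 /= max(P1.sum(), 1e-9)
    return np.concatenate([P0, P1])
```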
Table 1
Classification result at vehicle level of the early classifier applied to three sequences coming from two different junctions

Input class    No.    Classified bicycle    Classified motorcycle    Error (%)
Bicycle         45            27                     18                40.0
Motorcycle     144             5                    139                 3.5
Total          189                                                     12.2
Table 2
Number of side views, front views, total views and vehicles for each class

              Side views    Front views    Total views    Total vehicles
Bicycle          324            150            474              78
Motorcycle       675            107            782             197
Total            999            257           1256             275
Table 3
Error rates of the SVM classifiers for side views and for front views, at view level and at vehicle level, with a breakdown over the two classes

               Global error (%)    Bicycle error (%)    Motorcycle error (%)
Side views           6.2                 10.2                   4.3
Front views          6.2                  6.0                   6.5
All views            6.3                  9.5                   4.3
All vehicles         3.3                  3.8                   3.0
The major source of classification errors is the inaccurate detection of the vehicle boundary in the previous localization and tracking steps. This is mainly due to the partial or total inclusion of the vehicle shadow in the blob, or to the inclusion of moving background regions, typically generated by moving leaves or their shadows.
6. Conclusions
In this paper, we have described an algorithm for the discrimination between bicycles and motorcycles. It is part of a video-based traffic monitoring system that aims to detect, track and classify vehicles at urban road junctions. The algorithm is applied after a model-based classification that is unable to discriminate between the two vehicle classes, but that ensures (with a certain confidence) that the vehicle belongs to one of the two considered classes. The visual features used by the classifier are computed starting from the vehicle image, the background image and an estimated position and orientation of the vehicle in the real world. These data are provided by other modules of the monitoring system. The algorithm focuses on the image regions that correspond to the wheels of the vehicle, and acts differently depending on the vehicle orientation with respect to the camera view (side or front/rear).
The application of a rough classifier has shown that the selected zones and features are discriminative. Support Vector Machines have then been trained using analogous features, based on the skewed projection profiles of the lower part of the vehicle, leading to a global error rate of 6.3% at view level and 3.3% at vehicle level. These figures cannot be directly compared to other works, as this is a relatively unexplored task. In fact, as far as we know, no other monitoring system for urban junctions currently exists that is able to classify vehicles including the bicycle class. Considering that in traffic surveillance applications aiming at collecting data for statistical purposes a classification error rate around 5% is typically accepted, our results can be considered satisfactory.