Time-to-Contact Control for Safety and Reliability of Self-driving Cars

Liang Wang                              Berthold K.P. Horn
CSAIL, MIT                              Dept. of EECS, MIT
Cambridge, MA 02139, USA                Cambridge, MA 02139, USA
wangliang@csail.mit.edu                 bkph@csail.mit.edu
For stability, we must have |H(jω)| ≤ 1 for sinusoidal excitation of any frequency ω (where s = jω). Note that

$$|H(j\omega)|^2 = \frac{k_d^2 + k_v^2\,\omega^2}{(k_d-\omega^2)^2 + (k_v + k_d T)^2\,\omega^2} \qquad (4)$$

The stability condition |H(jω)| ≤ 1 then corresponds to kd² + kv²ω² ≤ (kd − ω²)² + (kv + kd T)²ω² for all ω² < ∞. That is,

$$k_d T^2 + 2 k_v T \ge 2 \qquad (5)$$
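The correspondence between (4) and (5) is easy to check numerically. The sketch below assumes a closed-loop transfer function H(s) = (kv s + kd)/(s² + (kv + kd T)s + kd), a form not shown in this excerpt but whose magnitude reproduces Eq. (4); the check confirms that |H(jω)| stays at or below 1 exactly when condition (5) holds.

```python
# Numerical check of stability condition (5). H(s) is assumed (not shown in
# this excerpt) to be H(s) = (kv*s + kd)/(s**2 + (kv + kd*T)*s + kd),
# whose squared magnitude is Eq. (4).
import numpy as np

def H_mag(w, kd, kv, T):
    """|H(jw)| from Eq. (4)."""
    num = kd**2 + (kv * w)**2
    den = (kd - w**2)**2 + ((kv + kd * T) * w)**2
    return np.sqrt(num / den)

w = np.linspace(1e-3, 50.0, 100000)
for kd, kv, T in [(2.0, 0.0, 1.0),    # no velocity term, T = 1 s: (5) holds with equality
                  (0.5, 2.0, 0.5),    # small kd, large kv, short T: (5) holds
                  (0.5, 0.5, 1.0)]:   # condition (5) violated
    holds = kd * T**2 + 2 * kv * T >= 2
    print(f"kd={kd}, kv={kv}, T={T}: (5) {'holds' if holds else 'fails'}, "
          f"max |H(jw)| = {H_mag(w, kd, kv, T).max():.4f}")
```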
For human drivers, the “reaction time” T is usually taken to be about 1 second [1]. Without relative velocity control (i.e. kv = 0), kd would have to be too large to implement (and would lead to accelerations and decelerations quite unacceptable to passengers). The only way of setting reasonable gains, i.e. considerably less than one, is to use a non-zero kv. Further, for self-driving cars, T could be much smaller, e.g. 0.5 sec. In that case, the coefficient of kv in (5) (i.e. 2T) is much larger than the coefficient of kd (i.e. T²), so that stable operation can be achieved more easily using a large gain on the relative velocity term. In summary, the relative-velocity control term is important for self-driving cars.
The distance dn = xn−1 − xn − l can be measured directly using, e.g., Radar or Lidar. However, if the relative velocity rn = vn−1 − vn is estimated by taking the derivative of dn directly, i.e. rn = d(dn)/dt, then the noise in dn will be amplified. If, for example, we model small components of perturbations in the measurements as waves of the form ε sin(ωt), then the derivative will be corrupted by ωε cos(ωt), and so higher-frequency components of measurement noise will be greatly amplified. (We could attempt to limit this effect by approximate low-pass filtering of the result, but that would introduce latency or time delays.) We conclude that it is non-trivial to obtain a reliable measurement of relative velocity for use in control of self-driving cars.
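To make the amplification concrete, here is a short numerical illustration (with made-up numbers, not values from the paper): a 1 cm perturbation at ω = 60 rad/s on the measured distance becomes a roughly 0.6 m/s error after differentiation.

```python
# Illustration (hypothetical numbers): differentiating a noisy distance
# measurement amplifies a perturbation eps*sin(omega*t) by the factor omega.
import numpy as np

dt = 0.01                          # 100 Hz sampling
t = np.arange(0.0, 10.0, dt)
d_true = 10.0 + 0.5 * t            # true gap, opening at 0.5 m/s
eps, omega = 0.01, 60.0            # 1 cm perturbation at 60 rad/s
d_meas = d_true + eps * np.sin(omega * t)

r_est = np.gradient(d_meas, dt)    # finite-difference estimate of relative velocity
print("noise amplitude in d    :", eps)            # 0.01 m
print("max velocity error in r : %.3f m/s"         # close to eps*omega = 0.6
      % np.abs(r_est - 0.5).max())
```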
III. TIME TO CONTACT

As an object approaches you, its image on your retina will expand; conversely, as it moves further away, the image will become smaller and smaller. This observation gives us some intuition into how one might estimate the motion of an object from its time-varying image. As is well known, at least two calibrated cameras are needed to obtain the depth of objects in the scene. Thus, the absolute motion of the object, i.e. its speed, cannot be estimated using just a single camera. However, the ratio of the object's depth to its speed, which is known as the time to contact (TTC), can be estimated using a single camera [5]. Note that the distance dn can be obtained reliably using some other sensor like Radar or Lidar. Thus, rn can be estimated well as dn C, where C = 1/TTC.
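This fusion of a ranging sensor with camera-based TTC avoids differentiating dn altogether; a trivial sketch (function name and readings are illustrative, not from the paper):

```python
# Relative velocity without differentiation: distance from Radar/Lidar,
# C = 1/TTC from the camera (estimated as in Sections III-A to III-C).
def relative_velocity(d_n: float, C: float) -> float:
    """r_n ~= d_n * C, as stated in the text above."""
    return d_n * C

# Hypothetical readings: a 12 m gap and an 8 s time to contact -> 1.5 m/s
print(relative_velocity(12.0, 1.0 / 8.0))
```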
A. Passive navigation

We chose to use the camera coordinate system shown in Fig. 2. The origin is at the pin-hole of the camera, and the Z axis is along the optical axis of the camera. The X and Y directions in 3-D are aligned with the x and y axes of the image sensor [4]. Suppose there is a planar surface perpendicular to the Z axis, moving at the speed (U, V, W). The position of the image (x, y) of the point (X, Y, Z) is determined by the perspective projection equation [4], i.e.

x = fX/Z and y = fY/Z   (6)

Fig. 2. We use the camera coordinate system, in which the Z axis is along the optical axis of the camera. A planar surface is perpendicular to the Z axis, and moving at the speed (U, V, W). The TTC is Z/(−W).

The motion (U, V, W) and the depth Z will generate movements of the image pattern, which are called the optical flow (u, v) [6]. The relationship between motion in the world and motion of corresponding image points is given by [7]:

u = (Uf − xW)/Z and v = (Vf − yW)/Z   (7)

The problem of estimating the motion (U, V, W) and depth Z(x, y) from (u, v) is known as passive navigation in the machine vision field [7], [8], [9]. Even for the very simple case shown in Fig. 2, there is no unique solution [4], [7]. Note that the time to contact (TTC) is Z/(−W). Also, the focus of expansion (FOE) [10], denoted by (x0, y0), is given by x0 = fU/W and y0 = fV/W. Eq. (7) can then be written as:

u = (x − x0)C and v = (y − y0)C   (8)

where C = 1/TTC. The three parameters (x0, y0) and C can be estimated from the given optical flow (u, v).
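As a quick consistency check on Eq. (8) (a sketch with arbitrary made-up values), the flow of Eq. (7) coincides with the FOE-plus-C parameterization, since C = 1/TTC = −W/Z:

```python
# Check that Eq. (7) equals Eq. (8) for a frontal plane at depth Z.
# All numbers are arbitrary illustrative values, with C = 1/TTC = -W/Z.
import numpy as np

f, Z = 500.0, 10.0                         # focal length (pixels), depth (m)
U, V, W = 0.2, -0.1, -1.0                  # plane velocity; W < 0: approaching
x = np.random.uniform(-200, 200, 1000)     # image coordinates (pixels)
y = np.random.uniform(-150, 150, 1000)

u7 = (U * f - x * W) / Z                   # Eq. (7): flow from motion and depth
v7 = (V * f - y * W) / Z

x0, y0, C = f * U / W, f * V / W, -W / Z   # FOE and C = 1/TTC
u8, v8 = (x - x0) * C, (y - y0) * C        # Eq. (8)

print(np.allclose(u7, u8), np.allclose(v7, v8))   # True True
```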
B. Optical flow

The problem of estimating (u, v) from the changes in the image pattern, which are described by the spatial variation (Ex, Ey) and the temporal variation Et of an image sequence E(x, y, t), is known as the optical flow problem [6]. Here, we face a very special case in which (u, v) is determined by only 3 parameters. Thus, it is more efficient to solve this problem using the optical flow constraint directly, i.e.

uEx + vEy + Et = 0   (9)

Substituting (8), we find

ExA + EyB + GC + Et = 0   (10)

where G = xEx + yEy, A = −x0C and B = −y0C. Note that here Ex, Ey, Et and G are calculated from the image sequence, while the parameters A, B, C are the unknowns to be determined.
C. Least-squares solution

We can solve for A, B and C in (10) by a least-squares method. That is, we minimize the following objective function:

$$\min_{A,B,C} \iint \left(E_x A + E_y B + G\,C + E_t\right)^2 dx\, dy \qquad (11)$$
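In discrete form, (11) becomes an ordinary linear least-squares problem in three unknowns, solved by 3×3 normal equations. A minimal sketch (synthetic arrays for illustration; this is not the paper's phone implementation):

```python
# Discrete least-squares solution of Eq. (11): stack the per-pixel constraint
# Ex*A + Ey*B + G*C + Et = 0 (Eq. 10) and solve the 3x3 normal equations.
import numpy as np

def solve_ABC(Ex, Ey, Et, x, y):
    """Return (A, B, C) minimizing the sum over pixels of (Ex*A + Ey*B + G*C + Et)^2."""
    G = x * Ex + y * Ey
    J = np.stack([Ex.ravel(), Ey.ravel(), G.ravel()], axis=1)   # N x 3
    A, B, C = np.linalg.solve(J.T @ J, -J.T @ Et.ravel())       # normal equations
    return A, B, C                                              # C = 1/TTC; FOE = (-A/C, -B/C)

# Self-test on synthetic gradients with known parameters (illustrative only):
h, w = 48, 64
yv, xv = np.mgrid[0:h, 0:w].astype(float)
xv, yv = xv - w / 2, yv - h / 2              # image coordinates about the center
Ex, Ey = np.random.randn(h, w), np.random.randn(h, w)
A0, B0, C0 = 0.3, -0.2, 0.05
Et = -(Ex * A0 + Ey * B0 + (xv * Ex + yv * Ey) * C0)
print(solve_ABC(Ex, Ey, Et, xv, yv))         # ~ (0.3, -0.2, 0.05)
```

Because the constraint (10) is linear in (A, B, C), no iterative optimization is needed; the 3×3 system can be solved at frame rate.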
The first four images in Fig. 3(a) are the downsampled E, Et, Ex, Ey, while the last two show the pixels' motion (in the x and y directions, respectively) due to the phone's angular velocity, which is measured by the phone's gyroscope. The three “bars” in Fig. 3 show the computed parameters A, B, C, which indicate the motion (of the camera) in the X, Y and Z directions. Red means positive, while green means negative.
B. Two-registers recursive filtering

Note that the speeds of both the current car and the leading car are continuous (velocity cannot change instantaneously; in fact, its rate of change is limited by the maximum accelerations and decelerations possible for the vehicle). Thus, successive estimates of C are strongly correlated, and the raw per-frame estimates Ck can be smoothed recursively; unrolling such a recursion with smoothing factor α gives

$$\bar{C}_k = (1-\alpha)^k\,\bar{C}_0 + \alpha \sum_{l=0}^{k-1} (1-\alpha)^l\, C_{k-l} \qquad (14)$$

This implements an infinite impulse response (IIR) smoothing filter. Here, the weights decay exponentially with time. This approach emphasizes new points over old ones, and does not require keeping a complete history of old values.
V. EXPERIMENTS

We use a controllable robot car to test the TTC algorithm. Fig. 4 shows the experimental environment. The robot car is controlled to move at about 40 cm/sec. The screen of the smart phone is recorded via the Android Studio IDE. Fig. 5 shows some experimental results. Fig. 5(a) is one frame from the video of the recorded result.

Fig. 4. The experimental environment. A smart phone is mounted on a controllable robot. The result on the screen is recorded on a laptop computer.

Fig. 5(a). One frame in the recorded result.