Geometry
Subhashis Banerjee
Dept. Computer Science and Engineering
IIT Delhi
email: suban@cse.iitd.ac.in
May 29, 2001
1 Camera Models
Each point is scaled by its individual depth, and all projection rays converge to the
optic center.
1.3 The Affine Camera
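The affine camera is a special case of the projective camera and is obtained by constraining the matrix T such that T31 = T32 = T33 = 0, thereby reducing the degrees of freedom from 11 to 8:
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix}
T_{11} & T_{12} & T_{13} & T_{14} \\
T_{21} & T_{22} & T_{23} & T_{24} \\
0 & 0 & 0 & T_{34}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}
\]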
In terms of image and scene coordinates, the mapping takes the form
x = MX + t
where M is a general 2 × 3 matrix with elements Mij = Tij /T34 while t is a general
2-vector representing the image center.
The affine camera preserves parallelism.
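The mapping x = MX + t and the parallelism property can be checked numerically. The following is a minimal sketch, not part of the original notes: the camera matrix T and the 3D segments are made-up values, the helper names are hypothetical, and the homogeneous scene coordinate X4 is assumed to be 1 so that t = (T14, T24)/T34.

```python
import numpy as np

def affine_from_T(T):
    """Split a 3x4 affine camera matrix (third row 0, 0, 0, T34) into (M, t).

    Assumes the homogeneous scene coordinate X4 equals 1.
    """
    T = np.asarray(T, dtype=float)
    M = T[:2, :3] / T[2, 3]   # M_ij = T_ij / T34
    t = T[:2, 3] / T[2, 3]    # image of the scene origin
    return M, t

def project(M, t, X):
    """Apply x = M X + t to each row of X (n x 3); returns an n x 2 array."""
    return np.asarray(X, dtype=float) @ M.T + t

# Hypothetical camera, chosen only for illustration.
T = np.array([[2.0, 0.0, 1.0, 4.0],
              [0.0, 3.0, 1.0, 6.0],
              [0.0, 0.0, 0.0, 2.0]])
M, t = affine_from_T(T)

# Two parallel 3D segments: same direction, different start points.
direction = np.array([1.0, 2.0, 3.0])
a0 = np.array([0.0, 0.0, 0.0])
b0 = np.array([5.0, -1.0, 2.0])
seg_a = project(M, t, [a0, a0 + direction])
seg_b = project(M, t, [b0, b0 + direction])

da = seg_a[1] - seg_a[0]
db = seg_b[1] - seg_b[0]
# The 2D cross product vanishes when the two image directions are parallel.
cross = da[0] * db[1] - da[1] * db[0]
```

Both image directions equal M applied to the common 3D direction, because the offset t cancels in the difference; that is why parallel scene lines remain parallel in the image.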
showing that a small focal length (f), small field of view (X/Z_ave and Y/Z_ave) and
small depth variation (∆Z) contribute to the validity of the model.
[Figure 1: 1D image formation with the image plane at Z = f and the average depth "plane" at Z = Z_ave. X_p, X_wp and X_orth are the perspective, weak-perspective and orthographic projections respectively.]
where p' = (x', y', 1)^T and p = (x, y, 1)^T are homogeneous 3-vectors representing corresponding image points in two views.
(See Shapiro, Zisserman and Brady).
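To derive the above, we write M as (B | b) where B is a general (non-singular)
2 × 2 matrix and b is a 2-vector. The projection equation then gives
\[
\mathbf{x}_i = B \begin{bmatrix} X_i \\ Y_i \end{bmatrix} + Z_i \mathbf{b} + \mathbf{t}
\]
Similarly, for M'A = (B' | b'), we have
\[
\mathbf{x}'_i = B' \begin{bmatrix} X_i \\ Y_i \end{bmatrix} + Z_i \mathbf{b}' + M' \mathbf{D} + \mathbf{t}'
\]
Eliminating scene coordinates (X_i, Y_i) gives
\[
\mathbf{x}'_i = \Gamma \mathbf{x}_i + Z_i \mathbf{d} + \boldsymbol{\epsilon}
\]
where Γ = B'B^{-1}, d = b' − B'B^{-1}b and ε = t' − Γt + M'D.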
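[Figure: two panels, (a) and (b), with scene points X1, X2, X3, image points x1, x2, x3 and u'1, u'2, u'3, the points xe and x'e, and the centers O and O'. The original drawing is not recoverable.]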
Γ and d are functions only of the camera parameters {M, M'} and the motion transformation A, while ε explains the motion of the reference point (centroid) and depends
on the translation of the object D and the camera origins t and t'.
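This equation shows that the point x'_i associated with x_i lies on a line (the epipolar line) in the
second image with offset Γx_i + ε and direction d. The unknown depth Z_i determines
how far along this line x'_i lies. Inverting the equation (Γ is non-singular because B and B' are) we obtain
\[
\mathbf{x}_i = \Gamma^{-1} \mathbf{x}'_i - Z_i \Gamma^{-1} \mathbf{d} - \Gamma^{-1} \boldsymbol{\epsilon}
\]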
We can eliminate Zi from the above equations and obtain a single equation in terms
of image measurables:
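\[
(\mathbf{x}'_i - \Gamma \mathbf{x}_i - \boldsymbol{\epsilon}) \cdot \mathbf{d}^{\perp} = 0
\]
where d = (d_x, d_y)^T and its perpendicular is d⊥ = (d_y, −d_x)^T. This equation can be
written as
\[
a x'_i + b y'_i + c x_i + d y_i + e = 0
\]
where (a, b)^T = d⊥, (c, d)^T = −Γ^T d⊥ and e = −ε^T d⊥ (the scalar coefficient d is distinct from the vector d). This gives us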
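\[
\begin{bmatrix} x'_i & y'_i & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
= 0
\]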
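The derivation can be checked numerically. The sketch below is illustrative only: the cameras, the motion (a rotation about the Y axis plus a translation) and the scene points are made-up values, and all names are hypothetical. It builds Γ, d and ε from M, M', A, D, t and t', takes (a, b)^T = d⊥ as required by expanding the dot product, and evaluates both the line form and the bilinear form of the constraint.

```python
import numpy as np

def rot_y(angle):
    """Rotation matrix about the Y axis (used here as the motion A)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical affine cameras and rigid motion, for illustration only.
M = np.array([[1.0, 0.2, 0.3], [-0.1, 0.9, 0.4]])
t = np.array([0.5, -0.2])
Mp = np.array([[0.8, -0.3, 0.5], [0.2, 1.1, -0.4]])
tp = np.array([-0.1, 0.3])
A = rot_y(0.3)
D = np.array([0.2, -0.1, 0.4])

# M = (B | b) and M'A = (B' | b').
B, b = M[:, :2], M[:, 2]
MA = Mp @ A
Bp, bp = MA[:, :2], MA[:, 2]

Gamma = Bp @ np.linalg.inv(B)
d = bp - Gamma @ b
eps = tp - Gamma @ t + Mp @ D

# Coefficients of a x' + b y' + c x + d y + e = 0.
d_perp = np.array([d[1], -d[0]])
a_coef, b_coef = d_perp
c_coef, d_coef = -Gamma.T @ d_perp
e_coef = -eps @ d_perp
F = np.array([[0.0, 0.0, a_coef],
              [0.0, 0.0, b_coef],
              [c_coef, d_coef, e_coef]])

# Project a few random scene points into both views.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
x = X @ M.T + t                      # first view
xp = (X @ A.T + D) @ Mp.T + tp       # second view, after the motion

# Line form: x' = Gamma x + Z d + eps.
line_form = x @ Gamma.T + X[:, 2:3] * d + eps
# Bilinear form: p'^T F p with homogeneous p = (x, y, 1), p' = (x', y', 1).
residuals = np.array([np.r_[q, 1.0] @ F @ np.r_[p, 1.0] for p, q in zip(x, xp)])
```

Under these assumptions `line_form` reproduces `xp`, and every entry of `residuals` vanishes up to rounding error.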
2.1.1 Computation of Affine epipolar Geometry
Given correspondences in two views the affine fundamental matrix can be computed
using orthogonal regression by minimizing
\[
\frac{1}{|\mathbf{n}|^{2}} \sum_{i=0}^{n-1} (\mathbf{r}_i \cdot \mathbf{n} + e)^{2}
\]
Here r_i = (x'_i, y'_i, x_i, y_i)^T and n = (a, b, c, d)^T. The minimization finds a hyper-plane
that globally minimizes the sum of the squared perpendicular distances between r_i
and the hyper-plane.
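Defining
\[
\mathbf{v}_i = \mathbf{r}_i - \bar{\mathbf{r}}
\]
where r̄ is the centroid of the r_i, and
\[
W = \sum_{i=0}^{n-1} \mathbf{v}_i \mathbf{v}_i^{T}
\]
the minimizing normal n is the unit eigenvector of W associated with its smallest eigenvalue:
\[
W \mathbf{n} = \lambda \mathbf{n}, \qquad |\mathbf{n}|^{2} = 1
\]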
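The orthogonal regression can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the notes' own implementation: the function name and the synthetic test data are hypothetical, and the offset is taken as e = −r̄ · n, which follows because the best-fitting hyperplane passes through the centroid.

```python
import numpy as np

def fit_affine_epipolar(x, xp):
    """Fit (a, b, c, d) and e by orthogonal regression.

    x, xp: (n, 2) arrays of corresponding points in views 1 and 2.
    Returns the unit normal n = (a, b, c, d) and the offset e.
    """
    r = np.hstack([xp, x])               # r_i = (x'_i, y'_i, x_i, y_i)
    r_bar = r.mean(axis=0)               # centroid
    v = r - r_bar
    W = v.T @ v                          # 4x4 scatter matrix
    eigvals, eigvecs = np.linalg.eigh(W) # eigenvalues in ascending order
    n = eigvecs[:, 0]                    # eigenvector of the smallest eigenvalue
    e = -r_bar @ n                       # hyperplane passes through the centroid
    return n, e

# Synthetic, noise-free data lying exactly on a known hyperplane.
rng = np.random.default_rng(1)
n_true = np.array([0.5, -0.5, 0.5, 0.5])   # unit length
e_true = 0.3
r = rng.normal(size=(10, 4))
r -= np.outer(r @ n_true + e_true, n_true)  # project onto the hyperplane
xp, x = r[:, :2], r[:, 2:]

n_fit, e_fit = fit_affine_epipolar(x, xp)
residuals = np.hstack([xp, x]) @ n_fit + e_fit
```

The recovered normal is defined only up to sign, so `n_fit` may equal either `n_true` or `-n_true` (with `e_fit` flipping accordingly). With noisy correspondences the residuals are no longer zero, and the smallest eigenvalue of W gives the sum of squared perpendicular distances.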
• Affine projections
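If the affine camera models for the two views are given by the parameters {M, t}
and {M', t'} respectively, then
\[
\mathbf{X}_i - \mathbf{X}_0 = \alpha_i \mathbf{E}_1 + \beta_i \mathbf{E}_2 + \gamma_i \mathbf{E}_3 \quad \text{for } i = 1, \ldots, n
\]
These correspondences establish the bases {e_1, e_2, e_3} and {e'_1, e'_2, e'_3} provided
no two axes, in either image, are collinear. Each additional point gives four
equations in 3 unknowns
\[
\begin{bmatrix} \Delta\mathbf{x}_i \\ \Delta\mathbf{x}'_i \end{bmatrix}
=
\begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ \mathbf{e}'_1 & \mathbf{e}'_2 & \mathbf{e}'_3 \end{bmatrix}
\begin{bmatrix} \alpha_i \\ \beta_i \\ \gamma_i \end{bmatrix}
\]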
and the affine structure can be computed. The redundancy in the system enables
us to verify whether the affine projection model is valid.
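The overdetermined 4 × 3 system can be solved by least squares, and the size of the residual indicates how well the affine model fits. The sketch below uses made-up cameras and basis vectors; the function name is hypothetical, and the motion between views is assumed to be absorbed into the second camera matrix `Mp` (translations cancel in the differences ∆x).

```python
import numpy as np

def affine_coords(basis1, basis2, dx, dxp):
    """Solve for (alpha, beta, gamma) from one point seen in two views.

    basis1, basis2: 2x3 matrices whose columns are e1, e2, e3 and e1', e2', e3'.
    dx, dxp: image displacements of the point from the reference point.
    Returns the affine coordinates and the norm of the least-squares residual.
    """
    G = np.vstack([basis1, basis2])            # 4x3 system matrix
    rhs = np.concatenate([dx, dxp])            # 4 equations
    coords, *_ = np.linalg.lstsq(G, rhs, rcond=None)
    residual = np.linalg.norm(G @ coords - rhs)
    return coords, residual

# Hypothetical linear parts of the two affine cameras.
M = np.array([[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]])
Mp = np.array([[0.9, 0.1, -0.4], [-0.2, 1.0, 0.3]])

# Columns are the scene basis vectors E1, E2, E3 (non-coplanar).
E = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.3],
              [0.2, 0.0, 1.0]])
basis1 = M @ E     # image bases in view 1
basis2 = Mp @ E    # image bases in view 2

# A point with known affine coordinates, used to check the recovery.
true_coords = np.array([0.5, -1.2, 2.0])
dX = E @ true_coords
dx, dxp = M @ dX, Mp @ dX

coords, residual = affine_coords(basis1, basis2, dx, dxp)
```

With exact affine data the residual is zero up to rounding. A residual well above the measurement noise level would suggest that the affine projection model, or the correspondence itself, is not valid.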
2.2.1 Tomasi and Kanade factorization
In case n point correspondences (n ≥ 4) over k views (k ≥ 2) are available, we use
the factorization procedure of Tomasi and Kanade to obtain the bases and structure.
Their formulation can be written as an extension of the above equation as
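\[
\begin{bmatrix}
\Delta\mathbf{x}_1 & \Delta\mathbf{x}_2 & \cdots & \Delta\mathbf{x}_{n-1} \\
\Delta\mathbf{x}'_1 & \Delta\mathbf{x}'_2 & \cdots & \Delta\mathbf{x}'_{n-1} \\
\Delta\mathbf{x}''_1 & \Delta\mathbf{x}''_2 & \cdots & \Delta\mathbf{x}''_{n-1} \\
\vdots & & & \vdots
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\
\mathbf{e}'_1 & \mathbf{e}'_2 & \mathbf{e}'_3 \\
\mathbf{e}''_1 & \mathbf{e}''_2 & \mathbf{e}''_3 \\
\vdots & & \vdots
\end{bmatrix}
\begin{bmatrix}
\alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} \\
\beta_1 & \beta_2 & \cdots & \beta_{n-1} \\
\gamma_1 & \gamma_2 & \cdots & \gamma_{n-1}
\end{bmatrix}
\]
\[
\Delta\mathbf{x}'' = G'' \Delta\mathbf{X}
\]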
Thus, if images of an object are obtained using affine cameras, then a novel view can
be expressed as a linear combination of views (this is useful for object recognition).
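A rank-3 factorization of the stacked measurement matrix can be sketched with a singular value decomposition. This is a minimal illustration and not the authors' code: the function name and the synthetic cameras and points are hypothetical, and the centroid of the tracked points is used as the reference point when forming the differences ∆x.

```python
import numpy as np

def factorize(tracks):
    """Rank-3 factorization of point tracks from affine cameras.

    tracks: (k, n, 2) array holding n image points in each of k views.
    Returns G (2k x 3, stacked image bases), S (3 x n, affine structure),
    the singular values, and the centered measurement matrix.
    """
    k, n, _ = tracks.shape
    centered = tracks - tracks.mean(axis=1, keepdims=True)   # subtract centroid per view
    W_meas = centered.transpose(0, 2, 1).reshape(2 * k, n)   # rows: x and y of each view
    U, s, Vt = np.linalg.svd(W_meas, full_matrices=False)
    G = U[:, :3] * s[:3]    # keep the three dominant components
    S = Vt[:3]
    return G, S, s, W_meas

# Synthetic data: k affine views of n random scene points.
rng = np.random.default_rng(0)
k, n = 3, 8
X = rng.normal(size=(n, 3))
cams = rng.normal(size=(k, 2, 3))      # linear parts M of each camera
offsets = rng.normal(size=(k, 1, 2))   # translations t of each camera
tracks = np.einsum('kij,nj->kni', cams, X) + offsets

G, S, s, W_meas = factorize(tracks)
```

For noise-free affine data the fourth and later singular values vanish, so `G @ S` reproduces the measurement matrix exactly. The factors are determined only up to an invertible 3 × 3 transformation Q, since (GQ)(Q^{-1}S) gives the same product; this is why the recovered structure is affine rather than metric.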
[Figure: diagram with labels Q̃, P̃, reference plane, p, q, p', q', p̃', q̃', V1 and V2. The original drawing is not recoverable.]
In the second image, the third axis vector is no longer degenerate, given by e'_k =
ME'_k = MAE_k. e'_k is actually an epipolar line. If we use e'_1 and e'_2 to predict the
position where each point would appear in image 2, as if they lay on plane {E_1, E_2},
we get
\[
\hat{\mathbf{x}}'_i = \mathbf{x}'_0 + \alpha_i \mathbf{e}'_1 + \beta_i \mathbf{e}'_2
\]
the disparity between the predicted position and the observed position
• Metric constructions
2.3.2 Procedure
3. Since the axis of rotation is known in both views, one can find the overall scale
difference due to translation in depth. Points on the axis of rotation do not
rotate. Consider the projection of all image points on to this axis. If they differ
in the two views, they must differ by only a constant scale factor. Otherwise,
the rigidity assumption is falsified.
4. Now the two views differ only by a rotation about an axis in the fronto-parallel
plane. Define a Euclidean frame (e1 , e2 , e3 ), such that e1,2,3 are unit vectors
with e1 along the axis of rotation and e3 along the line of sight.
Let G1 e1 + G2 e2 denote the depth gradient of a plane in the object. That is,
the depth of a point αe1 + βe2 in the image with respect to the fronto-parallel
plane is αG1 + βG2 . Note that
G1 = tan σ cos τ
G2 = tan σ sin τ
X3 = G1 X1 + G2 X2
Y3 = G1 Y1 + G2 Y2
Of the three transformed coordinates, the first one is trivially unchanged and
the third one is not observable. The second coordinate is observable, and the
equations are:
here the upper indices label the views and the lower indices label the components.
Because the turn ρ is unknown, we eliminate it from these equations to obtain a
single equation in (G1 , G2 ). This equation represents a one-parameter solution
for the two view case. The parameter is the unknown turn ρ. The equation is
quadratic in (G1 , G2 ) with the linear term absent; and represents a hyperbola
in the (G1 , G2 ) space (please derive it).
5. Repeating the steps above between the second and a third view, we obtain a
pair of two view solutions. Each two view solution represents a one-parameter
family of solutions. The one-parameter families for the 0-1 transition and the
1-2 transition are represented by the hyperbolic loci in the gradient space. The
pair of hyperbolas has either two or four intersections. The case of no intersection
occurs only in the non-rigid case. If the motion is rigid, then there has to be
one solution and hence a pair of them. The intersections represent either one or
two pairs of solutions that are related through a reflection in the fronto-parallel
plane.
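The elimination of ρ in step 4 can be illustrated numerically. The sketch below rests on assumptions that are not stated in the notes: translation and scale have already been removed (step 3), image coordinates are measured from a reference point on the plane, and a turn ρ about e1 maps the second coordinate to X2 cos ρ − X3 sin ρ (the sign convention is a choice). Writing p = cos ρ − G2 sin ρ and q = −G1 sin ρ, the observed second coordinate is p X2 + q X1, and cos²ρ + sin²ρ = 1 gives (p² − 1) G1² − 2pq G1 G2 + q² G2² + q² = 0. All function names are hypothetical.

```python
import numpy as np

def two_view_conic(pts0, second1):
    """Conic in gradient space from two points seen in two views.

    pts0: (2, 2) array, rows are (X1, X2) image coordinates in view 0.
    second1: second image coordinate of the same two points in view 1.
    Returns (A, B, C, F) with A*G1^2 + B*G1*G2 + C*G2^2 + F = 0.
    """
    # Observed model: second1 = p * X2 + q * X1, linear in (p, q).
    system = np.column_stack([pts0[:, 1], pts0[:, 0]])
    p, q = np.linalg.solve(system, second1)
    return np.array([p * p - 1.0, -2.0 * p * q, q * q, q * q])

def conic_value(coef, G1, G2):
    """Evaluate the conic at a candidate depth gradient (G1, G2)."""
    A, B, C, F = coef
    return A * G1**2 + B * G1 * G2 + C * G2**2 + F

# Made-up ground truth: depth gradient of the plane and the turn rho.
G1, G2, rho = 0.7, -0.4, 0.5
pts0 = np.array([[1.0, 0.5], [-0.3, 1.2]])
depth = G1 * pts0[:, 0] + G2 * pts0[:, 1]                 # X3 = G1 X1 + G2 X2
second1 = pts0[:, 1] * np.cos(rho) - depth * np.sin(rho)  # assumed rotation convention

coef = two_view_conic(pts0, second1)
value = conic_value(coef, G1, G2)              # true gradient lies on the conic
mirrored = conic_value(coef, -G1, -G2)         # so does its reflection
discriminant = coef[1]**2 - 4.0 * coef[0] * coef[2]
```

Under these assumptions the discriminant equals 4q², which is positive whenever q ≠ 0, so the locus is a hyperbola. Because the equation has no linear term, (−G1, −G2) lies on the conic whenever (G1, G2) does, which matches the reflection ambiguity described in step 5.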