HOG is a shape-based recognition method. The underlying idea is that an object
in an image can be distinguished from the background by its contour, and that
this contour can be described by the orientation and the intensity (magnitude)
of its edge normal (gradient) vectors.
To calculate a HOG descriptor, we need to first calculate the horizontal and vertical
gradients; after all, we want to calculate the histogram of gradients. This is easily achieved
by filtering the image with the 1-D kernels [-1, 0, 1] (horizontal) and its transpose (vertical).
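As an illustration (not from the original slides), the gradient computation with these 1-D kernels might look like this in numpy; the function name and the unsigned 0-180 degree orientation convention are my choices:

```python
import numpy as np

def image_gradients(img):
    """Gradient magnitude and unsigned orientation (degrees) via [-1, 0, 1]."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal filter [-1, 0, 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical filter [-1, 0, 1]^T
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned gradients, 0-180
    return mag, ang
```

On a horizontal intensity ramp, every interior pixel gets magnitude 2 and orientation 0, as expected from the kernel.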
Let’s first focus on the pixel encircled in blue. It has an angle (direction) of 80 degrees and a magnitude of
2, so it adds 2 to the 5th bin (centered at 80 degrees). The gradient at the pixel encircled in red has an angle of 10 degrees and
a magnitude of 4. Since 10 degrees is halfway between the bin centers at 0 and 20, the vote by that pixel splits evenly between the
two bins, adding 2 to each.
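The bin-splitting vote can be sketched as follows (9 bins of 20 degrees, centers at 0, 20, ..., 160; the helper name is mine):

```python
import numpy as np

def add_vote(hist, angle, magnitude, bin_width=20.0):
    """Split a gradient vote linearly between the two nearest bin centers."""
    a = angle % 180.0                        # unsigned gradients
    i = int(a // bin_width)                  # lower bin index
    frac = a / bin_width - i                 # fraction past the lower center
    hist[i % len(hist)] += magnitude * (1.0 - frac)
    hist[(i + 1) % len(hist)] += magnitude * frac
    return hist
```

`add_vote(hist, 80, 2)` puts all 2 into the 5th bin (index 4), while `add_vote(hist, 10, 4)` adds 2 to each of the first two bins.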
To account for changes in illumination and contrast, we can normalize the gradient
values locally. This requires grouping the “cells” together into larger, connected “blocks”.
It is common for these blocks to overlap, meaning that each cell contributes to the final
feature vector more than once.
Again, the blocks are rectangular; however, our units are no longer pixels but
cells. Dalal and Triggs report that using either 2 x 2 or 3 x 3 cells_per_block
obtains reasonable accuracy in most cases.
Now consider another vector whose elements are twice the
value of the first vector: 2 x [ 128, 64, 32 ] = [ 256, 128, 64 ]. You can
work it out yourself to see that normalizing [ 256, 128, 64 ] also
results in approximately [ 0.87, 0.43, 0.22 ], so L2 normalization removes the effect of a global contrast change.
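A quick numpy check of this invariance (a sketch, not part of the original text):

```python
import numpy as np

# L2-normalizing a vector and a scaled copy of it gives the same result,
# which is why normalization cancels global illumination/contrast changes.
v = np.array([128.0, 64.0, 32.0])
w = 2.0 * v                        # [256, 128, 64]
nv = v / np.linalg.norm(v)
nw = w / np.linalg.norm(w)
# nv and nw are identical: approximately [0.87, 0.44, 0.22]
```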
11/6/2019 13
Idea of SIFT
Image content is transformed into local feature
coordinates that are invariant to translation, rotation,
scale, and other imaging parameters
SIFT Features
Claimed Advantages of SIFT
Scale Invariant Feature Transform
Basic idea:
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient - 90) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
(Figure: angle histogram over 0 to 2π)
2. LoG Approximation  The Laplacian of Gaussian is great for finding interesting points (or key
points) in an image, but it's computationally expensive. So we cheat and approximate it using the
representation created earlier.
3. Finding keypoints  With the super-fast approximation, we now try to find keypoints. These are
the maxima and minima in the Difference of Gaussian images calculated in step 2.
4. Get rid of bad keypoints  Edges and low-contrast regions make bad keypoints. Eliminating
these makes the algorithm efficient and robust. A technique similar to the Harris Corner
Detector is used here.
5. Assigning an orientation to the keypoints  An orientation is calculated for each keypoint.
Any further calculations are done relative to this orientation. This effectively cancels out the effect
of orientation, making it rotation invariant.
6. Generate SIFT features  Finally, with scale and rotation invariance in place, one more
representation is generated. This helps uniquely identify features. Let's say you have 50,000
features. With this representation, you can easily identify the feature you're looking for (say, a
particular eye, or a sign board). That was an overview of the entire algorithm. Over the next few
days, I'll go through each step in detail. Finally, I'll show you how to implement SIFT in OpenCV!
Lowe’s Scale-space Interest Points
• Laplacian of Gaussian kernel, scale-normalised (multiplied by σ²): σ²∇²G
• Proposed by Lindeberg
• Scale-space detection: find local maxima across scale/space
• A good “blob” detector
Hence the difference of Gaussians approximates it: G(kσ) − G(σ) ≈ (k − 1)σ²∇²G
Lowe’s Pyramid Scheme
• Scale space is separated into octaves:
• Octave 1 uses scale σ
• Octave 2 uses scale 2σ
• etc.
Lowe’s Pyramid Scheme
Each octave holds s+3 blurred images, including the original image, with scales
σᵢ = 2^(i/s) σ₀ for i = 0, 1, …, s+2
(so σ₁ = 2^(1/s) σ₀, σ₂ = 2^(2/s) σ₀, …, σ₊₁ = 2^((s+1)/s) σ₀).
Subtracting adjacent blurred images gives s+2 difference images.
The parameter s determines the number of images per octave.
Difference-of-Gaussians
D = (G(kσ) − G(σ)) * I = G(kσ) * I − G(σ) * I
(successive levels use G(k²σ) * I, and so on)
Key point localization
• There are s+2 difference images; the top and bottom ones are ignored, so s planes are searched for extrema.
(Figure: the pyramid is built by repeatedly blurring the image and subtracting adjacent levels.)
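One way to implement the extrema search (a sketch; the function name and the (scale, y, x) array layout are my own) is to compare each DoG sample against its 26 neighbors in the 3x3x3 scale-space cube:

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly larger or strictly smaller than
    all 26 neighbors in its 3x3x3 scale-space neighborhood."""
    cube = dog[s-1:s+2, y-1:y+2, x-1:x+2].flatten()
    center = dog[s, y, x]
    others = np.delete(cube, 13)          # drop the center sample itself
    return bool((center > others).all() or (center < others).all())
```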
Scale-space extrema detection: experimental results over 32 images
that were synthetically transformed, with noise added.
(Figures: stability, as % detected and % correctly matched, and expense, as the average number of keypoints detected, plotted against the number of scales sampled per octave.)
Sampling in scale for efficiency
How many scales should be used per octave (S)? The more scales evaluated, the more keypoints are found, but:
• S < 3: the number of stable keypoints still increases with S
• S > 3: the number of stable keypoints decreases
• S = 3: the maximum number of stable keypoints is found
Keypoint Localization & Filtering
Eliminating the Edge Response
• Reject flats: |D(x̂)| < 0.03
• Reject edges: let α be the Hessian eigenvalue with the larger magnitude and β the smaller. Points with a large ratio r = α/β lie on edges and are rejected using the test Tr(H)²/Det(H) = (r + 1)²/r against a threshold (Lowe uses r = 10).
2. Accurate keypoint localization
• Reject points with low contrast (flat) and points poorly localized along an edge
• Fit a 3D quadratic function for sub-pixel maxima

1D illustration with samples f(−1) = 1, f(0) = 6, f(+1) = 5:
f(x) ≈ f(0) + f′(0)x + (1/2)f″(0)x² = 6 + 2x − 3x²
f′(x) = 2 − 6x = 0  ⇒  x̂ = 1/3
f(x̂) = 6 + 2(1/3) − 3(1/3)² = 19/3 ≈ 6.33
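The sub-pixel refinement by fitting a quadratic through three samples can be sketched in code (function name is mine); with samples 1, 6, 5 at x = −1, 0, +1 it recovers x̂ = 1/3 and f(x̂) = 19/3:

```python
def refine_1d(f_m1, f_0, f_p1):
    """Sub-pixel extremum from three samples at x = -1, 0, +1."""
    d1 = (f_p1 - f_m1) / 2.0          # f'(0), central difference
    d2 = f_p1 - 2.0 * f_0 + f_m1      # f''(0), second difference
    x_hat = -d1 / d2                  # stationary point of the quadratic
    f_hat = f_0 + 0.5 * d1 * x_hat    # value at the refined extremum
    return x_hat, f_hat
```

Note that `f_hat` uses exactly the simplified form D(x̂) = D + (1/2)(∂D/∂x)ᵀx̂ that appears later in the derivation.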
2. Accurate keypoint localization
• Taylor series of several variables; with two variables:
f(x, y) ≈ f(0,0) + (∂f/∂x)x + (∂f/∂y)y + (1/2)[(∂²f/∂x²)x² + 2(∂²f/∂x∂y)xy + (∂²f/∂y²)y²]
• In vector form, with x = [x y]ᵀ:
f(x) ≈ f(0) + (∂f/∂x)ᵀx + (1/2)xᵀ(∂²f/∂x²)x
Accurate keypoint localization
• Taylor expansion in matrix form: x is a vector, f maps x to a scalar
• Gradient: ∂f/∂x = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]ᵀ
• Hessian matrix (often symmetric): ∂²f/∂x², the n×n matrix whose (i, j) entry is ∂²f/∂xᵢ∂xⱼ, from ∂²f/∂x₁² in the top-left corner to ∂²f/∂xₙ² in the bottom-right.
2D example
Sample values around the center pixel:
-17  -1  -1
 -9   7   7
 -9   7   7
Derivation of matrix form
h(x) = gᵀx = Σᵢ₌₁ⁿ gᵢxᵢ
∂h/∂xᵢ = gᵢ for each i, so ∂h/∂x = [g₁, …, gₙ]ᵀ = g
Derivation of matrix form
h(x) = xᵀAx = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼxᵢxⱼ
∂h/∂xₖ = Σᵢ aᵢₖxᵢ + Σⱼ aₖⱼxⱼ (the terms with k in the second and first index, respectively), so
∂h/∂x = Aᵀx + Ax = (Aᵀ + A)x
Derivation of matrix form
Applying both rules to the quadratic Taylor expansion (and using the symmetry of the Hessian):
∂/∂x [ f + (∂f/∂x)ᵀx + (1/2)xᵀ(∂²f/∂x²)x ] = ∂f/∂x + (∂²f/∂x²)x
Accurate keypoint localization
• x = (x, y, σ)ᵀ is a 3-vector; setting the derivative to zero gives the offset x̂ = −(∂²D/∂x²)⁻¹(∂D/∂x)
• Change sample point if the offset is larger than 0.5
• Throw out low contrast (|D(x̂)| < 0.03)
Accurate keypoint localization
• Throw out low contrast: |D(x̂)| < 0.03
Expanding D at the refined location,
D(x̂) = D + (∂D/∂x)ᵀx̂ + (1/2)x̂ᵀ(∂²D/∂x²)x̂
and substituting x̂ = −(∂²D/∂x²)⁻¹(∂D/∂x), the quadratic term cancels half of the linear term, leaving
D(x̂) = D + (1/2)(∂D/∂x)ᵀx̂
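In code, the 3D refinement and contrast test might be sketched as follows (function name and example numbers are mine; `g` and `H` are the gradient and Hessian of D at the sample point):

```python
import numpy as np

def refine_keypoint(g, H, D0, contrast_thresh=0.03):
    """Solve x_hat = -H^{-1} g, then evaluate D(x_hat) = D0 + 0.5 g.x_hat
    and apply the low-contrast rejection test."""
    x_hat = -np.linalg.solve(H, g)
    D_hat = D0 + 0.5 * g @ x_hat
    return x_hat, D_hat, abs(D_hat) >= contrast_thresh
```

For example, with g = (0.02, 0, 0), H = diag(-0.4, -0.4, -0.4), and D0 = 0.05, the offset is (0.05, 0, 0) and the point survives the contrast test.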
Maxima in D
Remove low contrast and edges
Keypoint Orientation assignment
The idea is to collect gradient directions and magnitudes around each keypoint. Then
we figure out the most prominent orientation(s) in that region. And we assign this
orientation(s) to the keypoint.
The size of the "orientation collection region" around the keypoint depends on its
scale: the bigger the scale, the bigger the collection region.
Orientation Assignment
• Create histogram of local gradient directions at selected scale
• Assign canonical orientation at peak of smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
• If there are 2 major orientations, use both.
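A minimal sketch of the peak selection (the 36-bin histogram and the keep-peaks-within-80%-of-the-maximum rule follow Lowe; the function name is mine, and histogram smoothing is omitted):

```python
import numpy as np

def dominant_orientations(mags, angs, peak_ratio=0.8):
    """36-bin orientation histogram weighted by gradient magnitude;
    return the center angle of every bin within peak_ratio of the max,
    so a keypoint with two major orientations yields both."""
    hist, _ = np.histogram(np.asarray(angs) % 360.0, bins=36,
                           range=(0.0, 360.0), weights=mags)
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (peaks + 0.5) * 10.0           # bin centers in degrees
```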
Keypoint localization with orientation (233x189 image):
• 832 initial keypoints
• 729 keypoints after gradient threshold
• 536 keypoints after ratio threshold
4. Keypoint Descriptors
Normalization
SIFT Descriptor
• A 16x16 gradient window is taken and partitioned into 4x4 subwindows
• Histogram of the 4x4 samples in 8 directions
• Gaussian weighting around the center (σ is 0.5 times the scale of the keypoint)
• 4x4x8 = 128 dimensional feature vector
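A simplified sketch of the descriptor assembly (it omits the Gaussian weighting and interpolation of the full method; the function name is mine): 16x16 magnitudes and angles become a 4x4 grid of 8-bin histograms, concatenated and normalized into a 128-d vector.

```python
import numpy as np

def sift_descriptor(mags, angs):
    """mags/angs: 16x16 arrays around the keypoint (angles in degrees,
    already rotated relative to the keypoint orientation)."""
    cells = []
    for cy in range(4):
        for cx in range(4):
            m = mags[4*cy:4*cy+4, 4*cx:4*cx+4]
            a = angs[4*cy:4*cy+4, 4*cx:4*cx+4]
            h, _ = np.histogram(a % 360.0, bins=8, range=(0.0, 360.0),
                                weights=m)
            cells.append(h)
    v = np.concatenate(cells)                # 16 cells x 8 bins = 128 values
    return v / (np.linalg.norm(v) + 1e-12)   # normalize for illumination
```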
Using SIFT for Matching “Objects”
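Once descriptors are computed for two images, correspondences are typically found with Lowe's nearest-neighbor ratio test; a minimal numpy sketch (function name is mine):

```python
import numpy as np

def ratio_match(desc1, desc2, ratio=0.8):
    """Accept a match only when the nearest descriptor in desc2 is much
    closer than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

In OpenCV the same idea is available through `cv2.BFMatcher` with `knnMatch(k=2)` followed by the ratio check.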
Uses for SIFT
SURF approximates the Hessian with box filters; the smallest, 9x9, filters correspond to Gaussian second derivatives with σ = 1.2, and the determinant is approximated as
det(Happrox) = DxxDyy - (0.9Dxy)²
Larger filter sizes (15x15, 21x21, ...) sample larger scales.
Robotic & Vision Lab RoVis
Changing scale (within an octave)
Filter sizes 9, 15, 21, 27 correspond to σ = 1.2 * (size/9), i.e. from σ = 1.2 (9x9) up to σ = 1.2 * (27/9) = 3.6 (27x27).
Across octaves the filter sizes are 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93, 99, giving overlapping scale ranges:
• 1.6 = 1.2 * (12/9) ≤ σ ≤ 1.2 * (24/9) = 3.2
• 2.8 = 1.2 * (21/9) ≤ σ ≤ 1.2 * (45/9) = 6.0
• 5.2 = 1.2 * (39/9) ≤ σ ≤ 1.2 * (87/9) = 11.6
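The size-to-scale rule σ = 1.2 * (size/9), as a one-line helper (the name is mine):

```python
def surf_sigma(size):
    """Scale corresponding to a SURF box filter of the given side length."""
    return 1.2 * size / 9.0
```

For example, `surf_sigma(27)` gives 3.6 and `surf_sigma(87)` gives 11.6, matching the octave bounds above.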
But higher octaves are increasingly less useful.
(Figure: the sampling windows scale with the keypoint scale s, e.g. windows of 10s and 20s weighted by a Gaussian with σ = 3.3s; in the second example, windows of 12 and 24 pixels weighted with σ = 4.)
sign of LoG = sign of trace(Happrox) = sgn( Dxx + Dyy )
• storing this sign with each keypoint speeds up matching (only keypoints with the same sign need to be compared)
Examples: Tillman (SURF, 5 octaves, 4 scales per octave)
Examples: Tillman (U-SURF, 1 octave, 4 scales per octave)