
Histogram of Oriented Gradients (HOG)



HOG is a feature descriptor for image data that is widely used in machine learning and image processing for object classification.

It is a contour-based method: the underlying idea is that an object in an image can be distinguished from the background by its contour. This contour, in turn, is described by the orientation and the magnitude (intensity) of the edge normal vectors.



Step 1: Resize



Step 2: Compute Gradients

To calculate a HOG descriptor, we need to first calculate the horizontal and vertical
gradients; after all, we want to calculate the histogram of gradients. This is easily achieved
by filtering the image with the 1-D centered kernels [-1, 0, 1] (horizontal) and its transpose (vertical).
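A sketch of this step with OpenCV (the image path is a placeholder):

import cv2
import numpy as np

# Load as grayscale float so negative gradient values survive
img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Horizontal and vertical gradients with the 1-D kernels [-1, 0, 1]
gx = cv2.filter2D(img, cv2.CV_32F, np.array([[-1, 0, 1]], np.float32))
gy = cv2.filter2D(img, cv2.CV_32F, np.array([[-1], [0], [1]], np.float32))

# Gradient magnitude and direction; HOG uses unsigned angles in [0, 180)
magnitude, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
angle = angle % 180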



A “cell” is a rectangular region defined by the number of pixels that belong in each cell. For
example, if we had a 128 x 128 image and defined our pixels_per_cell as 4 x 4, we would
thus have 32 x 32 = 1024 cells:



If we defined our pixels_per_cell as 32 x 32, we would have 4 x 4 = 16 total cells:
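A quick sanity check of this cell arithmetic (the helper is illustrative; square images and cells assumed):

def num_cells(image_size, pixels_per_cell):
    # cells along one axis, squared
    per_axis = image_size // pixels_per_cell
    return per_axis * per_axis

print(num_cells(128, 4))   # 32 x 32 = 1024 cells
print(num_cells(128, 32))  # 4 x 4  = 16 cells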



Step 3: Calculate Histogram of Gradients in 8×8 cells



The next step is to create a histogram of gradients in these 8×8 cells; each cell contains 8 × 8 = 64 gradient values. The histogram contains 9 bins corresponding to the angles 0, 20, 40, …, 160.

Let's first focus on the pixel encircled in blue. It has an angle (direction) of 80 degrees and a magnitude of 2, so it adds 2 to the 5th bin. The gradient at the pixel encircled in red has an angle of 10 degrees and a magnitude of 4. Since 10 degrees is halfway between 0 and 20, the vote of the pixel splits evenly between the two bins.



If the angle is greater than 160 degrees, it is between 160 and 180, and we know the angle
wraps around making 0 and 180 equivalent. So in the example below, the pixel with angle
165 degrees contributes proportionally to the 0 degree bin and the 160 degree bin.
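A minimal sketch of this voting scheme for one cell, given magnitude and angle arrays for the 8×8 cell (the function name is illustrative); it reproduces both the even split at 10 degrees and the 160-to-0 wrap-around:

import numpy as np

def cell_histogram(magnitude, angle, n_bins=9, bin_width=20):
    # Each pixel's magnitude is split proportionally between the
    # two nearest angle bins; bin 8 (160 deg) wraps back to bin 0
    hist = np.zeros(n_bins)
    for mag, ang in zip(magnitude.ravel(), angle.ravel()):
        left = int(ang // bin_width) % n_bins    # lower bin index
        frac = (ang % bin_width) / bin_width     # distance into the bin
        hist[left] += mag * (1 - frac)
        hist[(left + 1) % n_bins] += mag * frac
    return hist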



Step 4: Contrast normalization over blocks

To account for changes in illumination and contrast, we can normalize the gradient
values locally. This requires grouping the “cells” together into larger, connecting “blocks”.
It is common for these blocks to overlap, meaning that each cell contributes to the final
feature vector more than once.

Again, the blocks are rectangular; however, our units are no longer pixels: they are the cells! Dalal and Triggs report that using either 2 x 2 or 3 x 3 cells_per_block obtains reasonable accuracy in most cases.

Consider an RGB color vector [128, 64, 32]. The length of this vector is √(128² + 64² + 32²) = √21504 ≈ 146.64.

This is also called the L2 norm of the vector. Dividing each element of this vector by 146.64 gives us the normalized vector [0.87, 0.43, 0.22].

Now consider another vector whose elements are twice the value of the first vector: 2 × [128, 64, 32] = [256, 128, 64]. You can work it out yourself to see that normalizing [256, 128, 64] also results in [0.87, 0.43, 0.22]: normalization cancels a uniform scaling of the values, which is exactly the effect of a global illumination change.
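The same computation in a few lines of NumPy, including the doubled vector:

import numpy as np

v = np.array([128.0, 64.0, 32.0])
norm = np.linalg.norm(v)        # sqrt(128^2 + 64^2 + 32^2) = 146.64
print(v / norm)                 # approx [0.873 0.436 0.218]

v2 = 2 * v                      # [256, 128, 64]
print(v2 / np.linalg.norm(v2))  # identical normalized vector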



Step 5: Calculate the HOG feature vector

To calculate the final feature vector for the entire image patch (64×128 in the standard pedestrian-detection setup), the 36×1 block vectors are concatenated into one giant vector. What is the size of this vector? Let us calculate.

1. How many positions of the 16×16 blocks do we have? There are 7 horizontal and 15 vertical positions, making a total of 7 x 15 = 105 positions.

2. Each 16×16 block is represented by a 36×1 vector. So when we concatenate them all into one giant vector we obtain a 36×105 = 3780 dimensional vector.
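The same count in code, assuming the standard 64×128 window, 8×8 cells, 16×16 blocks (2×2 cells), a block stride of one cell, and 9 bins (the helper is illustrative):

def hog_length(win_w=64, win_h=128, cell=8, block_cells=2, n_bins=9):
    blocks_x = win_w // cell - (block_cells - 1)   # 8 - 1 = 7
    blocks_y = win_h // cell - (block_cells - 1)   # 16 - 1 = 15
    return blocks_x * blocks_y * block_cells * block_cells * n_bins

print(hog_length())  # 105 blocks x 36 values = 3780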

Fig. 5. Demonstration of a HOG histogram for one block.



SIFT: Motivation

 The Harris operator is not invariant to scale, and correlation is not invariant to rotation¹.
 For better image matching, Lowe’s goal was to develop an interest operator that is invariant to scale and rotation.
 Also, Lowe aimed to create a descriptor that was robust to the variations corresponding to typical viewing conditions. The descriptor is the most-used part of SIFT.

¹But Schmid and Mohr developed a rotation invariant descriptor for it in 1997.

Idea of SIFT
 Image content is transformed into local feature
coordinates that are invariant to translation, rotation,
scale, and other imaging parameters

SIFT Features

Claimed Advantages of SIFT

 Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
 Distinctiveness: individual features can be matched to a large database of objects
 Quantity: many features can be generated for even small objects
 Efficiency: close to real-time performance
 Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

Scale Invariant Feature Transform
Basic idea:
• Take a 16x16 square window around the detected feature
• Compute the edge orientation (angle of the gradient minus 90°) for each pixel
• Throw out weak edges (threshold the gradient magnitude)
• Create a histogram of the surviving edge orientations

[Figure: angle histogram over 0 to 2π]

Adapted from slide by David Lowe


SIFT descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor

Adapted from slide by David Lowe


SIFT Algorithm Overview
1. Constructing a scale space. This is the initial preparation. You create internal representations of the original image to ensure scale invariance. This is done by generating a "scale space".

2. LoG approximation. The Laplacian of Gaussian is great for finding interesting points (or key points) in an image, but it is computationally expensive. So we cheat and approximate it using the representation created earlier.

3. Finding keypoints. With the super fast approximation, we now try to find key points. These are maxima and minima in the Difference of Gaussian images calculated in step 2.

4. Getting rid of bad key points. Edges and low-contrast regions are bad keypoints. Eliminating these makes the algorithm efficient and robust. A technique similar to the Harris corner detector is used here.

5. Assigning an orientation to the keypoints. An orientation is calculated for each key point. Any further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making it rotation invariant.

6. Generating SIFT features. Finally, with scale and rotation invariance in place, one more representation is generated. This helps uniquely identify features. Let's say you have 50,000 features; with this representation, you can easily identify the feature you're looking for (say, a particular eye, or a sign board).
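Since the overview mentions OpenCV, here is a minimal detection sketch; cv2.SIFT_create is the entry point in recent OpenCV releases, and the image path is a placeholder:

import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()

# Each keypoint carries (x, y), scale, and orientation;
# each descriptor is a 128-dimensional vector
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N, (N, 128)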
Lowe’s Scale-space Interest Points
 Laplacian of Gaussian kernel
 Scale-normalised (multiplied by σ²)
 Proposed by Lindeberg
 Scale-space detection
 Find local maxima across scale/space
 A good “blob” detector

[ T. Lindeberg IJCV 1998 ]


Lowe’s Scale-space Interest Points: Difference of Gaussians
 The Gaussian is a solution of the heat diffusion equation: ∂G/∂σ = σ∇²G
 Hence σ∇²G = ∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ), so
G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ²∇²G
 k does not need to be very small in practice
Lowe’s Pyramid Scheme
• Scale space is separated into octaves:
• Octave 1 uses scale σ
• Octave 2 uses scale 2σ
• etc.

• In each octave, the initial image is repeatedly convolved with Gaussians to produce a set of scale-space images.

• Adjacent Gaussians are subtracted to produce the DoG images.

• After each octave, the Gaussian image is down-sampled by a factor of 2 to produce an image ¼ the size to start the next level.
You take the original image, and generate
progressively blurred out images. Then, you
resize the original image to half size. And you
generate blurred out images again. And you
keep repeating.
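A simplified sketch of this scheme (Lowe's actual implementation blurs incrementally and resamples from a specific level, so this is illustrative only; k = 2^(1/s) as in the formulas below):

import cv2
import numpy as np

def gaussian_pyramid(img, n_octaves=4, s=3, sigma0=1.6):
    # Per octave: s+3 progressively blurred images, then halve the image
    img = img.astype(np.float32)
    k = 2.0 ** (1.0 / s)
    octaves = []
    for _ in range(n_octaves):
        octaves.append([cv2.GaussianBlur(img, (0, 0), sigma0 * k**i)
                        for i in range(s + 3)])
        img = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2))
    return octaves

def difference_of_gaussians(octaves):
    # Adjacent Gaussians subtracted: s+2 DoG images per octave
    return [[b - a for a, b in zip(blurred, blurred[1:])]
            for blurred in octaves]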

Lowe’s Pyramid Scheme

Within one octave there are s+2 filters with scales σᵢ = 2^(i/s) · σ₀ for i = 1, …, s+1 (plus the original image at σ₀). This gives s+3 images per octave, including the original, and subtracting adjacent images gives s+2 difference images.
The parameter s determines the number of images per octave.
Difference-of-Gaussians

D = (G(kσ) − G(σ)) * I = G(kσ) * I − G(σ) * I
Key point localization

 Detect maxima and minima of the difference-of-Gaussian images in scale space (the pyramid is built by blur, subtract, and resample).
 Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below.
 Of the s+2 difference images, the top and bottom ones are ignored, so s planes are searched.
 For each max or min found, the output is the location and the scale.

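A direct sketch of this 26-neighbor comparison, where dogs is the list of DoG images for one octave and (i, y, x) indexes scale, row, and column (illustrative; no sub-pixel refinement or thresholds):

import numpy as np

def is_extremum(dogs, i, y, x):
    # 3x3x3 neighborhood: 8 neighbors in the current image plus
    # 9 in each of the scales above and below (26 total)
    cube = np.stack([d[y-1:y+2, x-1:x+2] for d in dogs[i-1:i+2]])
    center = dogs[i][y, x]
    return center == cube.max() or center == cube.min()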
Scale-space extrema detection: experimental results over 32 images that were synthetically transformed and noise added.
[Plots: % of keypoints detected and % correctly matched, together with the average number detected and the average number matched.]
Stability vs. Expense
 Sampling in scale for efficiency: how many scales should be used per octave (S = ?)
 The more scales evaluated, the more keypoints are found
 For S < 3, the number of stable keypoints still increases with S
 For S > 3, the number of stable keypoints decreases
 S = 3 yields the maximum number of stable keypoints
Keypoint Localization & Filtering

 Now we have far fewer points than pixels.
 However, that is still lots of points (~1000s)…
 with only pixel accuracy at best
 and this includes many bad points

Brown & Lowe 2002

Keypoint localization

 Once a keypoint candidate is found, perform a detailed fit to nearby data to determine location, scale, and the ratio of principal curvatures.
 In initial work, keypoints were found at the location and scale of a central sample point.
 In newer work, a 3D quadratic function is fit to improve interpolation accuracy.
 The Hessian matrix is used to eliminate edge responses.
Eliminating the Edge Response

 Reject flats: |D(x̂)| < 0.03
 Reject edges: let α be the eigenvalue with larger magnitude and β the smaller, and let r = α/β, so α = rβ. Then
Tr(H)²/Det(H) = (α + β)²/(αβ) = (rβ + β)²/(rβ²) = (r + 1)²/r,
which is at a minimum when the two eigenvalues are equal. Keep the keypoint only if r < 10, i.e. Tr(H)²/Det(H) < 11²/10.
 What does this look like?
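In code, the test needs only the trace and determinant of the 2x2 Hessian, never the eigenvalues themselves (a sketch of the r < 10 criterion above):

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    # Keep the keypoint iff Tr(H)^2 / Det(H) < (r + 1)^2 / r
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:          # eigenvalues have opposite signs: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r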

2. Accurate keypoint localization
• Reject points with low contrast (flat) and poorly localized along an edge (edge)
• Fit a 3D quadratic function for sub-pixel maxima

1D illustration with samples f(−1) = 1, f(0) = 6, f(+1) = 5:

f(x) ≈ f(0) + f′(0)·x + ½·f″(0)·x²
f′(0) ≈ (f(1) − f(−1))/2 = 2,  f″(0) ≈ f(1) − 2f(0) + f(−1) = −6
f(x) ≈ 6 + 2x − 3x²
f′(x) = 2 − 6x = 0  ⟹  x̂ = 1/3
f(x̂) = 6 + 2·(1/3) − 3·(1/3)² = 6 + 1/3 ≈ 6.33
2. Accurate keypoint localization
• Taylor series of several variables
• Two variables:

f(x, y) ≈ f(0, 0) + (∂f/∂x · x + ∂f/∂y · y) + ½ (∂²f/∂x² · x² + 2 ∂²f/∂x∂y · xy + ∂²f/∂y² · y²)

In matrix form, with x = [x, y]ᵀ:

f(x) ≈ f(0) + (∂f/∂x)ᵀ x + ½ xᵀ (∂²f/∂x²) x
Accurate keypoint localization
• Taylor expansion in matrix form: x is a vector, f maps x to a scalar

gradient: ∂f/∂x = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]ᵀ

Hessian matrix (often symmetric): ∂²f/∂x² is the n×n matrix whose (i, j) entry is ∂²f/∂xᵢ∂xⱼ
2D illustration

2D example (3×3 grid of sample values):
−17  −1  −1
 −9   7   7
 −9   7   7
Derivation of matrix form

For h(x) = gᵀx:
h(x) = gᵀx = Σᵢ gᵢxᵢ, so ∂h/∂x = [∂h/∂x₁, …, ∂h/∂xₙ]ᵀ = [g₁, …, gₙ]ᵀ = g

For h(x) = xᵀAx:
h(x) = xᵀAx = Σᵢ Σⱼ aᵢⱼ xᵢ xⱼ, so
∂h/∂xₖ = Σᵢ aᵢₖ xᵢ + Σⱼ aₖⱼ xⱼ, i.e. ∂h/∂x = Aᵀx + Ax = (Aᵀ + A)x

Applying these two results to the Taylor expansion (and using the symmetry of the Hessian):
∂/∂x [ f(0) + (∂f/∂x)ᵀ x + ½ xᵀ (∂²f/∂x²) x ] = ∂f/∂x + (∂²f/∂x²) x
Accurate keypoint localization
• x is a 3-vector (x, y, σ); setting the derivative to zero gives the offset x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x)
• Change the sample point if the offset is larger than 0.5
• Throw out low contrast (< 0.03)
Accurate keypoint localization
• Throw out low contrast: |D(x̂)| < 0.03

D(x̂) = D + (∂D/∂x)ᵀ x̂ + ½ x̂ᵀ (∂²D/∂x²) x̂

Since (∂²D/∂x²) x̂ = −(∂D/∂x), the quadratic term simplifies:
½ x̂ᵀ (∂²D/∂x²) x̂ = −½ x̂ᵀ (∂D/∂x) = −½ (∂D/∂x)ᵀ x̂

Hence D(x̂) = D + ½ (∂D/∂x)ᵀ x̂
Maxima in D
Remove low contrast and edges
Keypoint Orientation assignment
The idea is to collect gradient directions and magnitudes around each keypoint, figure out the most prominent orientation(s) in that region, and assign the orientation(s) to the keypoint.

The size of the "orientation collection region" around the keypoint depends on its scale: the bigger the scale, the bigger the collection region.
Orientation Assignment

 Any peak within 80% of the highest peak is used to create a keypoint with that orientation
 ~15% of keypoints are assigned multiple orientations, but these contribute significantly to the stability
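A minimal sketch of the orientation histogram, using Lowe's 36 bins of 10 degrees and the 80%-of-peak rule above (Gaussian weighting and histogram smoothing are omitted for brevity):

import numpy as np

def keypoint_orientations(magnitude, angle_deg, n_bins=36):
    # Accumulate gradient magnitudes into 10-degree orientation bins
    width = 360 // n_bins
    bins = (angle_deg.ravel() // width).astype(int) % n_bins
    hist = np.bincount(bins, weights=magnitude.ravel(), minlength=n_bins)
    # Every peak within 80% of the highest yields its own orientation
    return np.where(hist >= 0.8 * hist.max())[0] * width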
SIFT descriptor
Keypoint Orientation assignment

 Create a histogram of local gradient directions at the selected scale
 Assign the canonical orientation at the peak of the smoothed histogram
 Each key specifies stable 2D coordinates (x, y, scale, orientation)
 If there are 2 major orientations, use both.

Keypoint localization with orientation

On a 233x189 image: 832 initial keypoints; 729 keypoints remain after the gradient (contrast) threshold; 536 keypoints remain after the ratio (edge) threshold.

4. Keypoint Descriptors

 At this point, each keypoint has
 location
 scale
 orientation
 Next is to compute a descriptor for the local image region about each keypoint that is
 highly distinctive
 as invariant as possible to variations such as changes in viewpoint and illumination
Normalization

 Rotate the window to the standard orientation
 Scale the window size based on the scale at which the point was found.

SIFT Descriptor
 A 16x16 gradient window is taken and partitioned into 4x4 subwindows.
 Histogram of 4x4 samples in 8 directions
 Gaussian weighting around the center (σ is 0.5 times the scale of the keypoint)
 4x4x8 = 128 dimensional feature vector

Image from: Jonas Hurrelmann


Lowe’s Keypoint Descriptor
(shown with 4 x 4 descriptors over 16 x 16)

• In the experiments, a 4x4 array of 8-bin histograms is used, a total of 128 features for one keypoint.
• Once you have all 128 numbers, you normalize them (just like you would normalize a vector in school: divide by the root of the sum of squares). These 128 numbers form the "feature vector". The keypoint is uniquely identified by this feature vector.
Lowe’s Keypoint Descriptor

 use the normalized region about the keypoint
 compute the gradient magnitude and orientation at each point in the region
 weight them by a Gaussian window overlaid on the circle
 create an orientation histogram over the 4 x 4 subregions of the window
 4 x 4 descriptors over a 16 x 16 sample array were used in practice; 4 x 4 times 8 directions gives a vector of 128 values. ...

Using SIFT for Matching “Objects”
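A sketch of the usual matching pipeline with OpenCV: brute-force matching of the 128-D descriptors plus Lowe's ratio test (file names are placeholders):

import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor in img1, find the 2 nearest neighbors in img2
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)

# Ratio test: keep a match only if it clearly beats the runner-up
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(len(good), "tentative matches")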

Uses for SIFT

 Feature points are used also for:
 Image alignment (homography, fundamental matrix)
 3D reconstruction (e.g. Photo Tourism)
 Motion tracking
 Object recognition
 Indexing and database retrieval
 Robot navigation
 … many others

[ Photo Tourism: Snavely et al. SIGGRAPH 2006 ]


SURF detectors and descriptors



Interest operator

• An interest point detector finds distinctive locations in an image (corners, blobs, T-junctions)
• it should be repeatable
• An interest point descriptor is used for matching
• it should be distinctive and robust

Both should be fast



Harris features
• Recall Harris features?
(Eigenvalues of gradient covariance matrix)

• Dependent upon scale



Scale space

• Successively smooth an image...



ftp://ftp.nada.kth.se/CVAP/reports/Lin08-EncCompSci.pdf
This forms a 3D space, with scale as
the 3rd axis



http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=AF10AB3864DB87B4414F8169
Let’s look at a slice

[Figure: the signal F as a function of x (left); contours of Fxx = 0 in (x, σ) space (right)]



http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.81.118&rep=rep1&type=pdf
SIFT / SURF

• SIFT – scale invariant feature transform (Lowe 2004)
• SURF – speeded up robust features (Bay et al. 2006)



SURF algorithm

Interest point detector (a usage sketch follows below):
• Compute the integral image
• Apply 2nd derivative (approximate) filters to the image
• Non-maximal suppression (find local maxima in (x, y, σ) space)
• Quadratic interpolation
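These steps are implemented in OpenCV's contrib module; a usage sketch (SURF is non-free, so this needs an opencv-contrib build with nonfree enabled, and the image path is a placeholder):

import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# hessianThreshold discards weak det(H_approx) responses
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
surf.setUpright(True)  # U-SURF: skip orientation assignment

keypoints, descriptors = surf.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N, (N, 64)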



SURF algorithm

Interest point descriptor:
• Divide the window into 4x4 (16 subwindows)
• Compute the Haar wavelet outputs dx and dy
• Within each subwindow, compute Σdx, Σdy, Σ|dx|, Σ|dy|
• This yields a 64-element descriptor

(Only U-SURF is implemented here – no rotation)


Integral image
• The integral image (a.k.a. summed area table) is a 2D running sum
• S(x, y) = Σ_{x′≤x} Σ_{y′≤y} I(x′, y′)
• To compute: S(x, y) = I(x, y) + S(x−1, y) + S(x, y−1) − S(x−1, y−1)
• To use: V(l, t, r, b) = S(l, t) + S(r, b) − S(l, b) − S(r, t) returns the sum of the values inside the rectangle
• Note: the sum of the values in any rectangle can be computed in constant time!
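Both operations in a few lines of NumPy (cumulative sums build S; the four-corner lookup returns any rectangle sum in O(1)):

import numpy as np

def integral_image(img):
    # S(x, y): running sum of all pixels above and to the left
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(S, t, l, b, r):
    # Sum of img[t..b, l..r] (inclusive) from four corners of S
    total = S[b, r]
    if t > 0: total -= S[t - 1, r]
    if l > 0: total -= S[b, l - 1]
    if t > 0 and l > 0: total += S[t - 1, l - 1]
    return total

img = np.arange(16.0).reshape(4, 4)
S = integral_image(img)
print(rect_sum(S, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30.0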
2nd derivative filters (9x9 filters)

The second derivatives ∂²G(x; σ)/∂x², ∂²G/∂y², and ∂²G/∂x∂y are approximated by 9x9 box filters Dxx, Dyy, and Dxy at scale σ = 1.2.

det(Happrox) = Dxx·Dyy − (0.9·Dxy)²


Changing scale

• The integral image allows us to upsample the filter rather than downsample the image (filter sizes 9x9, 15x15, 21x21, …)
Changing scale (within an octave)

• For the 9x9 filter, l₀ = 3 (the length of the positive or negative lobe in the direction of the derivative)
• To keep a central pixel, l₀ must increase by a minimum of 2 pixels, which increases the filter dimension by 6
• Therefore, the sizes of the filters are 9x9, 15x15, 21x21, 27x27 (l₀ = 3 for 9x9, l₀ = 5 for 15x15, …)
Non-maximal suppression

• Retain a pixel only if it is greater than all 26 of its neighbors in (x, y, σ)



Interpolation

• We now have values at filter sizes 9, 15, 21, 27, i.e. from σ = 1.2 up to σ = 1.2 × (27/9) = 3.6.
• The covered range lies halfway between samples: from size 12 to size 24, i.e. from σ = 1.2 × (12/9) = 1.6 to σ = 1.2 × (24/9) = 3.2.
• That range is exactly one octave!
Interpolation

• For each local maximum, we need to interpolate to get the true location (to overcome discretization effects)
• Using the Hessian values, take a Taylor expansion of the response around the maximum, as in SIFT: H(x) ≈ H + (∂H/∂x)ᵀ x + ½ xᵀ (∂²H/∂x²) x
• Solution using Newton’s method: x̂ = −(∂²H/∂x²)⁻¹ (∂H/∂x)



The next octaves

• First octave filter sizes: 9, 15, 21, 27
• Second octave sizes: 15, 27, 39, 51
• Increase by 12 each time (not 6)
• Spans from 21 (σ = 1.2 × 21/9 = 2.8) to 45 (σ = 1.2 × 45/9 = 6) (some overlap with the first octave)
• OK to measure at every other pixel in the image (saves computation, like downsampling)
• Third octave sizes: 27, 51, 75, 99
• Increase by 24 each time
• Spans from 39 (σ = 1.2 × 39/9 = 5.2) to 87 (σ = 1.2 × 87/9 = 11.6)
• OK to measure at every 4th pixel in the image
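The size and scale bookkeeping above can be reproduced in a few lines (a sketch: the step doubles per octave, each octave starts at the second filter of the previous one, and σ = 1.2 × size / 9):

def surf_octaves(n_octaves=3, base=9, n_filters=4):
    octaves, start, step = [], base, 6
    for _ in range(n_octaves):
        sizes = [start + i * step for i in range(n_filters)]
        octaves.append([(size, round(1.2 * size / 9, 2)) for size in sizes])
        start, step = sizes[1], step * 2
    return octaves

for octave in surf_octaves():
    print(octave)
# [(9, 1.2), (15, 2.0), (21, 2.8), (27, 3.6)]
# [(15, 2.0), (27, 3.6), (39, 5.2), (51, 6.8)]
# [(27, 3.6), (51, 6.8), (75, 10.0), (99, 13.2)]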



SURF octave overview

Filter sizes: 9 15 21 27 33 39 45 51 57 63 69 75 81 87 93 99

Octave 1: 1.6 ≤ σ ≤ 3.2 (σ = 1.2 × 12/9 = 1.6 up to σ = 1.2 × 24/9 = 3.2)
Octave 2: 2.8 ≤ σ ≤ 6.0 (σ = 1.2 × 21/9 = 2.8 up to σ = 1.2 × 45/9 = 6.0)
Octave 3: 5.2 ≤ σ ≤ 11.6 (σ = 1.2 × 39/9 = 5.2 up to σ = 1.2 × 87/9 = 11.6)
But higher octaves are increasingly less useful



SURF descriptor
• Once an interest point has been found:
• Place a window around the point
• Divide it into 4x4 subwindows
• In each subwindow:
• measure dx and dy at 25 (5x5) places
• sum over all 25 places to get 4 values: Σdx, Σdy, Σ|dx|, Σ|dy|

• Note: Haar wavelets are used to measure the differences (similar to gradients)

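A sketch of the four sums for one subwindow, given Haar responses dx and dy at the 5x5 sample points (the arrays here are random stand-ins):

import numpy as np

def subwindow_values(dx, dy):
    # The 4 SURF values for one subwindow:
    # sum(dx), sum(dy), sum(|dx|), sum(|dy|) over 25 samples
    return np.array([dx.sum(), dy.sum(),
                     np.abs(dx).sum(), np.abs(dy).sum()])

rng = np.random.default_rng(0)
descriptor = np.concatenate([
    subwindow_values(rng.normal(size=(5, 5)), rng.normal(size=(5, 5)))
    for _ in range(16)])          # 16 subwindows x 4 values
print(descriptor.shape)           # (64,)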


Details

The descriptor window is 20s × 20s around the interest point (10s on each side), weighted by a Gaussian with σ = 3.3s.

With s = 1.2 this corresponds to a 24 × 24 window (12 on each side), weighted by a Gaussian with σ = 4.

The sign of the LoG, sgn(trace(Happrox)) = sgn(Dxx + Dyy), is stored as well, giving 65 values per interest point (16 x 4 + 1).
Sign of LoG

• The sign of the Laplacian, sgn(Dxx + Dyy), speeds up matching: only interest points with the same sign need to be compared.



Why SURF is better than SIFT



Examples: Flowers



Examples: Tillman (SURF, 5 octaves, 4 scales per octave)

Examples: Tillman (U-SURF, 1 octave, 4 scales per octave)

Examples: Tillman (OpenCV’s SURF)


