Harder case (photos by Diva Sian and swashford)
Feature Descriptors
Advantages of local features
Locality
features are local, so robust to occlusion and clutter
Distinctiveness:
can differentiate a large database of objects
Quantity
hundreds or thousands in a single image
Efficiency
real-time performance achievable
Generality
exploit different types of features in different situations
More motivation
Feature points are used for:
Image alignment (e.g., mosaics)
3D reconstruction
Motion tracking
Object recognition
Indexing and database retrieval
Robot navigation
other
Want uniqueness
Look for image regions that are unusual
Lead to unambiguous matches in other images
Intuition
Corners
We should easily recognize the point by
looking through a small window
Shifting a window in any direction should give
a large change in intensity
Notation: I_x, I_y, I_x I_y
First compute I_x, I_y, and I_x I_y as 3 images; then apply a Gaussian to each.
OR, first apply the Gaussian and then compute the derivatives.
The math
To compute the eigenvalues:
Reminder:
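The equations for this slide did not survive extraction. As a reconstruction (the standard Harris formulation and the 2x2 eigenvalue reminder, not necessarily the slide's exact notation):

```latex
H \;=\; \sum_{(x,y)\in W} w(x,y)
\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
\lambda_{\pm} \;=\; \tfrac{1}{2}\!\left( (h_{11}+h_{22}) \pm \sqrt{(h_{11}-h_{22})^2 + 4\,h_{12}^2} \right)
```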
Harris detector: Steps
1. Compute derivatives Ix, Iy and IxIy at each pixel and
smooth them with a Gaussian. (Or smooth first and then
derivatives.)
2. Compute the Harris matrix H in a window around each
pixel
3. Compute corner response function R
4. Threshold R
5. Find local maxima of response function (non-maximum suppression)
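The five steps can be sketched in Python. This is a minimal sketch using NumPy/SciPy; `sigma`, `k`, and `thresh` are illustrative parameter choices, not values from the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris(img, sigma=1.0, k=0.05, thresh=1e-4):
    # 1. Image derivatives, then products smoothed with a Gaussian
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # 2.-3. Corner response R = det(H) - k * trace(H)^2 at each pixel
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    R = det - k * trace ** 2
    # 4. Threshold R, 5. non-maximum suppression in a 3x3 neighborhood
    maxima = (R == maximum_filter(R, size=3)) & (R > thresh)
    return np.argwhere(maxima)  # (row, col) corner locations
```

Along a straight edge det(H) is near zero, so R is negative and only true corners survive the threshold.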
Instead of computing the eigenvalues explicitly, we can use a response built from det(H) and trace(H) (e.g., det(H)/trace(H), the harmonic mean of the eigenvalues), which is cheaper to evaluate.
Invariance
Suppose you rotate the image by some angle
Will you still pick up the same features?
Scale?
Properties of the Harris corner detector
Not scale invariant: at one scale the point is detected as a corner, but when the image is enlarged, all points seen through the same small window will be classified as edges.
Scale invariant detection
Suppose you're looking for corners
Slide from Tinne Tuytelaars
Feature descriptors
We know how to detect good points
Next question: How to match them?
Lots of possibilities (this is a popular research area)
Simple option: match square windows around the point
State of the art approach: SIFT
David Lowe, UBC http://www.cs.ubc.ca/~lowe/keypoints/
Invariance
Suppose we are comparing two images I1 and I2
I2 may be a transformed version of I1
What kinds of transformations are we likely to encounter in
practice?
The scale-normalized Laplacian of Gaussian is invariant to scale change, i.e., f(I_{i1…im}(x′, σ′)) = f(I_{i1…im}(x, σ)), and has several other nice properties. Lindeberg, 1994
G1 - G2 = DoG
K. Grauman, B. Leibe
DoG example
Take Gaussians at multiple spreads and use the DoGs.
Scale invariant interest points
Interest points are local maxima in both position and scale: look for extrema in the difference of Gaussians.
Apply Gaussians with different σ's; the result is a list of interest points (x, y, σ).
Lowe, 2004.
Lowe's Pyramid Scheme
Per octave: s + 2 DoG filters, built from s + 3 Gaussian-blurred images (including the original) with
σ_i = 2^(i/s) · σ_0, for i = 0, 1, …, s + 1.
Adjacent blurred images are subtracted to give the s + 2 difference images.
The parameter s determines the number of images per octave.
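The σ schedule for one octave can be sketched as follows. A minimal sketch: `sigma0 = 1.6` follows Lowe's paper but is illustrative here, and the function name is my own:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(img, sigma0=1.6, s=3):
    # s + 3 blurred images with sigma_i = 2**(i/s) * sigma0
    sigmas = [sigma0 * 2 ** (i / s) for i in range(s + 3)]
    gaussians = [gaussian_filter(img.astype(float), sg) for sg in sigmas]
    # s + 2 difference images; extrema are searched in the middle s planes
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return sigmas, dogs
```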
Key point localization
With s + 2 difference images, the top and bottom planes are ignored, so s planes are searched for extrema. (Blur, then subtract.)
(figure: % correctly matched vs. number of scales, trading stability against expense)
Sampling in scale for efficiency
How many scales should be used per octave? S = ?
The more scales evaluated, the more keypoints are found, but:
S < 3: increasing S also increases the number of stable keypoints
S > 3: the number of stable keypoints decreases
S = 3: the maximum number of stable keypoints is found
Results: Difference-of-Gaussian
K. Grauman, B. Leibe
How can we find correspondences?
Similarity
Orientation Normalization
Compute orientation histogram
Select dominant orientation [Lowe, SIFT, 1999]
Normalize: rotate to fixed orientation
(orientation histogram over 0 to 2π)
T. Tuytelaars, B. Leibe
What's next?
Each keypoint yields a 128-dimensional descriptor vector.
Important point: SIFT is both 1. an interest point detector and 2. a region descriptor.
How can we find correspondences?
How do we describe an image patch?
SIFT descriptor
Full version
Divide the 16x16 window (8x8 case shown below) into a 4x4 grid of cells (2x2
case shown below)
Compute an orientation histogram for each cell
16 cells × 8 orientations = 128-dimensional descriptor
Adapted from slide by David Lowe
Numeric Example
by Yao Lu
magnitude(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )
θ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )
Orientations in each of the 16 pixels of the cell. The orientations all ended up in two bins: 11 in one bin, 5 in the other (rough count):
5 11 0 0 0 0 0 0
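The magnitude/orientation formulas and the 8-bin cell histogram can be sketched as below. Note this sketch weights votes by gradient magnitude, as in full SIFT, whereas the slide's "rough count" is unweighted; the function name is illustrative:

```python
import numpy as np

def cell_orientation_histogram(L, bins=8):
    """8-bin orientation histogram for one cell of a blurred image L."""
    # Central differences (border pixels excluded for simplicity)
    dx = L[1:-1, 2:] - L[1:-1, :-2]           # L(x+1,y) - L(x-1,y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]           # L(x,y+1) - L(x,y-1)
    mag = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx) % (2 * np.pi)  # orientation in [0, 2*pi)
    hist = np.zeros(bins)
    idx = (theta / (2 * np.pi) * bins).astype(int) % bins
    np.add.at(hist, idx.ravel(), mag.ravel())  # magnitude-weighted votes
    return hist
```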
SIFT descriptor
Full version
Start with a 16x16 window (256 pixels)
Divide the 16x16 window into a 4x4 grid of cells (16 cells)
Compute an orientation histogram for each cell
16 cells * 8 orientations = 128 dimensional descriptor
Threshold-normalize the descriptor: normalize to unit length, clamp each component so that it is at most 0.2, then renormalize.
Adapted from slide by David Lowe
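The threshold-normalization step can be sketched as follows (a minimal sketch; the 0.2 clamp is the value from Lowe's paper, the function name is my own):

```python
import numpy as np

def normalize_sift(desc, clamp=0.2):
    """Normalize to unit length, clamp components at 0.2, renormalize."""
    desc = desc / (np.linalg.norm(desc) + 1e-12)
    desc = np.minimum(desc, clamp)   # reduce influence of large gradient magnitudes
    return desc / (np.linalg.norm(desc) + 1e-12)
```

Clamping limits the effect of a few very strong gradients (e.g., from illumination edges) on the whole descriptor.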
Properties of SIFT
Extraordinarily robust matching technique
Can handle changes in viewpoint
Up to about 30 degrees of out-of-plane rotation
Can handle significant changes in illumination
Sometimes even day vs. night
Fast and efficient: can run in real time
Various code available
http://www.cs.ubc.ca/~lowe/keypoints/
Example
Lowe, IJCV'04
Example: Google Goggles
How do we build a panorama? We need to match (align) images.
Matching with Features
Detect feature points in both images
Find corresponding pairs
Use these matching pairs to align images - the required mapping is called a homography.
Automatic mosaicing
Recognition of specific objects, scenes
When does the SIFT descriptor fail?
Other methods: Daisy
Circular gradient binning
SIFT
Daisy
Picking the best DAISY, S. Winder, G. Hua, M. Brown, CVPR 2009
Other methods: SURF
For computational efficiency, only compute the gradient histogram with 4 bins.
Feature distance
How to define the difference between two features f1, f2?
Simple approach: SSD(f1, f2)
sum of squared differences between entries of the two descriptors:
SSD(f1, f2) = Σ_i (f1_i − f2_i)²
But it can give good scores to very ambiguous (bad) matches
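SSD as defined above, as a short sketch:

```python
import numpy as np

def ssd(f1, f2):
    # Sum of squared differences between descriptor entries
    d = np.asarray(f1, float) - np.asarray(f2, float)
    return float(np.dot(d, d))
```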
Feature distance in practice
How to define the difference between two features f1, f2?
Better approach: ratio distance = SSD(f1, f2) / SSD(f1, f2′)
f2 is the best SSD match to f1 in I2
f2′ is the 2nd-best SSD match to f1 in I2
This gives large values (~1) for ambiguous matches. WHY? Because an ambiguous feature matches its best and second-best candidates almost equally well, so the two SSDs are nearly equal.
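The ratio test can be sketched as below. A minimal sketch in the spirit of Lowe's ratio test; the 0.8 cutoff and the function name are illustrative choices, not values from the slides:

```python
import numpy as np

def ratio_match(f1, candidates, max_ratio=0.8):
    """Return index of best match in `candidates`, or None if ambiguous."""
    dists = [np.sum((np.asarray(f1, float) - np.asarray(c, float)) ** 2)
             for c in candidates]
    order = np.argsort(dists)
    best, second = order[0], order[1]
    # Ambiguous matches have ratio near 1 (best and 2nd-best are similar)
    if dists[best] / (dists[second] + 1e-12) < max_ratio:
        return int(best)
    return None
```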
Eliminating more bad matches
(figure: example feature distances for a true match and a false match; thresholding the feature distance eliminates bad matches)
True/false positives
Recall (true positive rate) = TP / (TP + FN)
True negative rate = TN / (TN + FP)
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
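The measures above can be computed directly from the confusion-matrix counts (a minimal sketch; the function and key names are my own):

```python
def matcher_metrics(tp, fp, tn, fn):
    """Standard evaluation measures from true/false positive/negative counts."""
    return {
        "recall": tp / (tp + fn),               # true positive rate
        "precision": tp / (tp + fp),
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```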
Evaluating the results
How can we measure the performance of a feature matcher?
False positive rate = FP / (FP + TN)
ROC curve (Receiver Operating Characteristic): plot the true positive rate against the false positive rate as the match threshold varies.
SIFT usage:
Recognize charging station
Communicate with visual cards
Teach object recognition
Other kinds of descriptors
Local Descriptors: Shape Context
(figure: log-polar histogram bins; count the points falling in each bin)
Belongie & Malik, ICCV 2001
K. Grauman, B. Leibe
Texture
The texture features of a patch can be considered a
descriptor.
E.g. the LBP histogram is a texture descriptor for a
patch.
Bag-of-words models
Orderless document representation: frequencies of words
from a dictionary Salton & McGill (1983)
Bags of features for image classification
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
4. Represent images by frequencies of visual words
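Steps 2-4 can be sketched with k-means clustering. A minimal sketch using SciPy's k-means; `bow_histograms` and its parameters are my own illustrative names:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def bow_histograms(descriptor_sets, k=2, seed=0):
    """Learn a visual vocabulary and represent each image as a word histogram."""
    np.random.seed(seed)                       # make k-means init repeatable
    all_desc = np.vstack(descriptor_sets)      # pool descriptors from all images
    codebook, _ = kmeans2(all_desc, k, minit='++')  # 2. learn vocabulary
    hists = []
    for desc in descriptor_sets:
        words, _ = vq(desc, codebook)          # 3. nearest visual word per feature
        hists.append(np.bincount(words, minlength=k) / len(desc))  # 4. frequencies
    return codebook, np.array(hists)
```

In practice the codebook is learned on a separate training set and then reused to quantize new images.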
A possible texture representation: histogram
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002
1. Feature extraction
Detect patches [Mikolajczyk & Schmid '02], [Matas, Chum, Urban & Pajdla '02], [Sivic & Zisserman '03]
Normalize patch
Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
2. Discovering the visual vocabulary
Clustering the descriptors yields the visual vocabulary.
Slide credit: Josef Sivic
Viewpoint invariant description (Sivic)
Two types of viewpoint covariant regions computed
for each frame
Shape Adapted (SA) Mikolajczyk & Schmid
Maximally Stable (MSER) Matas et al.
Detect different kinds of image areas
Provide complementary representations of frame
Computed at twice the originally detected region size to be more discriminating
Examples of Harris-Affine Operator
Examples of Maximally Stable Regions
Maximally Stable Extremal Regions
J. Matas et al., Distinguished Regions for Wide-baseline Stereo, BMVC 2002.
Noise Removal
Tracking regions over 70 frames (a region must be tracked over at least 3 frames to be kept)
Visual Vocabulary for Sivic's Work
Shape-Adapted
Maximally Stable
Sivic's Experiments on Video Shot Retrieval
Goal: match scene locations within a closed world of shots
Data: 164 frames from 48 shots taken at 19 different 3D locations; 4-9 frames from each location
Experiments - Results
Clustering and vector quantization
Clustering is a common method for learning a visual
vocabulary or codebook
Each cluster center produced by k-means becomes a
codevector
Codebook can be learned on separate training set
The codebook is used for quantizing features
A vector quantizer takes a feature vector and maps it to the
index of the nearest code vector in a codebook
Codebook = visual vocabulary
Code vector = visual word
(diagram: each feature vector is mapped to its nearest code vector: feature vector 1 → code vector 1, feature vector 2 → code vector 2, feature vector 3 → code vector 3)
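The vector quantization step can be sketched as below (a minimal, brute-force sketch; real systems use faster nearest-neighbor search):

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest code vector."""
    # Pairwise squared distances: shape (n_features, n_codevectors)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```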
Another example visual vocabulary
Fei-Fei et al. 2005
Example codebook
Appearance codebook
Source: B. Leibe
Another codebook
Appearance codebook
Source: B. Leibe
Visual vocabularies: Issues
3. Image representation: histogram of codewords
(figure: histogram with frequency on the y-axis and codewords on the x-axis)
Image classification
Given the bag-of-features representations of images from
different classes, learn a classifier using machine learning
But what about layout?
Lazebnik, Schmid & Ponce (CVPR 2006)
Spatial pyramid representation
Extension of a bag of features
Locally orderless representation at several levels of resolution (level 0, level 1, …)
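A spatial pyramid histogram can be sketched as below. A minimal sketch: it concatenates per-cell bag-of-words histograms over a 1x1, 2x2, … grid, and unlike Lazebnik et al. it omits the per-level weighting used in the pyramid match kernel; names are my own:

```python
import numpy as np

def spatial_pyramid(points, words, k, levels=2):
    """Concatenate BoW histograms over a grid at each pyramid level.
    points: (n, 2) coordinates normalized to [0, 1); words: visual word per point."""
    feats = []
    for level in range(levels):
        g = 2 ** level                               # level 0: 1x1, level 1: 2x2, ...
        cells = (points * g).astype(int).clip(0, g - 1)
        for cx in range(g):
            for cy in range(g):
                in_cell = (cells[:, 0] == cx) & (cells[:, 1] == cy)
                feats.append(np.bincount(words[in_cell], minlength=k))
    return np.concatenate(feats)
```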