
Feature Matching

Histogram of Oriented Gradients (HOG)


A feature descriptor is a representation of an image or an image patch that simplifies the image
by extracting useful information and throwing away extraneous information. In the HOG feature
descriptor, the distribution (histogram) of gradient directions is used as the feature.
Gradients of an image are useful because the magnitude of the gradient is large around edges and
corners (regions of abrupt intensity change). Local shape information is often well described by
the distribution of intensity gradients or edge directions, even without precise information about
the location of the edges themselves.

Preprocessing

To calculate a HOG descriptor, we need to first calculate the horizontal and vertical gradients;
after all, we want to calculate the histogram of gradients. This is easily achieved by filtering the
image with the 1-D kernels [−1 0 1] (horizontal) and its transpose [−1 0 1]ᵀ (vertical).

Now we find the magnitude and direction of the gradient at each pixel using the formula:

g = √(gx² + gy²),  θ = arctan(gy / gx)
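A minimal sketch of this preprocessing step, assuming OpenCV (cv2.Sobel with ksize=1 applies exactly the [−1 0 1] kernels; the file name is hypothetical):

    import cv2
    import numpy as np

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

    # Filter with the [-1, 0, 1] kernels (ksize=1 disables smoothing)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=1)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=1)

    # Per-pixel gradient magnitude and direction
    mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)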

Calculate Histogram of Gradients in 8×8 cells


The image is divided into 8×8 cells and a histogram of gradients is calculated for each 8×8
cell. Calculating a histogram over a patch makes this representation more robust to noise. In a
photo of a pedestrian scaled to 64×128, an 8×8 cell is big enough to capture interesting features
(e.g. the face, the top of the head, etc.). The histogram is essentially a vector (or an array) of 9
bins (numbers) corresponding to angles 0, 20, 40, 60 … 160.

The 8×8 cells are then grouped into overlapping 16×16 blocks (2×2 cells); the four 9-bin
histograms in each block are concatenated and normalized, giving one 36×1 vector per block.
To calculate the final feature vector for the entire image patch, these 36×1 vectors are
concatenated into one giant vector.

● HOG in OpenCV:
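OpenCV ships a built-in HOGDescriptor whose defaults match the setup above (64×128 window, 16×16 blocks with 8×8 stride, 8×8 cells, 9 bins). A minimal sketch (the image file name is hypothetical):

    import cv2

    # Default parameters: 64x128 window, 16x16 block, 8x8 stride, 8x8 cell, 9 bins
    hog = cv2.HOGDescriptor()

    img = cv2.imread("pedestrian.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    img = cv2.resize(img, (64, 128))

    descriptor = hog.compute(img)  # 3780 values: 105 blocks x 36 per block
    print(descriptor.shape)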
The Local Binary Pattern Operator
The LBP operator takes the 3×3 surrounding of a pixel and generates a binary 1 if a neighbour
of the centre pixel has a larger value than the centre pixel, and a binary 0 if the neighbour is
less than the centre. The eight neighbours of the centre can then be represented with an 8-bit
number, such as an unsigned 8-bit integer, making the representation very compact.

The LBP operator was first introduced as a complementary measure of contrast, and therefore the
contrast C is calculated as the average of the pixels above the threshold minus the average of
the pixels below the threshold.
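A naive NumPy sketch of the basic 3×3 LBP operator (the neighbour ordering and the ≥ convention for ties are illustrative choices):

    import numpy as np

    def lbp_image(img):
        # Offsets of the 8 neighbours, ordered clockwise from top-left
        offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
        h, w = img.shape
        centre = img[1:-1, 1:-1]
        out = np.zeros((h - 2, w - 2), dtype=np.uint8)
        for bit, (dy, dx) in enumerate(offsets):
            neighbour = img[1+dy : h-1+dy, 1+dx : w-1+dx]
            # Neighbour >= centre contributes a 1 at this bit position
            out |= (neighbour >= centre).astype(np.uint8) << bit
        return out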
Laws’ Measures of Texture Energy
Laws proposed a method for classifying each pixel in an image based upon measures of local
“texture energy”. The texture energy features represent the amounts of variation within a sliding
window applied to several filtered versions of the given image. The filters are specified as
separable 1D arrays for convolution with the image being processed.

L3 = [ 1 2 1 ] E3 = [ −1 0 1 ] S3 = [ −1 2 −1 ].

The operators L3, E3, and S3 perform center-weighted averaging, symmetric first differencing
(edge detection), and second differencing (spot detection), respectively. Nine 3 × 3 masks may be
generated by multiplying the transposes of the three operators (represented as vectors) with their
direct versions; the result of L3ᵀ ∗ E3 gives one of the 3 × 3 Sobel masks. Operators of length
five pixels may be generated by convolving the L3, E3, and S3 operators in various combinations.

L5 = L3 ∗ L3 = [ 1 4 6 4 1 ] (local average)
E5 = L3 ∗ E3 = [ −1 −2 0 2 1 ] (edges)
S5 = −E3 ∗ E3 = [ −1 0 2 0 −1 ] (spots)
R5 = S3 ∗ S3 = [ 1 −4 6 −4 1 ] (ripples)
W5 = −E3 ∗ S3 = [ −1 2 0 −2 1 ] (waves)

In the analysis of texture in 2D images, the 1D convolution operators are used in pairs, as outer
products (e.g. E5ᵀ ∗ L5), to achieve various 2D convolution operators, each of which may be
represented as a 5 × 5 array or matrix. Following the application of the selected filters, texture
energy measures are derived from each filtered image by computing the sum of the absolute values
in a sliding window.

All of the filters listed, except L5, have zero mean, and hence the texture energy measures
derived from the filtered images represent measures of local deviation or variation. The result of
the L5 filter may be used for normalization with respect to luminance and contrast. Feature
vectors composed of the values of various Laws’ operators for each pixel may be used for
classifying the image into texture categories on a pixel-by-pixel basis. The results may be used
for texture segmentation and recognition.
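A sketch of the pipeline for one Laws mask, assuming OpenCV; the 15×15 energy window and file name are illustrative choices:

    import cv2
    import numpy as np

    L5 = np.array([1, 4, 6, 4, 1], dtype=np.float32)
    E5 = np.array([-1, -2, 0, 2, 1], dtype=np.float32)

    # 2D mask as the outer product of two 1D operators, here E5' * L5
    mask = np.outer(E5, L5)

    img = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    filtered = cv2.filter2D(img, -1, mask)

    # Texture energy: sum of absolute values in a sliding window
    energy = cv2.boxFilter(np.abs(filtered), -1, (15, 15), normalize=False)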

K-NN matching
The KNN algorithm is probably the simplest machine learning algorithm: given a training data set
and a new input instance, find the K instances in the training data set closest to that instance.
If the majority of those K instances belong to a certain class, the input instance is assigned to
the same class. In other words, if most of the k most similar samples in the feature space belong
to a certain category, then the sample also belongs to that category; the K closest points vote
to decide the class of the data to be classified.
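Applied to feature matching, the K nearest neighbours of each query descriptor are found among the database descriptors. A minimal sketch using OpenCV’s brute-force matcher with k = 2 and Lowe’s ratio test (ORB, the 0.75 threshold, and the file names are illustrative assumptions):

    import cv2

    img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # For each query descriptor, find its 2 nearest neighbours
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(des1, des2, k=2)

    # Ratio test: keep a match only if it clearly beats the runner-up
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]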

Bag-of-Visual-Words
Bag-of-features represents a data item (document, texture, image) as a histogram over features:
the set of local features extracted from an object is treated as an unordered “bag” of features.

There are four main steps of BoW, sketched in code after the list:

1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
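A minimal sketch of steps 2–4, assuming local descriptors have already been extracted (step 1) and using scikit-learn’s KMeans for the vocabulary; both are assumptions, not prescribed by these notes:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(descriptor_sets, k=100):
        # Step 2: learn the "visual vocabulary" by clustering all descriptors
        return KMeans(n_clusters=k, n_init=4).fit(np.vstack(descriptor_sets))

    def bow_histogram(descriptors, vocab):
        # Step 3: quantize each descriptor to its nearest visual word
        words = vocab.predict(descriptors)
        # Step 4: represent the image by the frequencies of visual words
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
        return hist / hist.sum()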

Visual Vocabulary
● TF-IDF weighting

• Instead of computing a regular histogram distance, we’ll weight each word by its inverse
document frequency
• inverse document frequency (IDF) of word j = log( total number of documents / number of
documents containing word j )
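A sketch of the IDF computation over a stack of BoW histograms (array shapes and names are illustrative):

    import numpy as np

    def idf_weights(histograms):
        # histograms: (n_images, k) matrix of visual-word counts
        n_images = histograms.shape[0]
        docs_with_word = np.count_nonzero(histograms > 0, axis=0)
        return np.log(n_images / np.maximum(docs_with_word, 1))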
VLAD: Vector of Locally Aggregated Descriptors

● Learning: k-means; output: k centroids c1, …, ci, …, ck

● VLAD computation: assign each local descriptor x to its nearest centroid ci = NN(x), then
accumulate, for each centroid, the residuals vi = Σ_{x : NN(x) = ci} (x − ci); the vi are
concatenated into a single D-dimensional vector

● The concatenated vector is L2-normalized

● Typical parameter: k = 64 (D = 8192 for 128-dimensional SIFT descriptors)
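A minimal NumPy sketch of the VLAD computation just described (the centroids are assumed to come from k-means):

    import numpy as np

    def vlad(descriptors, centroids):
        # descriptors: (n, d) local descriptors; centroids: (k, d) from k-means
        k, d = centroids.shape
        # Assign each descriptor to its nearest centroid
        dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        v = np.zeros((k, d), dtype=np.float32)
        for i in range(k):
            members = descriptors[nearest == i]
            if len(members) > 0:
                v[i] = (members - centroids[i]).sum(axis=0)  # residual accumulation
        v = v.ravel()                  # D = k * d
        return v / np.linalg.norm(v)   # L2-normalize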

Indexing algorithm: searching with quantization


● Search/Indexing = distance approximation problem
● The distance between a query vector x and a database vector y is estimated by the distance to
y’s quantized version: d(x, y) ≈ d(x, q(y))
● The choice of the quantizer is critical: it needs many centroids, yet regular k-means and
approximate k-means cannot be used, since we typically want k = 2^64 for 64-bit codes
Product quantization for nearest neighbor search
● Vector split into m subvectors
● Subvectors are quantized separately by m sub-quantizers
● The squared distance approximation is computed in the compressed domain: to compute the
distance between a query and many codes, first compute, for each query subvector, the squared
distance to all possible centroids and store these in look-up tables; then, for each database
code, sum the m elementary squared distances
● Each 8×8 = 64-bit code (m = 8 sub-quantizers of 8 bits each) requires only m = 8 additions
per distance!
● IVFADC: combination with an inverted file to avoid exhaustive search
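A NumPy sketch of this asymmetric distance computation (array shapes are illustrative; 8-bit sub-quantizers give 256 centroids each):

    import numpy as np

    def adc_distances(query, codes, codebooks):
        # query: (d,); codes: (n, m) uint8 centroid indices;
        # codebooks: (m, 256, d // m) sub-quantizer centroids
        m = codes.shape[1]
        sub_queries = query.reshape(m, -1)
        # Look-up tables: squared distance from each sub-query to every centroid
        luts = ((codebooks - sub_queries[:, None, :]) ** 2).sum(axis=2)  # (m, 256)
        # For each database code: sum m table entries (m additions per distance)
        return luts[np.arange(m), codes].sum(axis=1)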

VLAD vectors undergo two approximations:

► mean square error from the PCA projection
► mean square error from quantization

