Preprocessing
To calculate a HOG descriptor, we need to first calculate the horizontal and vertical gradients;
after all, we want to calculate the histogram of gradients. This is easily achieved by filtering the
image with the following kernels.
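As a minimal sketch of this step, the gradients can be computed in plain Python. It assumes the commonly used centred-difference kernel [-1, 0, 1] for the horizontal gradient and its transpose for the vertical gradient. The function name `gradients`, the border handling and the toy image are illustrative choices, not part of the original notes.

```python
import math

def gradients(img):
    """Return (gx, gy, magnitude, angle) for a 2D list of grey levels.

    Assumes the [-1, 0, 1] kernel and its transpose. Border pixels reuse
    the nearest valid pixel (an assumption made for this sketch).
    """
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            gx[y][x] = float(right - left)   # horizontal gradient
            gy[y][x] = float(down - up)      # vertical gradient
            mag[y][x] = math.hypot(gx[y][x], gy[y][x])
            # Unsigned orientation folded into [0, 180) degrees.
            ang[y][x] = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0
    return gx, gy, mag, ang

# Toy image with a vertical step edge between columns 1 and 2.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
gx, gy, mag, ang = gradients(image)
print(gx[1])
```

The magnitude and orientation arrays are what a HOG implementation would then accumulate into per-cell orientation histograms.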
● HOG in OpenCV:
The Local Binary Pattern Operator
The LBP operator takes the 3x3 surrounding of a pixel and generates a binary 1 for each neighbour
whose value is larger than that of the centre pixel, and a binary 0 for each neighbour whose value
is smaller. The eight neighbours of the centre can then be represented with an 8-bit number, such
as an unsigned 8-bit integer, which makes the code very compact.
The LBP operator was first introduced as a complementary measure for contrast. The contrast C is
therefore calculated as the average of the neighbouring pixels above the threshold (the centre
value) minus the average of the neighbouring pixels below the threshold.
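The operator and the contrast measure can be sketched for a single 3x3 patch as follows. Two details are assumptions, since the text does not fix them: neighbours are read clockwise starting at the top-left corner, and a neighbour equal to the centre is counted as 1. The function name `lbp_and_contrast` is illustrative.

```python
def lbp_and_contrast(patch):
    """Compute the 8-bit LBP code and the contrast C of a 3x3 patch.

    Assumptions: clockwise bit order from the top-left neighbour, and
    ties (neighbour == centre) are mapped to 1.
    """
    centre = patch[1][1]
    # Clockwise neighbour coordinates starting at the top-left pixel.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    neighbours = [patch[r][c] for r, c in order]
    bits = [1 if v >= centre else 0 for v in neighbours]

    code = 0
    for b in bits:
        code = (code << 1) | b  # first neighbour ends up as the most significant bit

    above = [v for v, b in zip(neighbours, bits) if b == 1]
    below = [v for v, b in zip(neighbours, bits) if b == 0]
    mean_above = sum(above) / len(above) if above else 0.0
    mean_below = sum(below) / len(below) if below else 0.0
    return code, mean_above - mean_below

patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code, contrast = lbp_and_contrast(patch)
print(code, round(contrast, 3))
```

Applying the function to every interior pixel and histogramming the codes gives the usual LBP texture descriptor.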
Laws’ Measures of Texture Energy
Laws proposed a method for classifying each pixel in an image based upon measures of local
“texture energy”. The texture energy features represent the amounts of variation within a sliding
window applied to several filtered versions of the given image. The filters are specified as
separable 1D arrays for convolution with the image being processed.
L3 = [ 1 2 1 ], E3 = [ −1 0 1 ], S3 = [ −1 2 −1 ].
The operators L3, E3, and S3 perform center-weighted averaging, symmetric first differencing
(edge detection), and second differencing (spot detection), respectively. Nine 3 × 3 masks may be
generated by multiplying the transposes of the three operators (represented as vectors) with their
direct versions. The result of L3^T E3 gives one of the 3 × 3 Sobel masks. Operators of length
five pixels may be generated by convolving the L3, E3, and S3 operators in various combinations.
L5 = L3 ∗ L3 = [ 1 4 6 4 1 ] (local average)
E5 = L3 ∗ E3 = [ −1 −2 0 2 1 ] (edges)
S5 = −E3 ∗ E3 = [ −1 0 2 0 −1 ] (spots)
R5 = S3 ∗ S3 = [ 1 −4 6 −4 1 ] (ripples)
W5 = −E3 ∗ S3 = [ −1 2 0 −2 1 ] (waves)
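The five-tap operators listed above can be checked numerically. The sketch below uses a hand-written full 1D convolution (the helper names `convolve`, `negate` and `outer` are illustrative) and also forms the 3 × 3 outer product L3^T E3 mentioned earlier.

```python
def convolve(a, b):
    """Full 1D discrete convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def negate(v):
    """Flip the sign of every coefficient."""
    return [-x for x in v]

def outer(col, row):
    """Outer product col^T row, returned as a list of rows."""
    return [[c * r for r in row] for c in col]

L3, E3, S3 = [1, 2, 1], [-1, 0, 1], [-1, 2, -1]

L5 = convolve(L3, L3)          # local average
E5 = convolve(L3, E3)          # edges
S5 = negate(convolve(E3, E3))  # spots
R5 = convolve(S3, S3)          # ripples
W5 = negate(convolve(E3, S3))  # waves

sobel = outer(L3, E3)          # 3 x 3 mask from L3^T E3

for name, op in [("L5", L5), ("E5", E5), ("S5", S5), ("R5", R5), ("W5", W5)]:
    print(name, op)
```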
In the analysis of texture in 2D images, the 1D convolution operators are used in pairs to obtain
various 2D convolution operators, each of which may be represented as a 5 × 5 array or matrix.
Following the application of the selected filters, texture energy measures are derived from each
filtered image by computing the sum of the absolute values in a sliding window.
All of the filters listed, except L5, have zero mean, and hence the texture energy measures
derived from the filtered images represent measures of local deviation or variation. The result of
the L5 filter may be used for normalization with respect to luminance and contrast. Feature
vectors composed of the values of various Laws’ operators for each pixel may be used for
classifying the image into texture categories on a pixel-by-pixel basis. The results may be used
for texture segmentation and recognition.
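A rough sketch of the texture energy computation for one 2D mask is given below. The mask is built as the outer product of L5 and E5. The 3 × 3 energy window, the zero padding at the image border and the function names are assumptions made for illustration; the text does not specify them.

```python
def outer(col, row):
    """Outer product col^T row, returned as a list of rows."""
    return [[c * r for r in row] for c in col]

def filter2d(img, mask):
    """Slide mask over img (correlation); pixels outside the image count as 0.

    Each Laws operator is symmetric or antisymmetric, so a true convolution
    would differ at most by sign, which the absolute value in
    texture_energy removes.
    """
    h, w, k = len(img), len(img[0]), len(mask) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += mask[dy + k][dx + k] * img[yy][xx]
    return out

def texture_energy(filtered, radius=1):
    """Sum of absolute filtered values in a (2*radius+1)-wide square window."""
    h, w = len(filtered), len(filtered[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for yy in range(max(y - radius, 0), min(y + radius, h - 1) + 1):
                for xx in range(max(x - radius, 0), min(x + radius, w - 1) + 1):
                    energy[y][x] += abs(filtered[yy][xx])
    return energy

L5 = [1, 4, 6, 4, 1]
E5 = [-1, -2, 0, 2, 1]
L5E5 = outer(L5, E5)
mask_sum = sum(sum(row) for row in L5E5)  # zero for every mask except L5L5

flat = [[10] * 9 for _ in range(9)]              # constant image
edge = [[0] * 5 + [10] * 4 for _ in range(9)]    # vertical step edge

flat_energy = texture_energy(filter2d(flat, L5E5))
edge_filtered = filter2d(edge, L5E5)
edge_energy = texture_energy(edge_filtered)
print(flat_energy[4][4], edge_energy[4][4])
```

Away from the border the constant image yields zero energy, while the step edge yields a large value, which matches the reading of the energy as a measure of local variation.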
K-NN matching
The KNN algorithm is probably the simplest machine learning algorithm: given a labelled training
data set and a new input instance, it finds the K instances in the training data set that are
closest to the input instance. If the majority of these K instances belong to a certain class, the
input instance is assigned to that class.
In other words, if most of the k samples most similar to a given sample in the feature space
belong to a certain category, then the sample also belongs to this category. The K closest points
vote to decide the category of the data point to be classified.
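The voting rule can be sketched in a few lines. Euclidean distance, the function name `knn_classify` and the toy data are assumptions for illustration; any distance suited to the feature space could be substituted.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs. Distance is Euclidean
    (an assumption). On a tied vote, the label met first among the nearest
    neighbours wins.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
label_near_a = knn_classify(train, (1.1, 1.0), k=3)
label_near_b = knn_classify(train, (5.1, 5.0), k=3)
print(label_near_a, label_near_b)
```

In the second query only two "b" samples exist, so the third neighbour is an "a" sample, and "b" still wins the vote two to one.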
Bag-of-Visual-Words
Bag-of-features represents a data item (document, texture, image) as a histogram over features.
An object is described by several local features, and this unordered collection of features is
known as a bag of features.
1. Extract features
2. Learn “visual vocabulary”
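Once a vocabulary is available, an image can be turned into a histogram of visual words. The sketch below assumes the vocabulary is a list of centroids (for example obtained by k-means clustering of training descriptors) and that each descriptor is assigned to its nearest centroid by Euclidean distance. The names `nearest_word` and `bow_histogram` and the 2D toy descriptors are illustrative.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary centroid closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

# Hypothetical 3-word vocabulary and the local descriptors of one image.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 1.0), (0.0, 2.0), (1.0, 9.0), (8.0, -1.0)]

hist = bow_histogram(descriptors, vocabulary)
print(hist)
```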
Visual Vocabulary
● TF-IDF weighting
• Instead of computing a regular histogram distance, we’ll weight each word by its inverse
document frequency
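• inverse document frequency (IDF) of word j = log(N / n_j), where N is the total number of
documents and n_j is the number of documents in which word j appears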
VLAD : Vector of Locally Aggregated Descriptors
● Learning: k-means
output: k centroids: c_1, …, c_i, …, c_k
● VLAD computation:
● The resulting vector is L2-normalized
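The VLAD steps can be sketched as follows, assuming the usual formulation: each local descriptor is assigned to its nearest centroid, the residuals (descriptor minus centroid) are summed per centroid, the k sums are concatenated, and the result is L2-normalized. The function name `vlad` and the toy data are illustrative, and the centroids are taken as given rather than learned by k-means here.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate residuals per centroid, concatenate, and L2-normalise."""
    k, d = len(centroids), len(centroids[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Nearest centroid by Euclidean distance.
        i = min(range(k), key=lambda j: math.dist(x, centroids[j]))
        for t in range(d):
            v[i][t] += x[t] - centroids[i][t]  # accumulate the residual
    flat = [val for block in v for val in block]  # k * d dimensional vector
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat

centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 2.0), (11.0, 10.0)]

vec = vlad(descriptors, centroids)
print(vec)
```

With k centroids and d-dimensional descriptors the output has k * d components, here 2 * 2 = 4.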
Compute the squared-distance approximation in the compressed domain. To compute the distance
between a query and many codes:
● for each query subvector, compute the squared distance to every possible centroid and store the
results in look-up tables
● for each database code, sum the elementary squared distances read from the tables
Each 8 × 8 = 64-bit code requires only m = 8 additions per distance. IVFADC: combination with an
inverted file to avoid exhaustive search.
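The look-up-table procedure can be sketched with toy sizes: m = 2 sub-quantizers with 2 centroids each, instead of the m = 8 sub-quantizers with 256 centroids implied by a 64-bit code. The codebooks are taken as given (in practice learned with k-means), and all function names are illustrative.

```python
def split(vec, m):
    """Cut vec into m equal-length subvectors."""
    size = len(vec) // m
    return [vec[i * size:(i + 1) * size] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebooks):
    """Code of a database vector: nearest centroid index per subvector."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda i: sq_dist(sub, cb[i]))
            for sub, cb in zip(subs, codebooks)]

def build_tables(query, codebooks):
    """One table per subvector: squared distance to every centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]

def adc_distance(code, tables):
    """Approximate squared distance: m table look-ups and additions."""
    return sum(tables[j][idx] for j, idx in enumerate(code))

# Hypothetical codebooks for a 4-D space split into two 2-D subspaces.
codebooks = [
    [(0.0, 0.0), (4.0, 4.0)],
    [(0.0, 0.0), (2.0, 2.0)],
]
database_vector = (4.0, 3.0, 0.0, 1.0)
query = (0.0, 0.0, 2.0, 2.0)

code = encode(database_vector, codebooks)
tables = build_tables(query, codebooks)   # built once per query
approx = adc_distance(code, tables)       # reused for every database code
print(code, approx)
```

The tables depend only on the query, so comparing it against many stored codes costs m additions per code. The result equals the exact squared distance between the query and the quantized (reconstructed) database vector, not the original one.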