Details. Given an image patch X of size l × l pixels and a linear filter W_i of the same size, the filter response s_i is obtained by

s_i = Σ_{u,v} W_i(u, v) X(u, v) = w_i⊤ x,   (1)

where vector notation is introduced in the latter stage, i.e., the vectors w_i and x contain the pixels of W_i and X. The binarized feature b_i is obtained by setting b_i = 1 if s_i > 0 and b_i = 0 otherwise. Given n linear filters W_i, we may stack them into a matrix W of size n × l² and compute all responses at once, i.e., s = Wx, and we obtain the bit string b by binarizing each element s_i of s as above. Thus, given the linear feature detectors W_i, computation of the bit string b is straightforward. Also, it is clear that the bit strings for all image patches of size l × l, surrounding each pixel of an image, can be computed conveniently by n convolutions.

In order to obtain a useful set of filters W_i, we apply the ideas of [6] and estimate the filters by maximizing the statistical independence of s_i. In general, this approach provides good features for image processing [6]. Furthermore, in our case, the independence of s_i provides justification for the proposed independent quantization of the elements of the response vector s. Thus, the costly vector quantization used e.g. in [14] is not necessary here for obtaining a discrete texton vocabulary.

However, in order to use standard independent component analysis (ICA) algorithms for estimating the independent components, one has to decompose the filter matrix W into two parts by

s = Wx = UVx = Uz,   (2)

where z = Vx, U is an n × n square matrix which will be estimated via ICA, and the matrix V performs the canonical preprocessing, i.e., simultaneous whitening and dimension reduction of the training samples x [6].

In short, the canonical preprocessing uses principal component analysis as follows. Given a training set of image patches randomly sampled from natural images, the patches are first made zero-mean (i.e. the mean intensity of each patch is subtracted) and then their dimension is reduced by keeping only the n first principal components, which are further divided by their standard deviation to get the whitened data samples z. In detail, given the eigendecomposition C = EDE⊤ of the covariance matrix C of the samples x, the matrix V is defined by

V = (D^(−1/2) E⊤)_{1:n},   (3)

where the main diagonal of D contains the eigenvalues of C in descending order, and (·)_{1:n} denotes the first n rows of the matrix in parentheses.

Then, given the zero-mean whitened data samples z, one may use standard independent component analysis algorithms to estimate an orthogonal matrix U with which one obtains the independent components s of the training data [7]. In other words, since z = U⁻¹s, the independent components allow one to represent the data samples z as a linear superposition of the basis vectors defined by the columns of U⁻¹. Finally, given U and V, one obtains the filter matrix W = UV, which can be directly used for computing BSIF features.

Figure 1. Learnt filters of size 9×9.

Implementation. In all the experiments of this paper, we used the same filters learnt from a set of 13 natural images provided by the authors of [6]. Before random sampling of image patches for learning, the image intensities were normalized to have zero mean and unit variance. As described above, there are two parameters in our BSIF descriptor: the filter size l and the length n of the bit string. We learnt the filters W using several different choices of parameter values; each set of filters was learnt using 50000 image patches. Learning was conducted by the three-stage process detailed in the previous subsection: (a) subtraction of the mean intensity of each patch, (b) dimension reduction and whitening via principal component analysis, and (c) estimation of independent components. The filters obtained with l = 9, n = 8 are illustrated in Figure 1. In addition, our implementation of BSIF features is available online.¹

¹ http://www.cse.oulu.fi/Downloads/BSIF
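The pipeline described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under simplifying assumptions, not the authors' code: the ICA step is a minimal symmetric FastICA iteration rather than the algorithm of [6], and the training patches are synthetic stand-ins for patches sampled from natural images.

```python
import numpy as np

rng = np.random.default_rng(0)

def _sym_decorrelate(U):
    # Symmetric decorrelation U <- (U U^T)^(-1/2) U keeps U orthogonal.
    d, E = np.linalg.eigh(U @ U.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ U

def learn_bsif_filters(patches, n):
    """Learn the n x l^2 filter matrix W = U V from flattened l x l patches."""
    X = patches - patches.mean(axis=1, keepdims=True)   # zero-mean each patch
    C = np.cov(X.T)                                     # l^2 x l^2 covariance of samples x
    d, E = np.linalg.eigh(C)                            # eigenvalues in ascending order
    d, E = d[::-1][:n], E[:, ::-1][:, :n]               # keep the n largest components
    V = (E / np.sqrt(d)).T                              # Eq. (3): (D^(-1/2) E^T)_{1:n}
    Z = V @ X.T                                         # whitened samples z
    # Minimal symmetric FastICA (tanh nonlinearity) estimating the orthogonal U.
    U = _sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(100):
        G = np.tanh(U @ Z)
        U = _sym_decorrelate(G @ Z.T / Z.shape[1]
                             - np.diag((1.0 - G ** 2).mean(axis=1)) @ U)
    return U @ V                                        # filter matrix W = U V

def bsif_codes(img, W, l):
    """Compute an n-bit BSIF code for every valid l x l patch of img."""
    wins = np.lib.stride_tricks.sliding_window_view(img, (l, l))
    patches = wins.reshape(-1, l * l)                   # one flattened patch per position
    bits = patches @ W.T > 0                            # Eq. (1) responses, thresholded at 0
    return bits @ (1 << np.arange(W.shape[0]))          # pack each bit string into an integer

l, n = 9, 8
train = rng.laplace(size=(50_000, l * l))               # stand-in for natural-image patches
W = learn_bsif_filters(train, n)
codes = bsif_codes(rng.standard_normal((64, 64)), W, l)
hist = np.bincount(codes, minlength=2 ** n)             # code histogram used as the descriptor
```

The n convolutions mentioned in the text appear here as the single matrix product `patches @ W.T`, which evaluates all n filter responses for every pixel neighborhood at once.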
Figure 2. Samples from the Outex database (top) and the corresponding BSIF codes.

Figure 3. Samples from the CUReT database (top) and the corresponding BSIF codes.
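The experiments that follow classify histogram descriptors with a nearest neighbor rule under the χ2 distance. A minimal sketch of such a classifier; the function names and the toy histogram data are ours, not from the paper:

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms (eps avoids 0/0 bins)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(query, train_hists, train_labels):
    """Return the label of the training histogram closest to the query."""
    dists = [chi2_distance(query, h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(1)
# Two toy "texture classes" with distinct histogram shapes (hypothetical data).
protos = rng.dirichlet(np.ones(256), size=2)
train = np.concatenate([rng.dirichlet(1 + 500 * p, size=10) for p in protos])
labels = np.array([0] * 10 + [1] * 10)
query = rng.dirichlet(1 + 500 * protos[1])   # a new sample from class 1
pred = nn_classify(query, train, labels)
```

The same rule is used unchanged for texture and face experiments; only the histograms fed to it differ.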
3 Experiments

In this section we assess the proposed method in two applications: texture classification and face recognition. For texture classification we use the Outex and CUReT benchmark datasets, for which we adopt the train-test splits defined in Outex test suite 00002 and in [14], respectively. The Outex dataset contains 24 texture types with 368 images per class, while the CUReT database consists of 61 textures with 205 images per class. Some examples and the corresponding BSIF code images are shown in Figures 2 and 3. The classification is performed using a nearest neighbor classifier with the χ2 distance metric.

The baseline for the experiments is formed by the Local Binary Patterns (LBP) [9], BIF-column (BIFc) [5], and Local Phase Quantization (LPQ) [10] methods. We use the standard 8 bit coding for LBP and LPQ, which results in a feature vector with 256 elements. For BIFc we apply the parameters described in the original paper; hence, BIFc applies 4 different scales with 6 codewords per scale, resulting in a descriptor with 1296 elements.

The classification accuracies with different filter sizes are reported in Figures 5(a) and 5(b). The results indicate that the proposed descriptor is consistently better than LBP or LPQ over a range of different filter sizes. Interestingly, already the 7 bit version of BSIF outperforms all baselines in Outex and LPQ in CUReT. Compared to the BIFc descriptor, the new method performs clearly better in Outex. In the case of the CUReT database, BIFc gives 1.5 percent better accuracy than 8 bit BSIF, but the difference vanishes when the descriptor length is increased to be equivalent to that of BIFc. The images in CUReT are taken from several viewing angles and hence contain some rotations. Since BIFc is rotation invariant, it is understandable that it performs well in this experiment. However, BSIF reaches the same performance even though it was not particularly designed to be rotation invariant.

In the face recognition experiment we apply the Face Recognition Grand Challenge (FRGC) test 1.0.4 [11]. The FRGC database is divided into probe and gallery sets, which represent 152 subjects. There are exactly 1 gallery image and 2-7 probe images per subject, totaling 152 images in the gallery and 608 images in the probe set. The gallery images are acquired with a good quality camera under controlled conditions, while the probe images are taken with a pocket digital camera in uncontrolled conditions. Therefore, the probe images contain considerable variations in lighting, facial expression, and blur. Some examples and the corresponding BSIF code images are shown in Figure 4.

Figure 4. A sample gallery image and two probe images from FRGC (top), and the corresponding BSIF codes. The blur in the probe images is clearly observable.

For the recognition we apply the procedure described in [1]: the face image is first divided into 8 × 8 non-overlapping rectangular regions, and a given descriptor is computed independently within each of these regions. Finally, the descriptors from the different regions are concatenated into a global description of the face. The classification is performed using a nearest neighbor classifier with the χ2 distance metric. Also, as in [2], we apply the illumination normalization proposed in [13].

The average face recognition accuracies are shown in Figure 5(c). The effect of blurring is clearly observable in the BSIF results. If the number of filters is increased while the filter size is kept fixed, more high frequency information is included in the descriptor. Since high frequencies are particularly disturbed by blurring, this affects the performance of such combinations. Hence, we need to enlarge the filter size with respect to the code length in order to cope with blurring. The 6 and 7 bit versions of BSIF give good results already with a 7 × 7 filter, losing only 3 percent to the blur invariant LPQ. When increasing the descriptor length to 8 bits and above, we reach approximately the same performance as LPQ with 13 × 13 filters. Furthermore, already 6 bit BSIF outperforms LBP over a wide range of filter sizes.

[Figure 5. Classification accuracy as a function of filter size on (a) Outex and (b) CUReT, and average face recognition accuracy on FRGC (c), for LBP, LPQ, BIFc, and the BSIF variants with 6-11 bits.]

4 Conclusion

In this paper we presented a method for constructing local texture descriptors based on independent component analysis and an efficient scalar quantization scheme. The proposed algorithm produces a binary code for each pixel, which can subsequently be used to construct a convenient histogram representation for image regions. The key idea in the approach is to apply learning, instead of manual tuning, to obtain a statistically meaningful representation of the data, which enables efficient information encoding using simple element-wise quantization. Learning also provides an easy and flexible way to adjust the descriptor length and to adapt to applications with unusual image characteristics.

In the texture classification experiments, the new BSIF descriptor clearly outperformed the state-of-the-art baseline methods with equivalent descriptor length. Interestingly, even the more compact versions of BSIF resulted in better accuracy than some of the baselines. Moreover, the proposed method was tested in a face recognition application, where it achieved performance equal to the state-of-the-art descriptors. Although some of the test images were blurred and imperfectly aligned, BSIF achieved performance similar to methods specifically designed to be rotation and blur invariant. This indicates the tolerance of the BSIF descriptor to image degradations appearing in practice.

References

[1] T. Ahonen et al. Face description with local binary patterns: Application to face recognition. TPAMI, 2006.
[2] T. Ahonen et al. Recognition of blurred faces using local phase quantization. In ICPR, 2008.
[3] H. Bay et al. Speeded-up robust features (SURF). CVIU, 2008.
[4] M. Calonder et al. BRIEF: Binary robust independent elementary features. In ECCV, 2010.
[5] M. Crosier and L. Griffin. Using basic image features for texture classification. IJCV, 88(3):447–460, 2010.
[6] A. Hyvärinen et al. Natural Image Statistics. Springer, 2009.
[7] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 2000.
[8] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[9] T. Ojala et al. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. TPAMI, 24(7):971–987, 2002.
[10] V. Ojansivu and J. Heikkilä. Blur insensitive texture classification using local phase quantization. In ICISP, 2008.
[11] P. Phillips et al. Overview of the face recognition grand challenge. In CVPR, 2005.
[12] M. Pietikäinen et al. Computer Vision Using Local Binary Patterns. Springer, 2011.
[13] X. Tan and B. Triggs. Enhanced local texture feature sets for face recognition under difficult lighting conditions. In AMFG, 2007.
[14] M. Varma and A. Zisserman. A statistical approach to texture classification from single images. IJCV, 62(1):61–81, 2005.