Keywords: Linear discriminant analysis; High dimensional data; Simultaneous diagonalization; Face recognition
* Corresponding author. Tel.: +1-412-268-5479; fax: +1-412-268-6298. E-mail address: hyu@cs.cmu.edu (H. Yu).

0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(00)00162-X

H. Yu, J. Yang / Pattern Recognition 34 (2001) 2067-2070

At the core of the direct LDA algorithm lies the idea of simultaneous diagonalization, the same as in the traditional LDA algorithm.
As the name suggests, it tries to find a matrix A that simultaneously diagonalizes both S_w and S_b:

    A S_w A^T = I,    A S_b A^T = Λ,

where Λ is a diagonal matrix with its diagonal elements sorted in decreasing order. To reduce dimensionality to m, we simply pick the top m rows of A, which correspond to the largest m diagonal elements in Λ. Details of the algorithm can be found in Ref. [4].

The key idea of our new algorithm is to discard the null space of S_b, which contains no useful information, rather than discarding the null space of S_w, which contains the most discriminative information. This can be achieved by diagonalizing S_b first and then diagonalizing S_w; the traditional procedure takes the reverse order. While both approaches produce the same result when S_w is not singular, the reversal in order makes a drastic difference for high-dimensional data, where S_w is likely to be singular.

The new algorithm is outlined below. Fig. 1 provides a conceptual overview of the algorithm; computational issues are discussed in Section 2.1.

(1) Diagonalize S_b: find a matrix V such that

    V^T S_b V = Λ,

where V^T V = I and Λ is a diagonal matrix sorted in decreasing order. This can be done by traditional eigenanalysis: each column of V is an eigenvector of S_b, and Λ contains all the eigenvalues. As S_b might be singular, some of the eigenvalues will be 0 (or close to 0). It is necessary to discard those eigenvalues and their eigenvectors, as projection directions with a between-class scatter of 0 carry no discriminative power at all.

Let Y be the first m columns of V (an n×m matrix, n being the feature space dimensionality); then

    Y^T S_b Y = D_b > 0,

where D_b is the m×m principal sub-matrix of Λ.

(2) Let Z = Y D_b^{-1/2}. Then

    (Y D_b^{-1/2})^T S_b (Y D_b^{-1/2}) = I  ⇒  Z^T S_b Z = I.

Thus, Z unitizes S_b and reduces dimensionality from n to m.

Next, diagonalize Z^T S_w Z by eigenanalysis:

    U^T (Z^T S_w Z) U = D_w,

where U^T U = I. D_w may contain zeros on its diagonal. Since the objective is to maximize the ratio of total scatter to within-class scatter, we can sort the diagonal elements of D_w and discard some eigenvalues at the high end, together with the corresponding eigenvectors. It is important to keep the dimensions with the smallest eigenvalues, especially the zeros; this is exactly why we started by diagonalizing S_b rather than S_w. See Section 2.2 for more discussion.

(3) Let the LDA matrix be

    A = U^T Z^T.

A diagonalizes both the numerator and the denominator in Fisher's criterion:

    A S_w A^T = D_w,    A S_b A^T = I.

(4) For classification purposes, notice that A already diagonalizes S_w; therefore the final transformation that spheres the data should be

    x* ← D_w^{-1/2} A x.

2.1. Computational considerations

Although the scheme above gives an exact solution for Fisher's criterion, we have not yet addressed the computational difficulty that both scatter matrices are too big to be held in memory, let alone eigenanalyzed. Fortunately, the method presented by Turk and Pentland [5] for the eigenface problem is still applicable. The key observation is that the scatter matrices can be represented in a way that both saves memory and facilitates eigenanalysis. For example,

    S_b = Σ_i n_i (μ_i − μ)(μ_i − μ)^T = Φ_b Φ_b^T    (n×n),

where

    Φ_b = [√n_1 (μ_1 − μ), √n_2 (μ_2 − μ), …]    (n×J),

with J being the number of classes, n_i the number of training images for class i, μ_i the mean of class i, and μ the mean of all training samples. Thus, instead of storing an n×n matrix, we need only store Φ_b, which is n×J. The eigenanalysis is simplified by virtue of the following lemma:

Lemma 1. For any n×m matrix L, the mapping x → Lx is a one-to-one mapping that maps eigenvectors of L^T L (m×m) onto those of L L^T (n×n).
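To make the procedure concrete, Steps 1-4 together with the Φ_b/Φ_w representation and Lemma 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch of the scheme described above, not the authors' code; the function name, the eps threshold used to decide which eigenvalues count as "close to 0", and all variable names are our own choices:

```python
import numpy as np

def direct_lda(X, y, eps=1e-12):
    """Sketch of direct LDA: Steps 1-4 above, using S_b = Phi_b Phi_b^T,
    S_w = Phi_w Phi_w^T and Lemma 1 so that only small matrices are
    eigenanalyzed.  X: (n_t, n) samples as rows; y: (n_t,) class labels.
    Returns A (the LDA matrix) and dw (the diagonal of D_w, ascending)."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    mu = X.mean(axis=0)                                   # global mean
    counts = np.array([(y == c).sum() for c in classes])  # n_i
    mus = np.stack([X[y == c].mean(axis=0) for c in classes])

    # Phi_b has columns sqrt(n_i) (mu_i - mu); S_b itself is never formed.
    Phi_b = (np.sqrt(counts)[:, None] * (mus - mu)).T     # (n, J)

    # Step 1: diagonalize S_b through the small J x J matrix (Lemma 1).
    lam, E = np.linalg.eigh(Phi_b.T @ Phi_b)
    keep = lam > eps * lam.max()                          # drop zero between-class scatter
    lam, E = lam[keep], E[:, keep]
    # Unit eigenvectors of S_b are Phi_b E / sqrt(lam); Z = Y D_b^{-1/2}.
    Z = Phi_b @ E / lam                                   # unitizes S_b

    # Step 2: diagonalize Z^T S_w Z = (Phi_w^T Z)^T (Phi_w^T Z), again small.
    Phi_w = (X - mus[np.searchsorted(classes, y)]).T      # (n, n_t)
    T = Phi_w.T @ Z
    dw, U = np.linalg.eigh(T.T @ T)   # ascending: null-space dims come first

    # Step 3: A diagonalizes both scatters: A S_b A^T = I, A S_w A^T = D_w.
    A = U.T @ Z.T
    return A, dw
```

Keeping only the rows of A with the smallest entries of dw (the null space of S_w) implements the dimensionality reduction discussed in Step 2; the sphering of Step 4 would further scale the rows by dw^{-1/2}, which in practice requires regularizing the zero entries.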
As Φ_b^T Φ_b is a J×J matrix, its eigenanalysis is affordable.

In Step 2 of our algorithm, to compute the eigenvalues of Z^T S_w Z, simply notice that

    S_w = Σ_i Σ_{x_k ∈ C_i} (x_k − μ_i)(x_k − μ_i)^T = Φ_w Φ_w^T,

where

    Φ_w = [x_1 − μ_{c_1}, x_2 − μ_{c_2}, …]    (n×n_t),

with n_t being the total number of images in the training set and μ_{c_k} the mean of the class that x_k belongs to. Thus

    Z^T S_w Z = Z^T Φ_w Φ_w^T Z = (Φ_w^T Z)^T (Φ_w^T Z).

We can again use Lemma 1 to compute the eigenvalues.

2.2. Discussions

2.2.1. Null space of S_w

The traditional simultaneous diagonalization begins by diagonalizing S_w. If S_w is not degenerate, it gives the same result as our approach. If S_w is singular, however, the traditional approach runs into a dilemma: to proceed, it has to discard the eigenvalues equal to 0; but the discarded eigenvectors are the most important dimensions!

As Chen et al. pointed out [1], the null space of S_w (that is, {x : S_w x = 0, x ∈ R^n}) carries most of the discriminative information. More precisely, for a projection direction a, if S_w a = 0 and S_b a ≠ 0, then a^T S_b a / a^T S_w a is maximized. The intuitive explanation is that, when projected onto such a direction a, the within-class scatter is 0 but the between-class scatter is not; obviously, perfect classification can be achieved along this direction.

Different from the algorithm proposed in Ref. [1], which operates solely in the null space, our algorithm can take advantage of all the information, both within and outside of S_w's null space. Our algorithm can also be used in cases where S_w is not singular, which is common in tasks like speech recognition.

2.2.2. Equivalence to PCA+LDA

As Fukunaga pointed out [4], there are other variants of Fisher's criterion:

    arg max_A |A^T S_t A| / |A^T S_w A|    or    arg max_A |A^T S_b A| / |A^T S_t A|,

where S_t = S_b + S_w is the total scatter matrix. Interestingly, if we use the first variant (with S_t in the numerator), Step 1 of our algorithm becomes exactly PCA: discarding S_t's eigenvectors with 0 eigenvalues reduces dimensionality, just as Belhumeur et al. proposed in their two-stage PCA+LDA method [3]. If their LDA step handled S_w's null space properly, the two approaches would give the same performance. In a sense our method can be called "unified PCA+LDA", since there is no separate PCA step. This not only leads to a clean presentation, but also results in an efficient implementation.

3. Face recognition experiments

We tested the direct LDA algorithm on face images from the Olivetti-Oracle Research Lab (ORL, http://www.cam-orl.co.uk). The ORL data set consists of 400 frontal faces: 10 tightly cropped images of each of 40 individuals, with variations in pose, illumination, facial expression (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). The size of each image is 92×112 pixels, with 256 grey levels per pixel.

Three sets of experiments are conducted. In all cases we randomly choose five images per person for training and the other five for testing. To reduce variation, each experiment is repeated at least 10 times.

Without dimensionality reduction in Step 2, the average recognition accuracy is 90.8%. With dimensionality reduction, where everything outside of S_w's null space is discarded, the average recognition accuracy drops to 86.6%. This verifies that while S_w's null space is important, discriminative information does exist outside of it.

4. Conclusions

In this paper, we proposed a direct LDA algorithm for high-dimensional data classification, with application to face recognition in particular. Since the number of samples is typically smaller than the dimensionality of the samples, both S_b and S_w are singular. By modifying the simultaneous diagonalization procedure, we are able to discard the null space of S_b, which carries no discriminative information, and to keep the null space of S_w, which is very important for classification. In addition, computational techniques are introduced to handle large scatter matrices efficiently. The result is a unified LDA algorithm that gives an exact solution to Fisher's criterion whether or not S_w is singular.

References
[1] L. Chen, H. Liao, M. Ko, J. Lin, G. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33 (10) (2000) 1713-1726.
[2] D. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 831-836.
[3] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711-720.
[4] K. Fukunaga, Introduction to Statistical Pattern Recognition (2nd Edition), Academic Press, New York, 1990.
[5] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71-86.