K. Etemad and R. Chellappa
Received January 29, 1996; revised manuscript received October 25, 1996; accepted February 14, 1997
The discrimination power of various human facial features is studied and a new scheme for automatic face recognition (AFR) is proposed. The first part of the paper focuses on the linear discriminant analysis (LDA) of different aspects of human faces in the spatial as well as in the wavelet domain. This analysis allows objective evaluation of the significance of visual information in different parts (features) of the face for identifying the human subject. The LDA of faces also provides us with a small set of features that carry the most relevant information for classification purposes. The features are obtained through eigenvector analysis of scatter matrices with the objective of maximizing between-class variations and minimizing within-class variations. The result is an efficient projection-based feature-extraction and classification scheme for AFR. Each projection creates a decision axis with a certain level of discrimination power or reliability. Soft decisions made based on each of the projections are combined, and probabilistic or evidential approaches to multisource data analysis are used to provide more reliable recognition results. For a medium-sized database of human faces, excellent classification accuracy is achieved with the use of very-low-dimensional feature vectors. Moreover, the method used is general and is applicable to many other image-recognition tasks. © 1997 Optical Society of America [S0740-3232(97)01008-9]

Key words: face recognition, wavelet packets, discriminant eigenpictures, evidential reasoning, multisource soft decision integration.
1726 J. Opt. Soc. Am. A / Vol. 14, No. 8 / August 1997 K. Etemad and R. Chellappa

obtained through processing the face image as a whole. Low-dimensional representations are highly desirable for large databases, fast adaptation, and good generalization. On the basis of these needs, studies have been performed on the minimum acceptable image size and the smallest number of gray levels required for good recognition results.6 Reduction in dimensionality can also be achieved by using various data-compression schemes. For example, representations based on principal-component analysis16–20 (PCA) and singular-value decomposition have been studied and used extensively for various applications. It has also been shown that the nonlinear mapping capability of multilayer neural networks can be utilized and that the internal and hidden representations of face patterns, which typically are of much lower dimensionality than the original image, can be used for race and gender classification.4,5 Some of the most successful AFR schemes are based on the Karhunen–Loève transform17,19 (KLT), which yields the so-called eigenfaces. In these methods the set of all face images is considered a vector space, and the eigenfaces are simply the dominant principal components of this face space; they are computed as eigenvectors of the covariance matrix of the data.

In geometrical feature-based systems, one attempts to locate major face components or feature points in the image.21–24 The relative sizes of and distances between the major face components are then computed. The set of all normalized size and distance measurements constitutes the final feature vector for classification. One can also use the information contained in the feature points to form a geometrical graph representation of the face that directly shows sizes and relative locations of major face attributes.22 Most geometrical feature-based systems involve several steps of window-based local processing, followed by some iterative search algorithms, to locate the feature points. These methods are more adaptable to large variations in scale, size, and location of the face in an image but are more susceptible to errors when face details are occluded by objects, e.g., glasses, or by facial hair, facial expressions, or variations in head orientation. Compared with template and PCA-based systems, these methods are computationally more expensive. Comparative studies of template versus local feature-based systems can be found in Refs. 4, 7, and 25. There are also various hybrid schemes that apply the KLT or the template matching idea to face components and use correlation-based searching to locate and identify facial feature points.4,19 The advantage of performing component-by-component matching is improved robustness against head orientation changes, but its disadvantage is the complexity of searching for and locating face components.

The human audiovisual system, as a powerful recognition model, takes great advantage of context and auxiliary information. Inspired by this observation, one can devise schemes that consistently incorporate context and collateral information, when and if they become available, to enhance their final decisions. Incorporating information such as race, age, and gender, obtained through independent analysis, improves recognition results.19 Also, since face recognition involves a classification problem with large within-class variations, caused by dramatic image variation in different poses of the subject, one has to devise methods of reducing or compensating for such variability. For example,

1. For each subject, store several templates, one for each major distinct facial expression and head orientation. Such systems are typically referred to as view-based systems.
2. Use deformable templates along with a three-dimensional model of a human face to synthesize virtual poses and apply the template matching algorithm to the synthesized representations.26
3. Incorporate such variations into the process of feature extraction.

In this paper we take the third approach and keep the first method as an optional stage that can be employed depending on the complexity of the specific task. Our approach is a holistic linear discriminant analysis (LDA)-based feature extraction for human faces followed by an evidential soft-decision integration for multisource data analysis. This method is a projection-based scheme of low complexity that avoids any iterative search or computation. In this method both off-line feature extraction and on-line feature computation can be done at high speeds, and recognition can be done in almost real time. Our experimental results show that high levels of recognition performance can be achieved with low complexity and a small number of features.

The organization of this paper is as follows. In Section 2 we provide an objective study of multiscale features of face images in terms of their discrimination power. In Section 3 we propose a holistic method of projection-based discriminant facial feature extraction through LDA of face images. We also make a comparative study of the features obtained with the proposed scheme and the features used in compression-based methods such as PCA. In Section 4 we address the task of classification and matching through multisource data analysis and combining soft decisions from multiple imprecise information sources. Finally, we propose a task-dependent measure of similarity in the feature space that is based on the reliability of the basic decisions, to be used at the identification stage.

2. LINEAR DISCRIMINANT ANALYSIS OF FACIAL IMAGES

As highly structured two-dimensional patterns, human face images can be analyzed in the spatial and the frequency domains. These patterns are composed of components that are easily recognized at high levels but are loosely defined at low levels of our visual system.2,27 Each of the facial components (features) has a different discrimination power for identifying a person or the person's gender, race, and age. There have been many studies of the significance of such features that used subjective psychovisual experiments.1,2

Using objective measures, in this section we propose a computational scheme for evaluating the significance of different facial attributes in terms of their discrimination potential. The results of this analysis can be supported by subjective psychovisual findings. To analyze any representation V, where V can be the original image, its spatial segments, or transformed images, we provide the following framework.

First, we need a training set composed of a relatively large group of subjects with diverse facial characteristics. The appropriate selection of the training set directly determines the validity of the final results. The database should contain several examples of face images for each subject in the training set and at least one example in the test set. These examples should represent different frontal views of subjects with minor variations in view angle. They should also include different facial expressions, different lighting and background conditions, and examples with and without glasses. It is assumed that all images are already normalized to m × n arrays and that they contain only the face regions and not much of the subjects' bodies.

Second, for each image and subimage, starting with the two-dimensional m × n array of intensity values I(x, y), we construct the lexicographic vector expansion f ∈ R^{mn}. This vector corresponds to the initial representation of the face. Thus the set of all faces in the feature space is treated as a high-dimensional vector space.

Third, by defining all instances of the same person's face as being in one class and the faces of different subjects as being in different classes for all subjects in the training set, we establish a framework for performing a cluster separation analysis in the feature space. Also, having labeled all instances in the training set and having defined all the classes, we compute the within- and between-class scatter matrices as follows:

S_w^{(V)} = \sum_{i=1}^{L} \Pr(C_i) S_i,  (1)

S_b^{(V)} = \sum_{i=1}^{L} \Pr(C_i) (m_i - m)(m_i - m)^T.  (2)

Here S_w is the within-class scatter matrix, showing the average scatter S_i of the sample vectors (V) of the different classes C_i around their respective mean vectors m_i:

S_i = E[(V - m_i)(V - m_i)^T | C = C_i].  (3)

Similarly, S_b is the between-class scatter matrix, representing the scatter of the conditional mean vectors m_i around the overall mean vector m. Pr(C_i) is the probability of the ith class. The discriminatory power of a representation can be quantified by using various measures. In this paper we use the separation matrix, which shows the combination of within- and between-class scatters of the feature points in the representation space. The class separation matrix and a measure of separability can be computed as

S^{(V)} = S_w^{-1} S_b,  (4)

J_V = sep(V) = trace(S^{(V)}).  (5)

J_V is our measure of the discrimination power (DP) of a given representation V. As mentioned above, the representation may correspond to the data in its original form (e.g., a gray-scale image), or it can be based on a set of abstract features computed for a specific task.

For example, through this analysis we are able to compare the DP's of different spatial segments (components) of a face. We can apply the analysis to segments of the face images such as the areas around the eyes, mouth, hair, and chin or combinations of them. Figure 1 shows a separation analysis for horizontal segments of the face images in the database. The results show that the DP's of all segments are comparable and that the area between the nose and the mouth has more identification information than other parts. Figure 2 shows that the DP of the whole image is significantly larger than the DP's of its parts.

Using wavelet transforms28–30 as multiscale orthogonal representations of face images, we can also perform a comparative analysis of the DP's of subimages in the wavelet domain. Different components of a wavelet decomposition capture different visual aspects of a gray-scale image. As Fig. 3 shows, at each level of decomposition there are four orthogonal subimages corresponding to

• LL: the smoothed, low-frequency variations.
• LH: sharp changes in the horizontal direction, i.e., vertical edges.
• HL: sharp changes in the vertical direction, i.e., horizontal edges.
• HH: sharp changes in nonhorizontal, nonvertical directions, i.e., other edges.

We applied the LDA to each subimage of the wavelet transform (WT) of the face and estimated the DP of each
Fig. 1. Variation of the discrimination power of horizontal segments of the face defined by a window of fixed height sliding from top to
bottom of the image.
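The per-segment analysis plotted in Figs. 1 and 2 is a direct application of Eqs. (1)-(5). The following is a minimal NumPy sketch, not the authors' code; the class priors Pr(C_i) are estimated here by relative class frequency, an assumption this excerpt does not spell out:

```python
import numpy as np

def separability(X, labels):
    """Discrimination power J_V = trace(S_w^{-1} S_b) of a representation.

    X:      (N, d) array, one feature vector per sample
    labels: length-N class labels
    Follows Eqs. (1)-(5), with Pr(C_i) estimated by relative frequency.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(labels)
    N, d = X.shape
    m = X.mean(axis=0)                       # overall mean vector
    Sw = np.zeros((d, d))                    # within-class scatter, Eq. (1)
    Sb = np.zeros((d, d))                    # between-class scatter, Eq. (2)
    for c in np.unique(y):
        Xi = X[y == c]
        Pi = len(Xi) / N                     # Pr(C_i)
        mi = Xi.mean(axis=0)
        Si = (Xi - mi).T @ (Xi - mi) / len(Xi)   # Eq. (3), sample estimate
        Sw += Pi * Si
        Sb += Pi * np.outer(mi - m, mi - m)
    S = np.linalg.solve(Sw, Sb)              # S = S_w^{-1} S_b, Eq. (4)
    return np.trace(S)                       # J_V, Eq. (5)

# Two well-separated classes score far higher than two overlapping ones.
rng = np.random.default_rng(1)
a = rng.normal(0, 1, (50, 2))
b = rng.normal(5, 1, (50, 2))
X = np.vstack([a, b])
y = [0] * 50 + [1] * 50
print(separability(X, y) > separability(np.vstack([a, a + 0.1]), y))  # True
```

Applying `separability` to the pixels of each horizontal strip of the normalized face images, with one class per subject, would reproduce the kind of per-segment comparison shown in Fig. 1.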
Fig. 2. Variation of the discrimination power of a horizontal segment of the face that grows in height from top to bottom of the image.
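The four orthogonal subimages of Section 2 (LL, LH, HL, HH) can be illustrated with one level of an unnormalized (average/difference) Haar decomposition. This is a sketch for intuition only; the experiments above use wavelet packets,28–30 and the excerpt does not specify the filters:

```python
import numpy as np

def haar_decompose(img):
    """One level of an unnormalized 2-D Haar wavelet decomposition,
    returning the four subimages LL, LH, HL, HH (sides must be even)."""
    x = np.asarray(img, dtype=float)
    # Average/difference along rows...
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # ...then along columns.
    LL = (lo[0::2] + lo[1::2]) / 2.0   # smoothed, low-frequency content
    LH = (hi[0::2] + hi[1::2]) / 2.0   # horizontal changes, i.e., vertical edges
    HL = (lo[0::2] - lo[1::2]) / 2.0   # vertical changes, i.e., horizontal edges
    HH = (hi[0::2] - hi[1::2]) / 2.0   # diagonal detail, i.e., other edges
    return LL, LH, HL, HH

img = np.arange(16.0).reshape(4, 4)
LL, LH, HL, HH = haar_decompose(img)
print(LL.shape)  # (2, 2)
```

Each subimage is a quarter-size band; applying the LDA of Section 2 to each band separately gives the per-band DP comparison described in the text.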
priate and logical alternative to PCA or any other compression-based system that tries to find the most compact representation of face images. Concurrently but independently of our studies, LDA has been used by Swets and Weng32,33 to discriminate human faces from other objects.

To capture the inherent symmetry of basic facial features and the fact that a face can be identified from its mirror image, we can use the mirror image of each example as a source of information.17 Also, by adding noisy but identifiable versions of given examples, we can expand our training data and improve the robustness of the feature extraction against a small amount of noise in the input. Therefore for each image in the database we include its mirror image and one of its noisy versions, as shown in Fig. 4. Let F denote the face database, i.e.,

Fig. 4. For each example in the database we add its mirror image and a noisy version.

This shows that the u_i's and λ_i's are generalized eigenvectors of {S_b, S_w}. From this equation the λ_i's can be computed as the roots of the characteristic polynomial

|S_b - λ_i S_w| = 0,  (10)

and then the u_i's can be obtained by solving

(S_b - λ_i S_w) u_i = 0  (11)

only for the selected largest eigenvectors.32 Note that the dimensionality m of the resulting set of feature vectors is m ≤ rank(S) = min(n, N_S - 1). Now define

L^{(m)} = {λ_i, i = 1, ..., m < N_S - 1},  (12)

U^{(m)} = {u_i, i = 1, ..., m < N_S - 1},  (13)

so that L^{(m)} and U^{(m)} represent the set of m largest eigenvalues of S^{(F)} and their corresponding eigenvectors. Considering U^{(m)} as one of the possible linear transformations V from R^n to R^m, with m < n, one can show that

V = {U: X ⊂ R^n → U^T X = Y ⊂ R^m, m < n},  (14)

U^{(m)} = argmin_{U ∈ V} {J_X - J_{U^T X}},  (15)

where J_X = tr(S^{(X)}) and J_Y = tr(S^{(Y)}) are the separabilities of X and Y.

C^s = {c_i^s(u): ∀u ∈ U^{(m)}, i = 1, ..., N_S}.  (17)

Although all images of each subject are considered in the process of training, only one of them needs to be saved, as a template for testing. If a view-based approach is taken, one example for each distinct view has to be stored. Since only the projection coefficients need to be saved, for each subject we retain the example that is closest to the mean of the corresponding cluster in the feature space. Storing the projection coefficients instead of the actual images is highly desirable when large databases are used. Also, applying this holistic LDA to multiscale representations of face images, one can obtain multiscale discriminant eigentemplates. For example, one can apply LDA to each component of the WT of face images and select the most discriminant eigentemplates obtained from various scales. This approach is more complex because it requires the WT computation of each test example, but in some applications it may be useful, for example, when the DP of the original representation is not captured in the first few eigenvectors or when the condition m < N_classes - 1 becomes restrictive, e.g., in gender classification.

4. MULTISOURCE SOFT-DECISION INTEGRATION

A number of different approaches have been proposed for analyzing information obtained from several sources.34–36 The simplest method is to form an extended data (feature) vector containing information from all the sources and treat this vector as the vector output of a single source. Usually, in such systems all similarities and distances are measured in the Euclidean sense. This approach can be computationally expensive; it is successful only when all the sources have similar statistical characteristics and

Figure 5 shows the distribution of projection coefficients onto three discriminant vectors. For any test vectorized face image f, we project the image onto each of the top discriminant vectors u. On the basis of the distances between the resulting coefficients f(u) and those of the existing templates c_u^s stored in the database, we estimate the level of similarity of the input image to each known subject (see Fig. 6):

∀u ∈ U^{(m)}: f(u) = ⟨f, u⟩,  (19)

∀s ∈ S̃: d_u(f, s) = |f(u) - c_u^s|,  (20)

p_u(f, s) = 1 - d_u(f, s) / Σ_{s ∈ S̃} d_u(f, s).  (21)

with a certain reliability and include it in the decision process. See Table 1 for a summary of recognition rates.

Table 1. Summary of Recognition Rates

Task                    No. of Examples   No. of Features   Recognition Rate (%)   Recognition Rate (%)
                                                            (Training Set)         (Test Set)
Face recognition        2000              4                 100                    99.2
Gender classification   400               1                 100                    95

5. EXPERIMENTS AND RESULTS

In our experiments, to satisfy the requirements mentioned in Section 2 we used a mixture of two databases. We started with the database provided by Olivetti Research Ltd.37 This database contains 10 different images of each of 40 different subjects. All the images were taken against a homogeneous background, and some were taken at different times. The database includes frontal views of upright faces with slight changes in illumination, facial expression (open or closed eyes, smiling or nonsmiling), facial details (glasses or no glasses), and some side movements. Originally we chose this database because it contained many instances of frontal views for each subject. Then, to increase the size of the database, we added some hand-segmented face images from the FERET database.38 We also included mirror-image and noisy versions of each face example to expand the data set and improve the robustness of recognition performance to image distortions. The total numbers of images used in training and in testing were approximately 1500 and 500, respectively. Each face was represented by a 50 × 60 pixel 8-bit gray-level image, which for our experiments was reduced to 25 × 30. The database was divided into two disjoint training and test sets. Using this composite database, we performed several tests on gender classification and face recognition.

The first test was on gender classification with use of a subset of the database containing multiple frontal views of 20 males and 20 females of different races. LDA was applied to the data, and the most discriminant template was extracted. Figure 7 shows this eigentemplate and the distribution of projection coefficients for all images in the set. As Fig. 7 shows, with only one feature very good separation can be achieved. Classification tests on a disjoint test set also gave 95% accuracy. Also, applying this discriminant template to a set of new faces from individuals outside the training set reduced the accuracy to ~92%.

Fig. 7. Distribution of feature points for male and female examples in the database.

As mentioned above, one can also apply LDA to wavelet transforms of face images, extract the most discriminant vectors of each transform component, and combine multiscale classification results by using the proposed method of soft-decision integration.

We then applied LDA to a database of 1500 faces, with 60 classes corresponding to 60 individuals. Figure 8 shows the discriminatory power of the top 40 eigenvectors chosen according to PCA and LDA. As Fig. 8 depicts, the classification information of the principal components does not decrease monotonically with their energy; in other words, there are many cases in which a low-energy component has a higher discriminatory power than a high-energy component. The figure also shows that the top few discriminant vectors from LDA contain almost all the classification information embedded in the original image space.

Fig. 8. Comparison of DP's of the top 40 selected eigenvectors based on PCA and LDA.

Figure 9 shows the separation of clusters for ten poses of four different individuals, obtained with use of the two most discriminatory eigenvectors or eigenpictures. As Fig. 9 indicates, the differences among classes (individuals) are emphasized, while the variations of the same face in different poses are deemphasized. The separation is achieved despite all the image variations resulting from the various poses of each subject. Figure 10 shows the distribution of clusters for 200 images of 10 subjects in the best two-dimensional discriminant feature space and in the best two-dimensional PCA-based space.

Fig. 9. Separation of clusters in the selected two-dimensional feature space. Four clusters correspond to variations of the faces of four different subjects in the database.

For each test face example, we first projected it onto the selected eigenvectors and found the distance from the corresponding point in the four-dimensional feature space to all of the previously saved instances. All distances were measured according to Eq. (23), and the best match was selected. For the given database, excellent (i.e., 99.2%) accuracy was achieved on the test set.

To evaluate the generalization of the feature extraction beyond the original training and test sets, we tested the classification results on pictures of new individuals, none of whom was present in the training set. Because of our limitations in terms of data availability, we could use only ten new subjects with three pictures per subject: one saved in the database as a template and two for testing. As expected, the application of the projection templates to these completely new faces resulted in a reduction in classification accuracy to ~90%.

This reduction was expected, considering the fact that we did not have a very large training set. Extracting discriminant facial features from a large training set with diverse examples should improve the generalization and performance of the system on recognition of subjects outside the training set.

The simplicity of our system, the size of the database, and the robustness of the results to small variations in pose and to noise show that our suggested scheme is a good alternative approach to face recognition. It provides highly competitive results at much lower complexity with the use of low-dimensional feature sets.
Fig. 10. Cluster separation in the best two-dimensional feature space. Top, based on LDA; bottom, based on PCA.
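The qualitative contrast of Fig. 10 can be reproduced in miniature. Below, the discriminant direction is obtained from the generalized eigenproblem of Eqs. (10) and (11), solved here through S_w^{-1} S_b (which assumes S_w is nonsingular), and compared against the top PCA direction on the same synthetic data; all shapes and class counts are illustrative only:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within- and between-class scatter (Eqs. (1)-(2)), frequency priors."""
    m = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xi = X[y == c]
        Pi = len(Xi) / len(X)
        mi = Xi.mean(axis=0)
        Sw += Pi * (Xi - mi).T @ (Xi - mi) / len(Xi)
        Sb += Pi * np.outer(mi - m, mi - m)
    return Sw, Sb

rng = np.random.default_rng(2)
# Two classes separated along x, with large shared variance along y.
X = np.vstack([rng.normal([0, 0], [0.3, 3.0], (100, 2)),
               rng.normal([2, 0], [0.3, 3.0], (100, 2))])
y = np.repeat([0, 1], 100)

Sw, Sb = scatter_matrices(X, y)
lam, V = np.linalg.eig(np.linalg.solve(Sw, Sb))   # Eqs. (10)-(11)
lda_axis = V[:, np.argmax(lam.real)].real         # most discriminant direction

_, _, Vt = np.linalg.svd(X - X.mean(axis=0))      # PCA on the same data
pca_axis = Vt[0]                                  # highest-energy direction

# LDA picks the separating x direction; PCA picks the high-variance y direction.
print(np.abs(lda_axis[0]) > np.abs(lda_axis[1]),
      np.abs(pca_axis[1]) > np.abs(pca_axis[0]))  # True True
```

This is exactly the effect seen in Fig. 8 as well: the highest-energy principal component can carry almost no classification information, while the top discriminant vector concentrates it.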
age Understanding Workshop (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1994), pp. 843–850.
13. C. Wu and J. Huang, "Human face profile recognition by computer," Pattern Recog. 23, 255–259 (1990).
14. G. Gordon, "Face recognition based on depth maps and surface curvature," in Geometric Methods in Computer Vision, B. C. Vemuri, ed., Proc. SPIE 1570, 234–247 (1991).
15. O. Nakamura, S. Mathur, and T. Minami, "Identification of human faces based on isodensity maps," Pattern Recog. 24, 263–272 (1991).
16. Y. Cheng, K. Liu, J. Yang, and H. Wang, "A robust algebraic method for human face recognition," in Proceedings of the 11th International Conference on Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 1992), pp. 221–224.
17. M. Kirby and L. Sirovich, "Application of the Karhunen–Loève procedure for the characterization of human faces," IEEE Trans. Pattern Anal. Mach. Intell. 12, 103–108 (1990).
18. M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1991), pp. 586–591.
19. A. Pentland, B. Moghaddam, T. Starner, and M. Turk, "View-based and modular eigenspaces for face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1994), pp. 84–91.
20. L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of the human face," J. Opt. Soc. Am. A 4, 519–524 (1987).
21. I. Craw, D. Tock, and A. Bennett, "Finding face features," in Proceedings of the Second European Conference on Computer Vision (Springer-Verlag, Berlin, 1992), pp. 92–96.
22. B. S. Manjunath, R. Chellappa, and C. v. d. Malsburg, "A feature based approach to face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1992), pp. 373–378.
23. M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, C. v. d. Malsburg, and R. Wurtz, "Distortion invariant object recognition in the dynamic link architecture," IEEE Trans. Comput. 42, 300–311 (1993).
24. M. Seibert and A. Waxman, "Recognizing faces from their parts," in Sensor Fusion IV: Control Paradigms and Data Structures, P. S. Schenker, ed., Proc. SPIE 1616, 129–140 (1991).
25. A. Rahardja, A. Sowmya, and W. Wilson, "A neural network approach to component versus holistic recognition of facial expressions in images," in Intelligent Robots and Computer Vision X: Algorithms and Techniques, D. P. Casasent, ed., Proc. SPIE 1607, 62–70 (1991).
26. A. Yuille, D. Cohen, and P. Hallinan, "Feature extraction from faces using deformable templates," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Los Alamitos, Calif., 1989), pp. 104–109.
27. D. Marr, Vision (Freeman, San Francisco, Calif., 1982).
28. R. R. Coifman and M. V. Wickerhauser, "Entropy based algorithms for best basis selection," IEEE Trans. Inf. Theory 38, 713–718 (1992).
29. S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989).
30. I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math. 41, 909–996 (1988).
31. A. O'Toole, H. Abdi, K. Deffenbacher, and D. Valentin, "Low-dimensional representation of faces in higher dimensions of the face space," J. Opt. Soc. Am. A 10, 405–410 (1993).
32. D. L. Swets and J. J. Weng, "SHOSLIF-O: SHOSLIF for object recognition (phase I)," Tech. Rep. CPS 94-64 (Michigan State University, East Lansing, Mich., 1994).
33. D. L. Swets, B. Punch, and J. J. Weng, "Genetic algorithms for object recognition in a complex scene," in Proceedings of the IEEE International Conference on Image Processing (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1995), pp. 595–598.
34. K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic, New York, 1989).
35. T. Lee, J. A. Richards, and P. H. Swain, "Probabilistic and evidential approaches for multisource data analysis," IEEE Trans. Geosci. Remote Sens. GE-25, 283–293 (1987).
36. P. L. Bogler, "Shafer–Dempster reasoning with applications to multisensor target identification systems," IEEE Trans. Syst. Man Cybern. 17, 968–977 (1987).
37. F. Samaria and A. Harter, "Parameterization of a stochastic model for human face identification," in Second IEEE Workshop on Applications of Computer Vision (Institute of Electrical and Electronics Engineers, Piscataway, N.J., 1994).
38. P. Rauss, P. J. Phillips, M. Hamilton, and A. T. DePersia, "FERET (Face Recognition Technology) Program," in 25th AIPR Workshop: Emerging Applications of Computer Vision, D. Schaefer and W. Williams, eds., Proc. SPIE 2962, 253–263 (1996).