
Hybrid Sensing Face Detection and Recognition

Mingyuan Zhou1 , Haiting Lin1 , Jingyi Yu1 , S. Susan Young2


1 University of Delaware, Newark, DE 19716, USA
2 US Army Research Laboratory, 2800 Powder Mill Rd, Adelphi, MD 20783, USA

Abstract—The capability to track, detect, and identify human targets in highly cluttered scenes under extreme conditions, such as in complete darkness or on the battlefield, has been one of the primary tactical advantages in military operations. In this paper, we propose a new collaborative, multi-spectrum sensing solution to achieve face detection and registration under low lighting conditions. We construct a novel type of hybrid sensor by combining a pair of near infrared (NIR) cameras and a thermal camera (a long wave infrared, LWIR, camera). We strategically surround each NIR sensor with a ring of LED IR flashes in order to capture the red-eye, or more precisely, the bright-eye effect of the target. The bright eyes are used to localize the 3D position of the eyes and face. The recovered 3D information can be further used to warp the thermal face imagery to a frontal-parallel pose so that additional tasks such as face recognition can be reliably conducted, especially with the assistance of accurate eye locations.

I. INTRODUCTION

There is a need for detecting and recognizing humans under low light conditions in many domains, such as the military and industrial domains. For example, in covert military operations, it is critical to confirm a personal target's identity before action. In the past decades, tremendous advances have been made in reliable face detection and recognition in the visible spectrum. However, two fundamental challenges still remain. First, a large number of tasks need to be conducted under low light conditions, even in complete darkness, where the acquired visible face images are severely corrupted by noise. Reliable face detection/matching is difficult on such noisy data. Furthermore, nearly all existing algorithms and databases assume the acquired face pose is frontal parallel. In reality, face images captured in uncooperative conditions generally exhibit strong 3D orientations. The inputs need to be accurately rectified before conducting face detection and recognition.

In this paper, we propose a new collaborative, multi-spectrum sensing solution to achieve face detection and registration under extremely low lighting conditions. By leveraging computational multi-spectrum imaging and multimodal image analysis, our proposed solution is robust for single or multiple face detection and face registration under poor lighting conditions, including complete darkness.

We construct a new type of hybrid sensing system, which consists of one stereo pair of near infrared (NIR) cameras and a long wave IR (LWIR) thermal camera. Each NIR sensor is surrounded by a ring of LED IR flashes to strategically capture the red-eye, or more precisely, the bright-eye effect of the target for eye localization. The bright-eye effect is similar to the red-eye effect in photography. The retina has high reflectivity in the red and near-infrared (NIR) regions of the spectrum; therefore, when a flash is used, the eyes appear red in a visible image. When a near IR flash and a near IR camera are used, the eyes appear bright in the near IR image. Next, we use the acquired bright eyes from the pair of NIR cameras to determine the face pose and geometry. Since the correspondences of the eyes are known, we can efficiently compute the 3D eye locations via triangulation. This reduces the 5D pose estimation problem (3D location and 2D orientation) to one dimension: the 1D rotation angle around the axis passing through the eye locations. We further resolve this one-dimensional ambiguity using three additional face landmarks (one nose tip and two mouth corners) detected on the thermal image with a CNN model similar to [18]. While we could also detect eye landmarks on the thermal face image directly, our hybrid solution has two advantages over direct detection. First, we only conduct the landmark detection inside a face bounding box that includes the eyes; the efficient bright eye localization dramatically reduces the search space of the detection. Second, the eyes on thermal images lack discriminating feature points, which also hinders accurate manual marking during ground truth generation for learning based detection methods. Our solution precisely detects the nearly true 3D eye locations dynamically.

With five thermal face landmarks, we estimate the head pose and use this 3D geometry information to warp the thermal face image back to one with a frontal-parallel pose. This addresses pose variation in face recognition. Experiments show that our new class of 2D/3D computer vision techniques is robust for face detection, pose estimation, face registration and, further, face recognition under low lighting conditions.

This paper is organized as follows: we review the most related work in Section II; Section III describes our hybrid solution; we present our prototype and show the experimental results in Section IV; Section V concludes this paper and discusses limitations of our method and future work.

II. RELATED WORK

Human face detection/recognition is a critical area with a wide range of applications nowadays. Face detection and recognition in the visible spectrum has made many advances in the past decades. However, several serious challenges still remain in practical applications, such as variations in facial expression, pose, and illumination.
Fig. 1: Pipeline of our hybrid sensing system for face detection and pose standardization.

Many published works attempt to overcome these challenges by developing more sophisticated models. For example, illumination can be corrected by statistical facial models [21] and image filters [15]; extracting stable face features, e.g. [3, 17, 20], can deal with facial expression under pose variation in face recognition.

Among these various approaches, the use of LWIR imagery offers another promising research direction, in which it is much easier to overcome some of the above mentioned challenges using the unique properties of LWIR facial imagery. The infrared spectrum is customarily divided into four sub-bands [14]: near IR (NIR; wavelength 0.75–1.4 µm), short wave IR (SWIR; wavelength 1.4–3 µm), medium wave IR (MWIR; wavelength 3–8 µm), and long wave IR (LWIR; wavelength 8–15 µm). The idealized spectrum of heat emission by the human body lies in the LWIR sub-band. This heat signal is unaffected by the visible spectrum, which makes the imaging invariant to the lighting condition. In contrast to visible spectrum imaging, LWIR cameras can capture nearly noise-free face images even in complete darkness. Studies [7] also show that LWIR facial imagery is more robust to expression change, aging, etc. However, pose variation in LWIR face recognition is still challenging. We propose a hybrid sensing system to estimate the 3D head pose and address the pose variation by transforming the thermal face image back to one with a standard frontal-parallel position.

Many face detection/recognition algorithms in the visible spectrum rely on multiple facial landmarks to achieve good performance, and eye locations play an important role [4, 22]. However, those techniques cannot be directly transferred to solve the eye localization problem in thermal images, due to the lack of discriminating feature points around the eyes and possible occlusions from glasses. We propose to use an NIR stereo image pair to accurately and efficiently compute eye locations based on the bright-eye effect.

The bright-eye effect has been explored for eye localization. Qiang Ji et al. [10] proposed an image acquisition system with strategically placed NIR illuminators which captures bright eyes for pupil detection and tracking. Their system efficiently localizes pupils and improves the robustness and accuracy of pupil tracking under strong external illumination interference.

Convolutional neural networks (CNNs) have been widely and successfully used in vision tasks such as face detection/recognition [6, 12], pose estimation [16], and image classification [5, 11]. Sun et al. [18] proposed a deep convolutional network cascade for facial point detection in the visible spectrum; its three-level cascaded structure makes the deep model powerful and robust, and significantly improves the accuracy of facial landmark detection.

Most previous methods for face frontalization estimate the 3D surface of the face in the image by reconstruction. The methods in [1, 2] attempt to learn the facial geometries by aligning 3D face models. Deep learning is also used in [23] for estimating canonical views of faces. Hassner et al. [8] propose a single 3D reference surface to produce a frontal face view for all query images.

III. HYBRID SENSING SYSTEM

In this section, we introduce our hybrid multi-spectrum and multi-viewpoint sensing system for accurate face detection and pose standardization. Fig. 1 shows the overall pipeline of our system. The system consists of an LWIR camera and a pair of NIR stereo cameras attached on each side of the LWIR camera. The NIR cameras capture two stereo image pairs, with and without the bright-eye effect respectively, for accurate and efficient eye localization. The LWIR camera captures thermal images for face detection and recognition. With the assistance of the 3D eye localization from the NIR cameras, we efficiently extract the facial region and feed it into trained face and landmark detectors. We fuse the landmarks, 5 points in total, from the NIR and LWIR cameras, and estimate the pose by minimizing the projection error of a 3D head model projected onto the thermal image. Finally, the recovered 3D geometry information is used to warp the face to the standard frontal-parallel position. We elaborate each component and sub-system of our system in the following subsections.
Fig. 2: The principle of the bright-eye effect. When the NIR illuminator is on the optical axis (the eye-lit IR light source), bright eyes are captured. When the NIR illuminator is off the optical axis (the face-lit IR light source), no bright eyes are observed in the image.

Fig. 3: Bright-eye effects at different poses. The bright-eye effect is insensitive to pose variations under low lighting conditions.

A. NIR bright eye localization subsystem

This subsystem exploits bright-eye effects to efficiently and reliably detect the 2D eye positions on the NIR images, and directly establishes the correspondences between the eye projections of the image pair. This is different from traditional depth estimation methods, which suffer from correspondence ambiguities. With known correspondences, triangulation can be conducted trivially to recover the 3D eye locations.

Certain conditions need to be satisfied in order to trigger the bright-eye effect. According to [9], the bright eyes can be imaged only if the NIR illuminator is beaming along the optical axis of the NIR camera. In such a setting, much of the light from the co-axial NIR illuminator passes into the eye through the pupil, reflects off the fundus at the back of the eyeball, and travels out through the pupil back to the image sensor. This produces the bright-eye effect, similar to the phenomenon in photography known as the red-eye effect. The bright-eye effect disappears when the NIR illuminator is off the camera's optical axis, because the reflected IR light cannot enter the camera. Fig. 2 illustrates the principles behind these effects.

Another condition for an obvious bright-eye effect is that the IR light should be effectively reflected. Behind the retina, there is an ample amount of blood in the choroid, which nourishes the back of the eye. The blood is completely transparent at long wavelengths and abruptly starts absorbing at 600nm [19]. Therefore we use commodity IR flashes with wavelengths around 800nm to effectively acquire the bright-eye effects with our NIR cameras.

The low lighting condition also strengthens the bright-eye effect. This is because, in the dark, the pupils are fully dilated and the IR light is not absorbed by any ocular pigment. The bright-eye effect is also insensitive to pose variations, as shown in Fig. 3. Even if the subject is not facing the optical axis of the NIR camera, the amount of IR light transmitted onto and then reflected by the retina is sufficient for producing the bright-eye effect.

In order to use this distinctive property of eyes in eye localization, we capture two sequential frames from each NIR camera, with and without the bright-eye effect, respectively. The first frame is the face-lit-only image without bright eyes, using the off-axis illuminators. The second frame is the face-lit image with bright eyes, using the on-axis illuminators. We adopt simple global thresholding on the difference image between the two frames to find the 2D eye locations.
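To make the thresholding step concrete, the following is a minimal sketch in Python/OpenCV, assuming 8-bit grayscale NIR frames; the threshold value and blob filtering are illustrative assumptions, since the paper only specifies simple global thresholding on the difference image.

```python
import cv2

def locate_bright_eyes(frame_off, frame_on, thresh=60, max_eyes=2):
    """Find 2D eye candidates from an off-axis/on-axis NIR frame pair.

    frame_off: face-lit frame without bright eyes (off-axis flash).
    frame_on:  face-lit frame with bright eyes (on-axis flash).
    thresh:    global threshold on the difference image (assumed value).
    """
    # The bright pupils dominate the difference between the two frames.
    diff = cv2.absdiff(frame_on, frame_off)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)

    # Group thresholded pixels into blobs; take the centroids of the
    # largest blobs as the 2D eye locations (label 0 is background).
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    blobs = sorted(range(1, n), key=lambda i: stats[i, cv2.CC_STAT_AREA],
                   reverse=True)
    return [tuple(centroids[i]) for i in blobs[:max_eyes]]
```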
B. 3D eye localization and projection on the LWIR image

After detecting the eyes in the NIR stereo image pair, we recover the 3D locations of the eyes through triangulation and project the 3D locations back to the LWIR image as the eye landmarks on the thermal face image. We obtain the camera distortion parameters and the relative poses between the different camera coordinate systems through a calibration process.

The 3D eye locations can be triangulated by solving a linear system. As illustrated in Fig. 4, we use the first NIR camera coordinate system as the world coordinate system, and denote the projection matrices of the two NIR cameras as P1 = K1[R1|T1] and P2 = K2[R2|T2] respectively, where Ki, Ri and Ti (i = 1, 2) are the intrinsic, extrinsic rotation, and translation matrices, respectively. The projection matrix of the LWIR camera is similarly denoted as Pt = Kt[Rt|Tt].

Let an unknown 3D eye coordinate be (X, Y, Z) and its corresponding known homogeneous coordinates on the stereo NIR images be (u1, v1) and (u2, v2); we then have the following relations:
$$s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = P_i \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \quad i = 1, 2 \qquad (1)$$

where s1, s2 are unknown scalars.

The 3×4 projection matrices P1, P2 for each NIR camera can be obtained through camera calibration and can be expressed as:

$$P_i = \begin{bmatrix} p^i_{11} & p^i_{12} & p^i_{13} & p^i_{14} \\ p^i_{21} & p^i_{22} & p^i_{23} & p^i_{24} \\ p^i_{31} & p^i_{32} & p^i_{33} & p^i_{34} \end{bmatrix}, \quad i = 1, 2 \qquad (2)$$

By combining Eqs. 1 and 2, and eliminating the unknowns s1, s2, we derive a linear system:

$$J = \begin{bmatrix} p^1_{11} - u_1 p^1_{31} & p^1_{12} - u_1 p^1_{32} & p^1_{13} - u_1 p^1_{33} \\ p^1_{21} - v_1 p^1_{31} & p^1_{22} - v_1 p^1_{32} & p^1_{23} - v_1 p^1_{33} \\ p^2_{11} - u_2 p^2_{31} & p^2_{12} - u_2 p^2_{32} & p^2_{13} - u_2 p^2_{33} \\ p^2_{21} - v_2 p^2_{31} & p^2_{22} - v_2 p^2_{32} & p^2_{23} - v_2 p^2_{33} \end{bmatrix}, \qquad b = \begin{bmatrix} u_1 p^1_{34} - p^1_{14} \\ v_1 p^1_{34} - p^1_{24} \\ u_2 p^2_{34} - p^2_{14} \\ v_2 p^2_{34} - p^2_{24} \end{bmatrix}$$

$$J \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = b \qquad (3)$$

By solving Eq. 3, we obtain the unknown 3D eye coordinates from the following least squares solution:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = (J^T J)^{-1} J^T b \qquad (4)$$

After obtaining a 3D eye location, we can project it onto the LWIR image based on the calibrated camera parameters. The projected homogeneous coordinates (ut, vt) of the eye on the thermal image can be computed as

$$s_t \begin{bmatrix} u_t \\ v_t \\ 1 \end{bmatrix} = P_t \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (5)$$

where st is a scalar. We denote the eye landmarks of the thermal face as (u_t^l, v_t^l) and (u_t^r, v_t^r) for the left and right eyes, respectively.

Fig. 4: Camera pose estimation and 3D eye localization.
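The triangulation and reprojection of Eqs. 1–5 map directly to a few lines of linear algebra. Below is a minimal NumPy sketch, assuming calibrated 3×4 projection matrices and already-matched eye coordinates; lens distortion correction is omitted for brevity.

```python
import numpy as np

def triangulate_eye(P1, P2, uv1, uv2):
    """Solve Eqs. 3/4: least-squares 3D point from the two NIR views.

    P1, P2: 3x4 projection matrices of the NIR stereo pair.
    uv1, uv2: matched (u, v) pixel coordinates of one eye.
    """
    rows, rhs = [], []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        # Rows of J and entries of b, exactly as in Eq. 3.
        rows.append(P[0, :3] - u * P[2, :3])
        rows.append(P[1, :3] - v * P[2, :3])
        rhs.append(u * P[2, 3] - P[0, 3])
        rhs.append(v * P[2, 3] - P[1, 3])
    J, b = np.array(rows), np.array(rhs)
    X, *_ = np.linalg.lstsq(J, b, rcond=None)  # same minimizer as Eq. 4
    return X

def project_to_thermal(Pt, X):
    """Eq. 5: project the 3D eye onto the LWIR image."""
    x = Pt @ np.append(X, 1.0)
    return x[:2] / x[2]  # divide out the scalar s_t
```

Note that np.linalg.lstsq computes the same least squares solution as Eq. 4, but more stably than explicitly forming and inverting J^T J.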
C. Additional landmark detection

In order to robustly estimate the pose angles from the thermal image, we need to locate more face landmarks. Besides the two eye landmarks (LE and RE) on the thermal image, we detect three additional landmarks: the nose tip (N) and the left and right mouth corners (LM and RM). We resort to a deep cascaded convolutional neural network for landmark detection on thermal face images.

With the eye landmarks identified, the potential face regions can be easily extracted. We make the region sufficiently big to cover the whole face. A simple cascade thermal face detector, trained on our thermal face data set, is applied to the potential region to further tighten the face bounding box.

For the additional thermal face landmark detector, we modify Sun's deep cascaded convolutional neural network (CNN) [18] to account for the change of detection target. Fig. 5 shows an overview of our modified approach. Due to the weak discriminative power of eyes in LWIR images, we eliminate the eye detection part of Sun's model to improve model efficiency and accuracy for thermal face images. Therefore only the three additional facial landmarks need to be detected (N, LM, and RM).

The overall network is a cascade of three levels of CNNs. We reuse the individual neural network structure of each landmark detector from [18]. At the first level, three deep convolutional networks, N1, F1, and NM1, are employed with the same layer arrangements but different filter sizes, inputs, and targets. The corresponding input regions cover the nose, the whole face, and the nose and mouth, respectively. The network at the first level is deep: it contains four convolutional layers followed by max pooling, and two fully connected layers. Fig. 6 shows the example of F1. The filter sizes of N1 and NM1 differ from F1, the same as in [18]. Using this deep model, which is able to capture high-level features from larger regions, the first level robustly detects the landmark positions with few large errors. The positions are further refined in the second and third levels.
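For illustration, a rough PyTorch sketch of an F1-style first-level network is given below, loosely following the conv/pool/fully-connected arrangement described above and the 39×39 input size from Fig. 6. The filter sizes, map counts, activations, and three-landmark output head are our assumptions for this sketch, not the trained model from the paper.

```python
import torch.nn as nn

class F1Net(nn.Module):
    """First-level whole-face landmark regressor, sketched after [18].

    Input: a 39x39 grayscale face crop (size taken from Fig. 6).
    Output: (x, y) for the 3 landmarks N, LM, RM, i.e. 6 values.
    All layer widths and filter sizes below are illustrative.
    """
    def __init__(self, n_landmarks=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, 4), nn.Tanh(), nn.MaxPool2d(2),   # 39->36->18
            nn.Conv2d(20, 40, 3), nn.Tanh(), nn.MaxPool2d(2),  # 18->16->8
            nn.Conv2d(40, 60, 3), nn.Tanh(), nn.MaxPool2d(2),  # 8->6->3
            nn.Conv2d(60, 80, 2), nn.Tanh(),                   # 3->2
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(80 * 2 * 2, 120), nn.Tanh(),
            nn.Linear(120, 2 * n_landmarks),
        )

    def forward(self, x):
        return self.regressor(self.features(x))
```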
Fig. 5: Cascade convolutional neural network for thermal face landmark detection, based on [18].

Fig. 6: The structure of the deep CNN F1 [18]. The length, width and height of the cuboids denote the number of maps and the size of each map. The small squares in the cuboids illustrate the local receptive fields of the neurons.

Fig. 7: The 3D head model and the corresponding landmarks (white points) used in pose estimation.
The second and third levels involve shallower networks compared with the first level; they improve accuracy by using smaller local regions as input. We adopt the same structure from [18], which contains two convolutional layers followed by max pooling, and two fully connected layers. These two levels learn local position adjustments to refine the previously predicted positions. Due to the ambiguity and unreliability of the local information at the last two levels, the adjustment to the previously predicted position of each landmark is constrained to be small.

To further reduce the variance, each level has two copies of the CNNs for each landmark detection; e.g., SN1 and SN2 in the second level are the nose refiners. The copies differ in their input regions: they are differently shifted from the patch window centered at the previously predicted position. The predicted position of each level for each landmark is the average of the predictions from the two copies.

From this cascade CNN, we achieve accurate detection of the three additional facial landmarks.
D. Pose estimation

In order to estimate the pose from the five facial landmarks, we first collect the corresponding five standard 3D points from a 3D human head model (Fig. 7): the left eye, right eye, nose tip, and left and right mouth corners. Given the LWIR camera's intrinsic and distortion parameters, and the corresponding standard 3D points in the world coordinate system, we can estimate the head model pose relative to the camera by solving a perspective-n-point (PnP) problem. We use the EPnP method introduced by Moreno-Noguer, Lepetit and Fua [13] to solve this PnP problem by minimizing the projection error:

$$res = \sum_{i} dist\left(K_t [R|T]\, p_i^w,\; u_i^t\right)^2 \qquad (6)$$

where p_i^w and u_i^t, i = 1, 2, ..., 5, are the 3D points and their corresponding distortion-corrected camera projections, dist(u, v) computes the 2D distance between points u and v, and Kt is the camera intrinsic matrix. The rotation and translation matrices R, T of the head pose relative to the LWIR camera are estimated by minimizing Eq. 6.
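In practice, minimizing Eq. 6 can be delegated to an off-the-shelf EPnP solver. A minimal sketch using OpenCV's solvePnP is shown below; the array layouts of the model points and the fused landmarks are assumptions about the data format.

```python
import cv2
import numpy as np

def estimate_head_pose(model_pts, image_pts, K, dist_coeffs):
    """Estimate R, T of the head relative to the LWIR camera (Eq. 6).

    model_pts: 5x3 array of standard 3D landmarks from the head model
               (left/right eye, nose tip, left/right mouth corner).
    image_pts: 5x2 array of the fused thermal-image landmarks.
    K:         LWIR camera intrinsic matrix; dist_coeffs: its distortion.
    """
    ok, rvec, tvec = cv2.solvePnP(
        model_pts.astype(np.float32), image_pts.astype(np.float32),
        K, dist_coeffs, flags=cv2.SOLVEPNP_EPNP)  # EPnP of [13]
    if not ok:
        raise RuntimeError("PnP estimation failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec
```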
E. Face frontalization

We adopt an efficient homography based warping method for face frontalization. Given the head pose and the camera parameters, the five 3D facial points can be rotated to face towards the camera and projected back to the image. The process is shown in Fig. 8. The homography between the detected facial landmarks and the corresponding standard projections is the warping matrix that corrects the face to the frontal view. This warping performs two steps: 1) texturing the 2D face onto a rough 3D face model of five vertexes; 2) rendering the frontal view of the model.

Fig. 8: The standard frontal-parallel projection of the facial landmarks.
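A minimal sketch of this homography warp is given below, assuming the five frontal-pose projections have already been computed by rotating the 3D landmarks toward the camera and reprojecting them; since a homography needs at least four correspondences, the five landmark pairs suffice.

```python
import cv2
import numpy as np

def frontalize_face(thermal_img, detected_pts, frontal_pts, out_size):
    """Warp the thermal face to the standard frontal-parallel pose.

    detected_pts: 5x2 detected landmarks on the thermal image.
    frontal_pts:  5x2 projections of the same landmarks after rotating
                  the 3D head model to face the camera (Fig. 8).
    out_size:     (width, height) of the output frontal view.
    """
    H, _ = cv2.findHomography(detected_pts.astype(np.float32),
                              frontal_pts.astype(np.float32))
    return cv2.warpPerspective(thermal_img, H, out_size)
```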
IV. EXPERIMENTS

A. Hybrid Sensing Prototype System

Fig. 9 shows our prototype of the hybrid sensing system. In the center of the system is the Tamarisk 640 thermal camera. We use two Flea2 gray cameras with the NIR filter removed as our NIR cameras. Each NIR camera is surrounded with a ring of IR LED lights to simulate the on-axis IR light source. We also place two extra IR light sources above the cameras as the off-axis light source.

Fig. 9: Our hybrid sensing system prototype.

We have considered two challenges in our hybrid system design. The first challenge is the synchronization of the IR illuminators, the pair of NIR cameras, and the LWIR camera for image acquisition. The synchronization between our pair of NIR cameras is easily solved through the camera operation APIs provided by Point Grey. However, it is more challenging to synchronize the on- and off-axis IR light sources with the camera capturing. For this purpose, we designed an IR flash control system (shown on the left of Fig. 9) consisting of one Phidget InterfaceKit board and two relay boards. The control system is programmed to first turn on the off-axis IR lights and simultaneously trigger the NIR cameras to capture the first frame without bright eyes, and then to turn on the on-axis IR lights and trigger the cameras for the second frame, which has bright eyes. The working range of this prototype is shown in Fig. 10.

Fig. 10: The working range of our prototype.
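The two-frame sequencing logic can be summarized as follows. This is a schematic Python sketch: set_relay and trigger_nir_cameras are hypothetical helpers standing in for the Phidget InterfaceKit relay control and the Point Grey capture API, neither of which is reproduced here.

```python
import time

def capture_bright_eye_pair(set_relay, trigger_nir_cameras, settle=0.05):
    """Acquire the off-axis/on-axis NIR frame pair in sequence.

    set_relay(name, on): hypothetical wrapper around the relay boards
        ("face_lit" = off-axis ring, "eye_lit" = on-axis ring).
    trigger_nir_cameras(): hypothetical synchronized capture of the
        stereo NIR pair through the vendor camera API.
    settle: assumed delay to let the LEDs stabilize before capture.
    """
    # Frame 1: off-axis (face-lit) illumination, no bright eyes.
    set_relay("eye_lit", False)
    set_relay("face_lit", True)
    time.sleep(settle)
    frame_off = trigger_nir_cameras()

    # Frame 2: on-axis (eye-lit) illumination, bright eyes visible.
    set_relay("face_lit", False)
    set_relay("eye_lit", True)
    time.sleep(settle)
    frame_on = trigger_nir_cameras()
    return frame_off, frame_on
```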
The second challenge is the LWIR calibration and the cross modality camera calibration, i.e. the calibration between the LWIR and NIR cameras. In LWIR images, a printed chessboard or other pattern is not visible due to its uniform temperature. To solve this problem, we designed a white pattern mold (shown in Fig. 11a) with circular holes on the board. Keeping the mold at normal temperature, we place a black exothermic board behind the mold to produce a clear contrast between the circular holes and the mold itself in the LWIR images (shown in Fig. 11b). After extracting the centers of these circles in the pattern, we use off-the-shelf calibration code for the LWIR camera calibration. Due to the color contrast between the black board and the white mold, it is also easy to detect the centers of the circles in images captured by the Flea2 cameras, as shown in Fig. 11c. Therefore, using this special mold, we also solve the cross modality calibration problem.

Fig. 11: Cross modality camera calibration. (a) Calibration mold. (b) LWIR image. (c) NIR image.
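With the hole centers visible in both modalities, standard circle-grid calibration applies. The sketch below uses OpenCV's circle-grid detector and single-camera calibration; the grid dimensions and hole spacing are placeholders, not the actual mold specification.

```python
import cv2
import numpy as np

# Assumed layout of the mold's circular holes (placeholder values).
GRID = (4, 11)     # circles per column/row
SPACING = 0.03     # meters between hole centers

def find_mold_centers(img, grid=GRID):
    """Detect hole centers in an LWIR or NIR image of the mold.

    Depending on image polarity, the input may need to be inverted so
    the holes appear as dark blobs for the default blob detector.
    """
    ok, centers = cv2.findCirclesGrid(img, grid,
                                      flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    return centers if ok else None

def calibrate(images, grid=GRID, spacing=SPACING):
    """Single-camera calibration from several views of the mold."""
    objp = np.zeros((grid[0] * grid[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:grid[0], 0:grid[1]].T.reshape(-1, 2) * spacing
    obj_pts, img_pts = [], []
    for img in images:
        centers = find_mold_centers(img, grid)
        if centers is not None:
            obj_pts.append(objp)
            img_pts.append(centers)
    # Returns the intrinsics K and distortion; the per-view extrinsics
    # provide the relative poses needed for cross-modality calibration.
    return cv2.calibrateCamera(obj_pts, img_pts,
                               images[0].shape[::-1], None, None)
```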
B. Data set collection

We acquired our training datasets using our hybrid sensing system. Fig. 12 shows some examples of the data set. There are 10 different faces in the dataset. Face orientations are quantized into eleven horizontal angles (−75°, −60°, −45°, −30°, −15°, 0°, 15°, 30°, 45°, 60°, and 75°) and seven vertical angles (−60°, −40°, −20°, 0°, 20°, 40°, and 60°). For each person, we have 87 poses: 77 appointed poses and 10 random poses.

C. Experimental Results

1) Results of eye landmark detection: In this section, we present the results of our eye landmark detection.
Fig. 12: Thermal face training data.

Fig. 13: Comparison between CNN eye detection [18] (yellow points) and our method (cyan points).

Fig. 14: Eye region comparison between an NIR image and a thermal image.

Fig. 15: Comparison between the NIR image and the thermal image showing the effect of eye glasses.

Fig. 13 shows an example of our eye landmark detection and its comparison with that of [18]. Since our method is able to locate the 3D eye position, it is more accurate than the learning based method. The lack of information (low contrast) around the eyes in the thermal image causes errors in eye detection for the learning based method. Fig. 14 shows an example of a thermal image.

2) Results of eye landmark detection with eye glasses: We also tested our eye landmark detection method on images with eye glasses; examples are shown in the first row of Fig. 16c. This is another advantage of our method over the learning based method for eye detection on a single thermal image, because our system uses extra NIR images, which are more informative in cases such as glasses wearing. From Fig. 15 we can see that eyeglasses totally block the eyes in the thermal image. In contrast, they have little effect on the NIR images, which can therefore still be used for eye localization.

3) Results of thermal face pose correction: In this section, we present our results on thermal face pose correction on our own dataset. Fig. 16 presents examples of several scenarios. Fig. 16a shows the results for face images with horizontal rotation angles between 0° and 60° and vertical rotation angles between −45° and 45°. The first row shows the landmark detection results, the second row shows the pose estimation results, and the third row shows the pose correction results. Fig. 16b shows the results for face images with horizontal rotation angles between −60° and 0° and vertical rotation angles between −45° and 45°. Fig. 16c shows the results for face images with eye glasses. These results show that our method is able to robustly estimate the head poses.

V. DISCUSSION AND CONCLUSION

In this paper, we present a new collaborative, multi-spectrum sensing solution to achieve face detection and registration under low lighting conditions. Our method utilizes the special bright-eye effect of human eyes to assist 3D eye localization. With the 3D eye positions, we can efficiently construct tight candidate face region proposals and reduce the solution space of pose estimation to 1D. Experiments illustrate that our method is robust, accurate, and efficient in detecting and registering faces under low light conditions.

In the future, we could further optimize our hybrid sensing system for face detection and recognition. First, we could extend the NIR flash working range. Second, we could increase the amount of training data for landmark detection. Third, we could use more than five non-coplanar landmarks in head pose estimation; using more landmarks, such as the boundary of the chin, will increase the pose estimation accuracy. Last, we could use better warping techniques than the homography warper for face registration. For future work, we plan to match the face image projection to a single 3D reference facial surface model as in [8], or to a 3D deformable head model. After registering the face projection onto the 3D model, we can easily render the frontal-view face from the textured 3D model.
(a) Experimental results with horizontal rotation angles between 0° and 60° and vertical rotation angles between −45° and 45°.

(b) Experimental results with horizontal rotation angles between −60° and 0° and vertical rotation angles between −45° and 45°.

(c) Experimental results with eye glasses.

Fig. 16: Experimental results. For each subfigure, the first row shows the landmark detection results, the second row shows the pose estimation results, and the third row shows the pose corrected results.
ACKNOWLEDGMENT

This work was supported by the Army Research Office under grant W911NF14-1-0338.

REFERENCES

[1] Volker Blanz, Kristina Scherbaum, Thomas Vetter, and Hans-Peter Seidel. Exchanging faces in images. In Computer Graphics Forum, volume 23, pages 669–676. Wiley Online Library, 2004.
[2] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 187–194. ACM Press/Addison-Wesley Publishing Co., 1999.
[3] Alexander M Bronstein, Michael M Bronstein, and Ron Kimmel. Expression-invariant 3D face recognition. In Audio- and Video-Based Biometric Person Authentication, pages 62–70. Springer, 2003.
[4] Tim Chuk, Antoni B Chan, and Janet H Hsiao. Understanding eye movements in face recognition using hidden Markov models. Journal of Vision, 14(11):8, 2014.
[5] Dan Ciresan, Ueli Meier, and Jurgen Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE, 2012.
[6] Christophe Garcia and Manolis Delakis. Convolutional face finder: A neural architecture for fast and robust face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(11):1408–1423, 2004.
[7] Reza Shoja Ghiass, Ognjen Arandjelovic, Abdelhakim Bendada, and Xavier Maldague. Infrared face recognition: A comprehensive review of methodologies and databases. Pattern Recognition, 47(9):2807–2824, 2014.
[8] Tal Hassner, Shai Harel, Eran Paz, and Roee Enbar. Effective face frontalization in unconstrained images. arXiv preprint arXiv:1411.7964, 2014.
[9] Thomas E Hutchinson. Eye movement detector with improved calibration and speed, August 21 1990. US Patent 4,950,069.
[10] Qiang Ji and Xiaojie Yang. Real-time eye, gaze, and face pose tracking for monitoring driver vigilance. Real-Time Imaging, 8(5):357–377, 2002.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[12] Steve Lawrence, C Lee Giles, Ah Chung Tsoi, and Andrew D Back. Face recognition: A convolutional neural-network approach. Neural Networks, IEEE Transactions on, 8(1):98–113, 1997.
[13] Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2):155–166, 2009.
[14] Xavier Maldague. Theory and Practice of Infrared Technology for Nondestructive Testing. 2001.
[15] Masashi Nishiyama and Osamu Yamaguchi. Face recognition using the classified appearance-based quotient image. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference on, pages 6 pp. IEEE, 2006.
[16] Margarita Osadchy, Yann Le Cun, and Matthew L Miller. Synergistic face detection and pose estimation with energy-based models. The Journal of Machine Learning Research, 8:1197–1215, 2007.
[17] Bo-Gun Park, Kyoung-Mu Lee, and Sang-Uk Lee. Face recognition using face-ARG matching. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(12):1982–1988, 2005.
[18] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep convolutional network cascade for facial point detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3476–3483. IEEE, 2013.
[19] Jan van de Kraats and Dirk van Norren. Directional and nondirectional spectral reflection from the human fovea. Journal of Biomedical Optics, 13(2):024010, 2008.
[20] Laurenz Wiskott and Christoph von der Malsburg. Recognizing faces by dynamic link matching. NeuroImage, 4(3):S14–S18, 1996.
[21] Lior Wolf and Amnon Shashua. Learning over sets using kernel principal angles. The Journal of Machine Learning Research, 4:913–931, 2003.
[22] Xiangxin Zhu and Deva Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2879–2886. IEEE, 2012.
[23] Zhenyao Zhu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Recover canonical-view faces in the wild with deep neural networks. arXiv preprint arXiv:1404.3543, 2014.