
2015 International Conference on Virtual Reality and Visualization

Real-time 2.5D Facial Cartoon Animation based on Pose and Expression Estimation

Jiajun Yu
School of Computer Engineering and Science
Shanghai University
Shanghai, China
fish9118@gmail.com

Zhifeng Xie, Dongjin Huang, Youdong Ding, Shi Tang
School of Film and Television Art & Technology
Shanghai University
Shanghai, China
{zhifeng_xie,djhuang,ydding}@shu.edu.cn, tangshimail@163.com

Lizhuang Ma
Department of Computer Science & Engineering
Shanghai Jiao Tong University
Shanghai, China
ma-lz@cs.sjtu.edu.cn

Abstract: In this paper, a real-time facial animation system is presented, which captures live facial expressions and poses from the user and uses them to animate a synthetic 2.5D cartoon character (avatar) in real time. Compared to existing 3D avatar systems, 2.5D cartoon models are easier to create than full 3D models, and the cartoon-style drawings are more popular among young people. Our system requires only an ordinary PC camera, and the initialization is fully automatic. We establish a mapping from real-time facial motion to the atomic motion parameters of the cartoon model by pose and expression estimation. The animation is driven by these parameters in the way of blendshape interpolation.

Keywords: cartoon; facial animation; avatar; blendshape

I. INTRODUCTION

Avatar animation, which represents a person in a virtual world, has been an active research area of computer graphics in recent years. It is of great value in both research and industry.

Unfortunately, the face is very complex, and facial expressions vary greatly from person to person, so modeling a face accurately with 3D models is a very challenging job. However, in consumer-level applications users usually care more about interest and aesthetics than about model accuracy. With the development of anime and non-photorealistic rendering (NPR), cartoon-style drawings and animations have become widely popular. In particular, the 2.5D cartoon model, which simulates 3D rotations and generates natural cartoon animations in what is actually a 2D plane, has become a new branch of avatar animation.

In this paper, we propose an automatic real-time 2.5D facial cartoon animation system that works with an ordinary PC video camera. An overview of the proposed system is illustrated in Fig. 1. At the initialization step, one of the video frames captured by the camera is sent to the cartoon generator developed in our previous work, and a static personalized 2D cartoon portrait is generated. Then a 2.5D cartoon model is registered, with these cartoon drawings as its textures. At run time, the user's head pose and facial expression are estimated in real time. The estimation results are transferred to the 2.5D cartoon model in the form of motion parameters, which animate the model through blendshape interpolation.

Figure 1. Overview of the proposed system

II. RELATED WORK

In general, a real-time facial animation system contains two parts: face tracking and expression transfer. Face tracking captures the motion of the human face; the tracked 3D information and expression are then transferred to an avatar.

The research on face tracking has a long history. Face tracking methods can be divided into 3D shape tracking methods and 2D shape tracking methods. The 3D shape tracking methods learn a 3D shape regressor to infer the 3D positions of facial landmarks.
The state-of-the-art methods are Displaced Dynamic Expression (DDE) [1] and the Dynamic Expression Model (DEM) [2], which are quite robust in real-time tracking. The 2D shape tracking methods are more widely used; examples include the Active Shape Model (ASM) [3], the Active Appearance Model (AAM) [4,5,6,7], the Constrained Local Model (CLM) [8,9,10], 3D morphable models [11], and deformable model fitting [12]. After 2D shape tracking, the 3D pose can be inferred with pose estimation methods [13].

A lot of work has been done on expression classification and regression [6,14,15,16,17]. Cohen et al. [14] introduced Gaussian Tree-Augmented Naive (TAN) Bayes classifiers and hidden Markov models (HMMs) for facial expression recognition from video sequences. Zalewski et al. [6] proposed 2D statistical models to classify facial expressions.

Some work animates the avatar by point mapping; however, these methods require high-accuracy 3D landmarks. The more popular method for facial animation is blendshape interpolation [18,19], which has also been implemented in many commercial software packages. Rivers et al. [20] proposed a 2.5D cartoon model that simulates 3D rotations from several 2D views of a cartoon, supporting full 3D rotation. Our 2.5D model is similar to theirs.

Our work focuses on building a 2.5D cartoon model driven by 2D information. We use a 2D shape model for tracking and a pose estimator to infer the 3D pose. Blendshape interpolation is applied to animate the avatar.
III. 2.5D FACE CARTOON

A. Cartoon Portrait Generation

In our previous work, we completed a personalized cartoon generation system. The input is a single frontal face image, and the output is the corresponding personalized cartoon portrait.

The system mainly consists of two parts:

• A cartoon material database. The database consists of many cartoon drawings of facial organs, including the contour of the face, the eyes, eyebrows, nose, mouth, and hair.
• An organ-match model. The model was trained by manually labeling 400 real face images. We extract a set of features from the training images and then build a map from image features to cartoon materials.

During run-time, the organ-match model finds the best-matching cartoon drawing for each organ and finally composes a cartoon portrait. Fig. 2 shows some examples of the cartoon portrait generation step: the pictures in (a) are the frontal face images, and the corresponding cartoon portraits are shown in (b).

Figure 2. Some examples of cartoon portrait generation: (a) input photo; (b) the automatically generated cartoon portrait

B. 2.5D Blendshape Model

A 3D modeling method is applied to build the 2.5D cartoon model. Each cartoon component gets a triangulation step for deformation (as shown in Fig. 3); the difference from full 3D modeling is that all the 2D cartoon drawings are parallel to the same plane.

Figure 3. The triangulation step of a cartoon component
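As an illustration of this representation, the following is a minimal C++ sketch of one way the planar, triangulated model could be organized; the type names (CartoonComponent, Cartoon25DModel) are ours, not the paper's:

    #include <string>
    #include <vector>

    // A 2D vertex; all cartoon drawings lie parallel to the same plane,
    // so no per-vertex z coordinate is needed.
    struct Vec2 { float x, y; };

    // One cartoon organ (face contour, eye, eyebrow, nose, mouth, hair...),
    // triangulated so that it can be deformed and rendered as a textured mesh.
    struct CartoonComponent {
        std::string name;            // e.g. "mouth"
        std::vector<Vec2> vertices;  // rest-pose vertex positions
        std::vector<int> triangles;  // vertex indices, 3 per triangle
        unsigned textureId;          // handle of the 2D cartoon drawing texture
        float depth;                 // fixed layer order when compositing
    };

    // The whole 2.5D model is a depth-ordered list of planar components.
    struct Cartoon25DModel {
        std::vector<CartoonComponent> components;
    };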

Unlike the real face, which contains a lot of detail, a cartoon is often composed of lines and color blocks. Deformation of cartoon drawings therefore looks more natural than deformation of real faces, which makes it possible to simulate a 3D appearance by deforming 2D cartoons. Fig. 4 shows some examples of a 3D-look cartoon produced from the same 2D cartoon drawings by free-form deformations (FFD) of these drawings.

Figure 4. Examples of the 3D-look cartoon model with 2D cartoon drawings (simulating 3D rotations at yaw angles of -30°, 0°, and 30°)
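The paper does not spell out which FFD variant it uses, so the following is only a minimal sketch under the assumption of a simple bilinear control lattice; the type and member names are illustrative:

    #include <algorithm>
    #include <vector>

    struct Vec2 { float x, y; };

    // Minimal bilinear 2D free-form deformation: a drawing is embedded in a
    // regular control lattice, and moving the lattice points drags the
    // embedded vertices with them.
    struct FFDLattice {
        int nx, ny;              // number of lattice cells in x and y
        Vec2 origin, size;       // rest-pose rectangle covered by the lattice
        std::vector<Vec2> ctrl;  // (nx+1)*(ny+1) deformed control points, row-major

        const Vec2& at(int i, int j) const { return ctrl[j * (nx + 1) + i]; }

        // Map a rest-pose vertex through the deformed lattice.
        Vec2 deform(const Vec2& p) const {
            float u = (p.x - origin.x) / size.x * nx;  // continuous cell coordinates
            float v = (p.y - origin.y) / size.y * ny;
            int i = std::clamp(static_cast<int>(u), 0, nx - 1);
            int j = std::clamp(static_cast<int>(v), 0, ny - 1);
            float fu = u - i, fv = v - j;              // position inside the cell
            Vec2 bottom = { (1 - fu) * at(i, j).x     + fu * at(i + 1, j).x,
                            (1 - fu) * at(i, j).y     + fu * at(i + 1, j).y };
            Vec2 top    = { (1 - fu) * at(i, j + 1).x + fu * at(i + 1, j + 1).x,
                            (1 - fu) * at(i, j + 1).y + fu * at(i + 1, j + 1).y };
            return { (1 - fv) * bottom.x + fv * top.x,
                     (1 - fv) * bottom.y + fv * top.y };
        }
    };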

IV. REAL-TIME 2.5D FACIAL ANIMATION

A. Facial Landmarks Tracking

We apply the Active Shape Model (ASM) in our system to track the 2D facial landmarks in the video frames. ASM captures the modes of shape variation that occur within a class of objects. An ASM model trained with 88 landmarks is used in our system.

B. Pose Estimation

Pose estimation is the process of recovering 3D rotations from 2D information, which is required for 2.5D animation. The human head is limited to 3 degrees of freedom (DOF) in pose, which can be characterized by the pitch, roll, and yaw angles, as pictured in Fig. 5.

Figure 5. The three degrees of freedom of a human head, represented by the three orientation angles: pitch, roll, and yaw

We use a geometry-based method to estimate the head pose, which is simple yet efficient. The pose of the face is estimated from 5 points (the pupils and the mid-point of the mouth). We connect the mid-points of the eyes and the mouth, and define L_f, L_e, L_m, and L_n as shown in Fig. 6.

Figure 6. The sketch of the face's geometric configuration

The roll angle is easy to calculate:
\[ roll = \arctan\left( \frac{y_{eye\_l} - y_{eye\_r}}{x_{eye\_l} - x_{eye\_r}} \right) \qquad (1) \]

where (x_eye_l, y_eye_l) is the position of the left eye and (x_eye_r, y_eye_r) is the position of the right eye. Then we define two ratios R_p and R_y as:

\[ R_p = \frac{L_m}{L_f} \qquad (2) \]

\[ R_y = \frac{L_n}{L_e} \qquad (3) \]

Because the face is axisymmetric, the pitch and yaw angles have a strong relationship with the ratios R_p and R_y. We can acquire this relationship by training on a large number of pose-labeled pictures.
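As a concrete reading of Eqs. (1)-(3), the sketch below computes the roll angle directly and maps the two ratios to pitch and yaw. The paper learns the ratio-to-angle relationship from pose-labeled images, so the linear coefficients (kPitch, kYaw) and frontal-pose ratios (Rp0, Ry0) used here are hypothetical stand-ins for that trained mapping; the lengths Lf, Le, Lm, and Ln are assumed to be measured from the tracked landmarks as defined in Fig. 6:

    #include <cmath>

    struct Point2 { float x, y; };

    struct HeadPose { float pitch, yaw, roll; };  // angles in radians

    // Geometric head pose from the two pupils plus the lengths Lf, Le,
    // Lm, Ln measured on the face's connecting lines (see Fig. 6).
    HeadPose estimatePose(Point2 eyeL, Point2 eyeR,
                          float Lm, float Lf, float Ln, float Le) {
        HeadPose pose;
        // Eq. (1): roll from the slope of the inter-pupil line.
        pose.roll = std::atan2(eyeL.y - eyeR.y, eyeL.x - eyeR.x);

        // Eqs. (2)-(3): the two pose ratios.
        float Rp = Lm / Lf;
        float Ry = Ln / Le;

        // Hypothetical linear fit around the frontal-pose ratios; in the
        // paper this mapping is trained on pose-labeled pictures.
        const float Rp0 = 0.5f, Ry0 = 0.5f;      // assumed frontal ratios
        const float kPitch = 2.0f, kYaw = 2.0f;  // assumed trained slopes
        pose.pitch = kPitch * (Rp - Rp0);
        pose.yaw   = kYaw * (Ry - Ry0);
        return pose;
    }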
C. Expression Estimation

In general, we can estimate a person's expression from his or her mouth and eyes, which are believed to be the most expression-rich areas of the face. If we can capture the motion of these areas and transfer it to our cartoon model, the animation will be vivid. In order to simplify our model, we define any facial expression as E:

\[ E = E_{mouth} + E_{eye\_l} + E_{eye\_r} \qquad (4) \]

where E_mouth is the local expression at the mouth, and E_eye_l and E_eye_r are the local expressions at the left and right eyes.

A smile is the most common facial expression in people's daily life. We trained a regression model for the smile with an SVM (Support Vector Machine). With this model we can predict a continuous parameter Param_smile, ranging from 0 to 100, which represents the expression from normal to laughing.
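The paper states only that the smile regressor is an SVM; below is a minimal sketch using OpenCV's SVM in epsilon-regression mode. The feature vector (e.g., normalized mouth landmark coordinates) and the hyperparameter values are our assumptions, not the paper's:

    #include <algorithm>
    #include <opencv2/core.hpp>
    #include <opencv2/ml.hpp>

    // Train a regressor that maps a mouth-shape feature vector to the
    // continuous smile parameter in [0, 100].
    cv::Ptr<cv::ml::SVM> trainSmileRegressor(const cv::Mat& features,  // CV_32F, one row per sample
                                             const cv::Mat& labels) {  // CV_32F, smile value in [0, 100]
        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::EPS_SVR);  // epsilon support-vector regression
        svm->setKernel(cv::ml::SVM::RBF);
        svm->setC(1.0);                      // assumed hyperparameters
        svm->setGamma(0.5);
        svm->setP(1.0);                      // width of the epsilon tube
        svm->train(features, cv::ml::ROW_SAMPLE, labels);
        return svm;
    }

    // Predict Param_smile for one frame and clamp it to the documented range.
    float predictParamSmile(const cv::Ptr<cv::ml::SVM>& svm, const cv::Mat& feature) {
        float value = svm->predict(feature);
        return std::min(100.0f, std::max(0.0f, value));
    }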

D. Cartoon Animation Generation

We predefined several blendshapes for each cartoon organ in the 2.5D cartoon model. Any specific shape can then be calculated by blending several key shapes. The formulaic description is:

\[ B = B_0 + \sum_{i=1}^{N} w_i B_i \qquad (5) \]

where B_0 is the base blendshape and w_i is the weight of the i-th delta blendshape. Briefly speaking, the 2.5D cartoon model can be driven by a set of parameters: the pose parameters (pitch, yaw, roll) and the expression parameter (Param_smile) are calculated by the pose and expression estimation of Sections IV-B and IV-C, and the blendshape weights are mapped from these atomic motion parameters.
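Eq. (5) translates directly into a per-vertex weighted sum. A minimal sketch, assuming each delta blendshape B_i is stored as per-vertex offsets from the base shape B_0 and the weights w_i have already been mapped from the atomic motion parameters:

    #include <cassert>
    #include <vector>

    struct Vec2 { float x, y; };

    // Eq. (5): B = B0 + sum_i w_i * B_i, applied per vertex of one cartoon organ.
    std::vector<Vec2> blendShape(const std::vector<Vec2>& base,                 // B0
                                 const std::vector<std::vector<Vec2>>& deltas,  // B_i, per-vertex offsets
                                 const std::vector<float>& weights) {           // w_i
        assert(deltas.size() == weights.size());
        std::vector<Vec2> out = base;
        for (size_t i = 0; i < deltas.size(); ++i) {
            assert(deltas[i].size() == base.size());
            for (size_t v = 0; v < base.size(); ++v) {
                out[v].x += weights[i] * deltas[i][v].x;
                out[v].y += weights[i] * deltas[i][v].y;
            }
        }
        return out;
    }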
V. EXPERIMENTAL RESULTS

We have implemented the described approach in C++ and developed the demo system with Qt Creator. The test system runs on a quad-core Intel Core i5 (2.5 GHz) CPU with an ordinary web camera producing 640×480 video frames at a maximum of 30 fps. The initialization step takes about 1-2 seconds. The interface of the implemented system is pictured in Fig. 7.

Figure 7. The interface of the real-time 2.5D facial cartoon animation system
The real-time video frames captured by the PC camera are displayed in the top-left corner, the generated cartoon portrait is shown in the bottom-left corner, and the 2.5D cartoon avatar animation, rendered with OpenGL, is displayed in the middle area. If we are not satisfied with the automatically generated cartoon portrait, we can also change some cartoon organs manually by clicking the cartoon drawings in the right area.

The performance of the system is shown in Table I. Facial tracking takes most of the time; the system runs at an average speed of 27 fps.

The system applies equally to men and women; Fig. 8 shows some experiments on this. Selected frames of the experiment on real-time facial cartoon animation are shown in Fig. 9.

Fig. 10 shows some conditions under which the facial tracking step fails or produces large errors, mainly caused by complex lighting, cluttered backgrounds, and fast movement of the face.
TABLE I. AVERAGE PROCESSING TIME OF THE SYSTEM

    Facial Landmarks Tracking    24 ms
    Pose Estimation              <1 ms
    Expression Estimation         8 ms
    Animation Rendering           5 ms

Figure 8. Animation results on female and male users with normal and smile expressions: (a) video frames captured by an ordinary web camera; (b) the real-time facial cartoon animation

Figure 9. Selected frames (frames 100, 148, 236, 298, and 366) from the experiment on real-time facial cartoon animation. Each image shows the tracked frame with 5 key facial landmarks (red points) used for pose estimation (left) and the corresponding cartoon avatar (right)
Figure 10. Failures and errors of the facial tracking step

VI. CONCLUSION AND DISCUSSION

In this paper, we have introduced a real-time 2.5D facial cartoon animation system that runs at about 27 fps. Unlike existing 3D avatar systems, any local part of the model can be replaced by another 2D cartoon drawing or even a piece of vector art. Our approach is simpler and less time-consuming than building a 3D model with non-photorealistic rendering. The results are interesting and aesthetically pleasing, and the system has great potential value in games and social network applications.

However, some problems remain to be improved. The ASM landmarks jitter noticeably under complex backgrounds and fast face movement. In order to reduce this error, we will try constrained models such as CLM in our future work.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (No. 61303093, 61402278, U1304616), the Innovation Program of the Science and Technology Commission of Shanghai Municipality (No. 13511505002, 14511108402, 14511108200), the Natural Science Foundation of Shanghai (No. 14ZR1415800), and the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ023).

REFERENCES

[1] C. Cao, Q. Hou, and K. Zhou, "Displaced dynamic expression regression for real-time facial tracking and animation," ACM Transactions on Graphics (TOG), 2014, 33(4): 43.
[2] T. Weise, S. Bouaziz, H. Li, et al., "Realtime performance-based facial animation," ACM Transactions on Graphics (TOG), 2011, 30(4): 77.
[3] T. F. Cootes, C. J. Taylor, and D. H. Cooper, "Active shape models - their training and application," Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[4] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," Springer Berlin Heidelberg, 1998: 484-498.
[5] F. Dornaika and J. Ahlberg, "Fast and reliable active appearance model search for 3-D face tracking," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2004, 34(4): 1838-1853.
[6] L. Zalewski and S. Gong, "2D statistical models of facial expressions for realistic 3D avatar animation," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), IEEE, 2005, 2: 217-222.
[7] F. Dornaika and J. Ahlberg, "Efficient active appearance model for real-time head and facial feature tracking," IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2003), IEEE, 2003.
[8] D. Cristinacce and T. F. Cootes, "Feature detection and tracking with constrained local models," BMVC, 2006, 1(2).
[9] J. M. Saragih, S. Lucey, and J. F. Cohn, "Real-time avatar animation from a single image," IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011), IEEE, 2011: 117-124.
[10] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, 2011, 91(2): 200-215.
[11] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., 1999: 187-194.
[12] Q. Cai, D. Gallup, and C. Zhang, "3D deformable face tracking with a commodity depth camera," Computer Vision - ECCV 2010, Springer Berlin Heidelberg, 2010: 229-242.
[13] E. Murphy-Chutorian and M. M. Trivedi, "Head pose estimation in computer vision: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(4): 607-626.
[14] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, "Facial expression recognition from video sequences: temporal and static modeling," Computer Vision and Image Understanding, 2003, 91(1): 160-187.
[15] N. Kumar, A. C. Berg, and P. N. Belhumeur, "Attribute and simile classifiers for face verification," IEEE 12th International Conference on Computer Vision, IEEE, 2009: 365-372.
[16] Y. Li, F. Yu, Y. Q. Xu, E. Chang, and H. Y. Shum, "Speech-driven cartoon animation with emotions," Proceedings of the Ninth ACM International Conference on Multimedia, ACM, 2001: 365-371.
[17] J. Noh and U. Neumann, "Expression cloning," Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ACM, 2001.
[18] E. Chuang and C. Bregler, "Performance driven facial animation using blendshape interpolation," Computer Science Technical Report, Stanford University, 2002, 2(2): 3.
[19] J. Xiao, S. Baker, I. Matthews, et al., "Real-time combined 2D+3D active appearance models," Computer Vision and Pattern Recognition (CVPR) (2), 2004: 535-542.
[20] A. Rivers, T. Igarashi, and F. Durand, "2.5D cartoon models," ACM Transactions on Graphics (TOG), 2010, 29(4): 59.
