Real-time 2.5D Facial Cartoon Animation based on Pose and Expression Estimation
Abstract: In this paper, a real-time facial animation system is presented that captures live facial expressions and poses from a user and uses them to animate a synthetic 2.5D cartoon character (avatar) in real time. Compared to existing 3D avatar systems, 2.5D cartoon models are easier to create than full 3D models, and cartoon-style drawings are more popular among young people. Our system requires only an ordinary PC camera, and the initialization is fully automatic. We established a mapping from real-time facial motion to the atomic motion parameters of the cartoon model by pose and expression estimation. The animation is driven by these parameters through blendshape interpolation.

Keywords: cartoon; facial animation; avatar; blendshape

I. INTRODUCTION

Avatar animation, which represents a person in a virtual world, has been an active research area of computer graphics in recent years. It is of great value in both research and industry.

Unfortunately, the face is very complex, and facial expressions differ from individual to individual. It is very challenging to model a face with 3D models at high accuracy. However, in consumer-level applications users usually care more about fun and aesthetics than about model accuracy. With the development of anime and non-photorealistic rendering (NPR), cartoon-style drawings and animations have become widely popular. In particular, the 2.5D cartoon model, which simulates 3D rotations and generates natural cartoon animations in what is actually a 2D plane, has become a new branch of avatar animation.

In this paper, we propose an automatic real-time 2.5D facial cartoon animation system that can work with an ordinary PC video camera. The overview of the proposed system is illustrated in Fig. 1. At the initialization step, one of the video frames captured by the camera is sent to the cartoon generator developed in our previous work, and a static personalized 2D cartoon portrait is generated. Then a 2.5D cartoon model is registered, with these cartoon drawings as its textures. At run time, the user's head pose and facial expression are estimated in real time. The estimation results are transferred to the 2.5D cartoon model in the form of motion parameters, which animate the model with blendshape interpolation.

II. RELATED WORK

In general, a real-time facial animation system contains two parts: face tracking and expression transfer. Face tracking captures the motion of the human face; the 3D information and expression are then transferred to an avatar.

The research on face tracking has a long history. Face tracking methods can be divided into 3D shape tracking methods and 2D shape tracking methods. The 3D shape tracking methods learn a 3D shape regressor to infer the 3D positions of facial landmarks; the state-of-the-art methods include displaced dynamic expression regression [1].
Figure 1. Overview of the proposed system: facial tracking, pose estimation, expression estimation, and animation
Deformation on cartoon drawings looks more natural than deformation on real faces, which makes it possible to simulate a 3D appearance by deforming 2D cartoons. Fig. 4 shows some examples of 3D-looking cartoons obtained by applying free-form deformation (FFD) to the same 2D cartoon drawings.

B. Pose Estimation

Pose estimation is the process of recovering 3D rotations from 2D information, which is required for 2.5D animation. The human head is limited to 3 degrees of freedom (DOF) in pose, which can be characterized by the pitch, roll, and yaw angles pictured in Fig. 5.
roll = arctan((y_eye_l - y_eye_r) / (x_eye_l - x_eye_r))    (1)

where (x_eye_l, y_eye_l) is the position of the left eye and (x_eye_r, y_eye_r) is the position of the right eye. Then we define two ratios, Rp and Ry, as:

Rp = Lm / Lf    (2)

Ry = Ln / Le    (3)

Because the face is axisymmetric, the pitch and yaw angles have a strong relationship with the ratios Rp and Ry. We can obtain this relationship by training on a large number of pose-labeled pictures.

C. Expression Estimation

In general, we can estimate a person's expression from his or her mouth and eyes. It is believed that the mouth and eyes are the most expression-rich areas of the face. If we can capture the motion of these areas and transfer it to our cartoon model, the animation will be vivid. In order to simplify our model, we define any facial expression E as:

E = E_mouth + E_eye_l + E_eye_r    (4)

where E_mouth is the local expression at the mouth, and E_eye_l and E_eye_r are the local expressions at the left and right eyes.

A smile is the most common facial expression in people's daily life. We trained a regression model for smiles with an SVM (Support Vector Machine). With this regression model we can predict a continuous parameter Param_smile, ranging from 0 to 100, which represents expressions from neutral to laughing.

D. Cartoon Animation Generation

We predefined several blendshapes for each cartoon organ in the 2.5D cartoon model. Any specific shape can then be calculated by blending several key shapes. The formulaic description is:

B = B0 + sum_{i=1}^{N} w_i * B_i    (5)

where B0 is the base blendshape and w_i is the weight of the i-th delta blendshape B_i. Briefly speaking, the 2.5D cartoon model can be driven by a set of parameters: the pose parameters (pitch, yaw, roll) and the expression parameter (Param_smile) are calculated by the pose estimation and expression estimation of Sections B and C, and the blendshape weights are mapped from these atomic motion parameters.

V. EXPERIMENTAL RESULTS

We have implemented the described approach in C++ and developed the demo system with Qt Creator. The test system runs on a quad-core Intel Core i5 (2.5 GHz) CPU, with an ordinary web camera producing 640x480 video frames at a maximum of 30 fps. The initialization step takes about 1-2 seconds. The interface of the implemented system is pictured in Fig. 7.

Figure 7. The interface of the real-time 2.5D facial cartoon animation system
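To make the mapping from tracked landmarks to animation parameters concrete, the following is a minimal C++ sketch of Eq. (1) and Eq. (5). It is an illustrative reconstruction, not the paper's actual implementation: the function and type names are our own, and the blendshape data layout (flat per-vertex coordinate arrays) is an assumption.

```cpp
#include <cmath>
#include <vector>

// Eq. (1): head roll angle from the two tracked eye centers.
// Using atan2 instead of a raw arctan of the slope avoids a
// division by zero when the eyes are vertically aligned.
double rollFromEyes(double x_eye_l, double y_eye_l,
                    double x_eye_r, double y_eye_r) {
    return std::atan2(y_eye_l - y_eye_r, x_eye_l - x_eye_r);
}

// Eq. (5): B = B0 + sum_{i=1..N} w_i * B_i, evaluated per vertex
// coordinate. `base` is the base blendshape B0 and `deltas[i]` is
// the i-th delta blendshape B_i, all stored as flat coordinate arrays.
std::vector<double> blendShapes(const std::vector<double>& base,
                                const std::vector<std::vector<double>>& deltas,
                                const std::vector<double>& weights) {
    std::vector<double> shape = base;
    for (std::size_t i = 0; i < deltas.size(); ++i)
        for (std::size_t v = 0; v < shape.size(); ++v)
            shape[v] += weights[i] * deltas[i][v];
    return shape;
}
```

A smile weight could, for instance, be obtained by normalizing Param_smile to [0, 1] and using it as one of the w_i; the exact parameter-to-weight mapping used by the system is not detailed in this excerpt.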
The real-time video frames captured by the PC camera are displayed in the top-left corner. The generated cartoon portrait is shown in the bottom-left corner. The 2.5D cartoon avatar animation, rendered with OpenGL, is displayed in the middle area. If we are not satisfied with the automatic cartoon portrait, we can also change some cartoon organs manually by clicking the cartoon drawings in the right area.

The performance of the system is shown in Table I. The facial tracking takes most of the time, and the system runs at an average speed of 27 fps.

The system applies equally to men and women; Fig. 8 shows some experiments on this.

Selected frames from the experiment on real-time facial cartoon animation are shown in Fig. 9.

Fig. 10 shows some conditions under which the facial tracking step fails or produces large errors, mainly caused by complex lighting, cluttered backgrounds, and fast movement of the face.
TABLE I. AVERAGE PROCESSING TIME OF THE SYSTEM (per stage: facial landmarks tracking, pose estimation, expression estimation, animation rendering)
Figure 8. Animation results on female and male users with normal and smile expressions: (a) video frames captured by an ordinary web camera; (b) the real-time facial cartoon animation
Figure 9. Selected frames (100, 148, 236, 298, 366) from the experiment on real-time facial cartoon animation. Each image shows the tracked frame with 5 key facial landmarks (red points) for pose estimation (left) and the corresponding cartoon avatar (right)
Figure 10. Failures and errors of the facial tracking

VI. CONCLUSION AND DISCUSSION

In this paper, we have introduced a real-time 2.5D facial cartoon animation system that runs at about 27 fps. Unlike existing 3D avatar systems, any local part can be replaced by another 2D cartoon drawing or even vector art. Our approach is simpler and less time-consuming than modeling a 3D model with non-photorealistic rendering. The results prove interesting and aesthetically pleasing, and the system has great potential value in game and social network applications.

However, some problems remain to be improved. The ASM landmarks jitter noticeably in cases of complex backgrounds and fast face movement. In order to reduce this error, we will try constrained models such as CLM in our future work.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (No. 61303093, 61402278, U1304616), the Innovation Program of the Science and Technology Commission of Shanghai Municipality (No. 13511505002, 14511108402, 14511108200), the Natural Science Foundation of Shanghai (No. 14ZR1415800), and the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ023).

REFERENCES

[1] Cao C., Hou Q., Zhou K. "Displaced dynamic expression regression for real-time facial tracking and animation." ACM Transactions on Graphics (TOG), 2014, 33(4): 43.
[2] Weise T., Bouaziz S., Li H., et al. "Realtime performance-based facial animation." ACM Transactions on Graphics (TOG), 2011, 30(4): 77.
[3] Cootes T. F., Taylor C. J., Cooper D. H. "Active shape models: their training and application." Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[4] Cootes T. F., Edwards G. J., Taylor C. J. "Active appearance models." Springer Berlin Heidelberg, 1998: 484-498.
[5] Dornaika F., Ahlberg J. "Fast and reliable active appearance model search for 3-D face tracking." IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2004, 34(4): 1838-1853.
[6] Zalewski L., Gong S. "2D statistical models of facial expressions for realistic 3D avatar animation." CVPR 2005, IEEE, 2005, 2: 217-222.
[7] Dornaika F., Ahlberg J. "Efficient active appearance model for real-time head and facial feature tracking." AMFG 2003, IEEE International Workshop on Analysis and Modeling of Faces and Gestures, IEEE, 2003.
[8] Cristinacce D., Cootes T. F. "Feature detection and tracking with constrained local models." BMVC, Vol. 1, No. 2, 2006.
[9] Saragih J. M., Lucey S., Cohn J. F. "Real-time avatar animation from a single image." FG 2011, IEEE, 2011: 117-124.
[10] Saragih J. M., Lucey S., Cohn J. F. "Deformable model fitting by regularized landmark mean-shift." International Journal of Computer Vision, 2011, 91(2): 200-215.
[11] Blanz V., Vetter T. "A morphable model for the synthesis of 3D faces." SIGGRAPH '99, ACM Press/Addison-Wesley, 1999: 187-194.
[12] Cai Q., Gallup D., Zhang C., et al. "3D deformable face tracking with a commodity depth camera." ECCV 2010, Springer Berlin Heidelberg, 2010: 229-242.
[13] Murphy-Chutorian E., Trivedi M. M. "Head pose estimation in computer vision: a survey." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(4): 607-626.
[14] Cohen I., Sebe N., Garg A., Chen L. S., Huang T. S. "Facial expression recognition from video sequences: temporal and static modeling." Computer Vision and Image Understanding, 2003, 91(1): 160-187.
[15] Kumar N., Berg A. C., Belhumeur P. N. "Attribute and simile classifiers for face verification." ICCV 2009, IEEE, 2009: 365-372.
[16] Li Y., Yu F., Xu Y. Q., Chang E., Shum H. Y. "Speech-driven cartoon animation with emotions." Proceedings of the Ninth ACM International Conference on Multimedia, ACM, 2001: 365-371.
[17] Noh J., Neumann U. "Expression cloning." SIGGRAPH 2001, ACM, 2001.
[18] Chuang E., Bregler C. "Performance driven facial animation using blendshape interpolation." Computer Science Technical Report, Stanford University, 2002, 2(2): 3.
[19] Xiao J., Baker S., Matthews I., et al. "Real-time combined 2D+3D active appearance models." CVPR 2004: 535-542.
[20] Rivers A., Igarashi T., Durand F. "2.5D cartoon models." ACM Transactions on Graphics (TOG), 2010, 29(4): 59.