
DISPLAYS

ELSEVIER

Displays 17 (1996) 15-25

Modeling of facial expression and emotion for human communication system

Shigeo Morishima*
Faculty of Engineering, Seikei University, 3-3-1 Kichijoji-Kitamachi, Musashino, Tokyo 180, Japan
E-mail: shigeo@ee.seikei.ac.jp

Received 7 January 1995; revised 7 September 1995

Abstract

The goal of this research is to realize a face-to-face communication environment with a machine by giving a facial expression to the computer system. In this paper, modeling methods for facial expression, including a 3D face model, an expression model and an emotion model, are presented. Facial expression is parameterized with the Facial Action Coding System (FACS), which is translated into the motion of grid points on a face model constructed from 3D range sensor data. An emotion condition is described compactly by a point in a 3D space generated by a 5-layered neural network, and evaluation results show the high performance of this model.

Keywords: Face model; Range data; Emotion model; Neural network; Nonverbal communication; FACS

1. Introduction

Facial expressions are essential for communications


between humans in addition to voice. They include
several factors capable of expressing non-verbal
information as voluntary or spontaneous activities.
Therefore, adding facial expression recognition and
synthesis to the communications environment will
certainly improve the communications between humans
and machines. Emotions make up one of the most
important groups describing facial expressions.
Our goal is to develop a very natural human-machine
communication system by giving a face to each computer
terminal and to the communication system itself. Of
course, among other things this system will have to be
able to recognize the facial conditions of the human it is
interacting with, especially the emotions appearing on
the face, and to synthesize a reasonable and suitable
facial image sequence based on the conditions. With
this system, therefore, an operator can communicate
with a machine naturally, like a human talking with
another human face-to-face.
This paper presents an approach that enables the
modeling of human emotions appearing on faces. A


system with this approach helps to analyze, synthesize


and code face images at the emotion level.
First, a 3D facial model is described to parameterize
facial expressions by using the Facial Action Coding
System (FACS). Each Action Unit of FACS is translated
to movements of grid points on this facial model. By
projecting the original texture of a human's face onto
the surface of a wire frame model, an image with facial
expressions can be synthesized with sufficient reality.
This synthesized face can also be made to speak Japanese and English sentences with a natural impression; lip motion is controlled from the sentence text or the voice [1].
Two types of 3D model are used: a generic model and a high-definition model. The former represents only the face, but it has explicit features such as the nose, mouth, eyes and eyebrows, and is generated from a mannequin's head. The latter is composed from range data obtained with a Cyberware 3D range scanner; it gives an exact structure for each person and adds texture around the head. The motion of this model can be controlled hierarchically using a generic model adjusted to the high-definition model [2].
Recent years have seen a lot of research on facial
image recognition [3-5] and synthesis [6]. However, trials
on the modeling of emotion conditions have only been
reported by a few psychologists [7-10], and this was a
few years ago. As the results were not applicable to the

criteria of our application, our emotion model was originally constructed based on a multi-layered neural network. The model provides identity mapping and non-linear mapping and is capable of generalization. The dimensions of the expression space can be reduced to three while still representing a variety of face categories through interpolation and linear combination of basic expressions. This system can also analyze and synthesize expressions simultaneously. Evaluation results for this system are given toward the end of the paper.

Fig. 1. 3D range sensor (digitizer unit).
Fig. 2. Range data.
Fig. 3. Captured texture.
Fig. 4. Polygon mesh.
Fig. 5. Polygon generation from range data.

2. Modeling of face

2.1. 3D digitizer

To measure the 3D features and details of a personal


face, a 3D range sensor, the Cyberware 4020/PS 3D Rapid
Digitizer, was introduced. The system has a Laser Range
Finder and a 3CCD color camera to capture both the
range data and surface texture of an object. The sensor is


shown in Fig. 1; the digitizer unit can rotate the object


once. The resolution of the range data and texture is 512
points for both the vertical and rotational directions.
Range data and surface texture captured by the digitizer
are shown in Fig. 2 and Fig. 3, respectively. Each point of
the surface texture is expressed with 24-bit full color
images.

2.2. Mesh of range data


Captured range data is expressed in cylindrical coordinates. Range data is defined as the distance R(i,j)
between each point P(i,j) and the center axis of the
cylinder. Integers i and j are the vertical and angular indices of a point, so the horizontal pitch is 2π/512. Not all
the points are obtained due to diffusion of the Laser
beam especially where there is hair or occlusion like
inside the ear or mouth. To fill these points, interpolation
and extrapolation are applied. A polygon mesh like that
shown in Fig. 4 is easily created by the rule shown in Fig.
5, but the resolution of points is reduced to a quarter of
the original for both the vertical and rotational directions. There are about 28 000 polygons in this model.

Fig. 6. Feature point.

Fig. 7. Mapping feature points to a 2D plane.

2.3. Feature points selection


The speed of texture mapping in a graphics workstation depends on the number of meshes. Therefore, realtime natural image synthesis requires a reduction in the
number of points. A simple reduction, however, is insufficient to hold the original complex features of an object.
A face surface is especially difficult as it changes dynamically from part to part, so the polygon reduction rules
become complex.
First, the surface normal of each mesh is quantized
into N levels. If all of the surface normals around
P(i,j) have the same index, then that point is rejected.
If not, that point is selected as the feature point indicated
in Fig. 6. P(i,j) has texture f(i,j), so the points on an
edge that is detected and enhanced by a Laplacian filter
are selected with higher priority. N, however, is not the
same in all regions and larger values of N are allocated to
delicate parts of the face. Delicate parts are categorized
into four regions by observation of the captured data on
basic facial expressions. Of course, parts besides the face
like the hair and neck are allocated with smaller quantization values of N.
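The selection rule above might be prototyped along the following lines. This is a sketch only: the azimuth/elevation binning of the normals, the 4-neighbourhood test and the keep_ratio parameter are my assumptions rather than the paper's exact procedure.

import numpy as np
from scipy.ndimage import laplace

def select_feature_points(normals, texture, levels, keep_ratio=0.08):
    """Rough sketch of the feature-point selection of Section 2.3.
    normals : (H, W, 3) unit surface normals per grid point
    texture : (H, W) grey-level texture f(i, j)
    levels  : (H, W) int array, quantization level N per region
              (larger N on delicate parts of the face)."""
    H, W, _ = normals.shape

    # Quantize each normal: bin its azimuth and elevation into N levels each.
    az = np.arctan2(normals[..., 1], normals[..., 0])        # [-pi, pi]
    el = np.arcsin(np.clip(normals[..., 2], -1.0, 1.0))      # [-pi/2, pi/2]
    az_bin = np.minimum((az + np.pi) / (2 * np.pi) * levels, levels - 1).astype(int)
    el_bin = np.minimum((el + np.pi / 2) / np.pi * levels, levels - 1).astype(int)
    idx = az_bin * levels + el_bin

    # A point is a candidate feature point if the quantized indices of its
    # 4-neighbourhood are not all identical (i.e. the surface bends there).
    same = np.ones((H, W), dtype=bool)
    for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        same &= (np.roll(idx, shift=(di, dj), axis=(0, 1)) == idx)
    candidates = ~same

    # Texture edges (Laplacian response) get higher selection priority.
    priority = np.abs(laplace(texture.astype(float)))
    priority[~candidates] = -np.inf
    n_keep = int(keep_ratio * H * W)
    flat = np.argsort(priority.ravel())[::-1][:n_keep]
    return np.stack(np.unravel_index(flat, (H, W)), axis=1)  # (n_keep, 2) grid indices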

2.4. Reconstruction of mesh


The selected feature points are located at random and
it is not easy to determine meshes with the points in 3D
space. At first, therefore, all of the selected points are
projected onto a cylinder having a constant diameter
and whose center axis is equal to the vertical axis of
the cylindrical coordinates (in Fig. 7). On this 2D

plane, the Voronoi region is determined for each of the points and their linkages are determined by the Delaunay net. In Fig. 8, each black point is a selected feature point, each cell is a Voronoi region and the dotted line is the Delaunay net. Each polygon is composed of three of these linkages. Lastly, the feature points are inversely mapped back into 3D space, and a 3D wireframe model is constructed.

Fig. 8. Construction of mesh from feature points (Voronoi regions, Delaunay net, mesh model).
For example, when the number of quantization steps
of the face region is seven and that of the other regions is
five, the number of feature points becomes 2070. A result
of the reconstruction of a mesh is shown in Fig. 9. The
number of polygons is 4269.
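A minimal sketch of this reconstruction using SciPy is given below. The paper derives the Delaunay net from the Voronoi regions, whereas here scipy.spatial.Delaunay is used directly on the unrolled (angle, height) plane; the wrap-around seam at theta = ±π is ignored for brevity, and the function name is illustrative.

import numpy as np
from scipy.spatial import Delaunay

def remesh_feature_points(points_3d):
    """Project selected feature points onto the (theta, h) plane, build the
    Delaunay net, and reuse the triangle indices on the 3D points."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]

    # Unroll the cylinder: Q = (theta, h) for each feature point.
    theta = np.arctan2(z, x)
    plane = np.column_stack([theta, y])

    # Delaunay triangulation on the unrolled plane; each triangle's vertex
    # indices are then used directly on the original 3D points.
    tri = Delaunay(plane)
    return tri.simplices              # (n_triangles, 3) indices into points_3d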

2.5. Modeling of mouth


The range data does not include data from the mouth
when the target face has a closed mouth. Instead, the lip
part is determined manually and divided into an upper
lip and lower lip.


In most cases, however, the inside wall of the mouth


(1700 points and 3000 meshes) is appended and the texture inside the mouth is mapped onto the model so that
no blank spaces can be seen inside the mouth. This texture is captured by a camera and the density is controlled
according to the openness of the mouth. Fig. 10 shows a
wireframe model of the wall inside a mouth.
Fig. 11 shows a teeth model constructed by measuring artificial teeth (of the kind used by dentists) and rendered with Gouraud shading. This model is used as a standard teeth model, independent of the target person.

3. Modeling of expression
3.1. Generic model
Fig. 9. Reconstructed mesh model.

A 3D generic model, shown in Fig. 12, which approximately represents a human face, is used as the basis for synthesizing facial expressions. The model is composed of about 600 polygonal elements and was constructed by measuring a mannequin's head. The generic model is 3D affine-transformed so that several of its feature point positions coincide with those of a given full-face image captured by the 3D digitizer or a 2D camera. This point adjustment is done by an interactive procedure to generate a personal model from the generic model. To synthesize expressions, the rules for generic model modification are used to hierarchically control the 3D range data model employed for mesh construction.
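The affine part of this fitting could be sketched as a least-squares problem, as below. The interactive marking of corresponding feature points is assumed to have been done already, and the function names are illustrative rather than the paper's.

import numpy as np

def fit_affine(generic_pts, target_pts):
    """Least-squares 3D affine transform taking feature points of the generic
    model onto the corresponding feature points of the target face.
    Both arrays are (n, 3) with rows in correspondence."""
    n = generic_pts.shape[0]
    P = np.hstack([generic_pts, np.ones((n, 1))])   # homogeneous coordinates
    # Solve P @ M = target for M (4 x 3) in the least-squares sense.
    M, *_ = np.linalg.lstsq(P, target_pts, rcond=None)
    return M

def apply_affine(M, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M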

Fig. 10. Model around mouth.

3.2. Expression control with FACS

Fig. 11. Teeth model.

To describe a human's facial expressions, the Facial


Action Coding System (FACS) proposed by Ekman [11]
is introduced. FACS divides the facial muscle action
movement appearing on a face into 44 standard units.
Each standard unit is an Action Unit (AU). It is possible
to describe any face by combining these AUs. Of the 44
Action Units, the 17 units shown in Table 1 were selected
because they strongly influence emotional expressions.
The Action Unit parameters were converted to the movement of each feature point on the generic model, and then the personal model was deformed. This lets the AU parameters keep their original numeric values in the system, normalized to [0, 1].

Table 1
Action Unit selection

AU No   FACS name               AU No   FACS name
AU1     Inner brow raiser       AU14    Dimpler
AU2     Outer brow raiser       AU15    Lip corner depressor
AU4     Brow lowerer            AU16    Lower lip depressor
AU5     Upper lid raiser        AU17    Chin raiser
AU6     Cheek raiser            AU20    Lip stretcher
AU7     Lid tightener           AU23    Lip tightener
AU9     Nose wrinkler           AU25    Lips part
AU10    Upper lip raiser        AU26    Jaw drop
AU12    Lip corner puller

Fig. 12. Generic model and personal model.
To reduce the calculation task, no physical models
about muscles [12,13] are introduced in the system. The
positions of the feature points on the generic model are
only controlled by the modification rules, and the other
point locations on the range data model are determined
by interpolation of the feature points.
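A sketch of this rule-based control is shown below. The additive combination of AU displacements and the precomputed interpolation weight matrix are assumptions made for illustration; the paper only states that feature points follow the modification rules and that the remaining vertices are interpolated from them.

import numpy as np

def deform(base_verts, au_rules, au_values, weights):
    """Sketch of the FACS-driven deformation of Section 3.2 (no muscle model).
    base_verts : (V, 3) neutral vertex positions of the personal model
    au_rules   : dict, AU number -> (F, 3) displacement of each of the F
                 feature points at full intensity (the modification rules)
    au_values  : dict, AU number -> intensity in [0, 1]
    weights    : (V, F) interpolation weights mapping feature-point
                 displacements to every vertex (rows sum to 1)."""
    F = weights.shape[1]
    feat_disp = np.zeros((F, 3))
    for au, intensity in au_values.items():
        feat_disp += intensity * au_rules[au]     # AUs assumed to combine additively

    return base_verts + weights @ feat_disp       # interpolate to all vertices

# Example: 60% AU26 (jaw drop) combined with 40% AU1 (inner brow raiser)
# verts = deform(model_verts, rules, {26: 0.6, 1: 0.4}, W)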

3.3. Synthesized expression

Fig. 13. Synthesized image (4269 meshes).

Fig. 13 and Fig. 14 show images synthesized using


4269 meshes and the original 27 648 mesh model, respectively. Few differences can be detected between these two
images. Fig. 15 shows a modification of the mouth and
expression; the model inside the mouth contributes to
improving the impression and reality of the image.

4. Modeling of emotion

4.1. Approach to modeling

Fig. 14. Synthesized image (27 648 meshes).

When the same data is given to both the input and output layers of a neural network (hereinafter referred to as Identity Mapping training), various internal structures of the given data can be acquired by the network [14]. In the case of a 5-layered network that has enough units in the hidden (2nd and 4th) layers, even if there is only one unit in the middle (3rd) layer, any training data given to the input layer can be reproduced at the output layer [15].

First, Identity Mapping training is applied to data patterns distributed along a semicircle. In this experiment, the 5-layered neural network shown in Fig. 16, which has 2, 8, 1, 3 and 2 units in its individual layers, is used. To achieve convergence to a more global minimum, the training data is given to the network gradually. This process is indicated in Fig. 17 in the order (a), (b), (c), (d): only three points are given in (a), followed by five points in (b), seven points in (c) and all points in (d). The locus line in Fig. 17 is drawn by feeding a continuous 1D value to the 3rd layer of the network and taking the 2D coordinates from the output layer. The result shows strong Identity Mapping performance and ideal generalization performance, enabling the circle to be drawn: all of the points on the circle are mapped onto the same circle by the network.

Fig. 15. Synthesized image with expression.
Fig. 16. Network for 2D circle data.
Fig. 17. Reproduction of circle data with identity mapping (training points at stages 1-4).

4.2. Network structure

From the experiment mentioned above, it was considered that by linking the input layer and the middle layer of the Identity Mapping network, the features of the training data could be extracted, and that the data could be regenerated by linking the middle layer and the output layer. Consequently, one can expect a facial expression space to be constructed in the middle layer by applying this model directly to the analysis and synthesis of facial expressions. A 5-layered feed-forward network, with its good non-linear performance, is therefore well suited to modeling the features of facial expressions. The network structure is shown in Fig. 18.

This network has 17 units in the input layer and in the output layer, corresponding to the number of Action Units. Each hidden layer has 19 units and the middle layer has three. The unit functions of the 1st, 3rd and 5th layers are linear, to maintain a wide dynamic range of the parameter space; those of the 2nd and 4th layers are sigmoid functions for non-linear mapping. Regarding the facial expression space constructed in the middle layer as an emotion space, a system that simultaneously recognizes and synthesizes human faces is constructed.

Fig. 18. Network for emotion model (analysis from input layer to emotion space; synthesis from emotion space to output layer).

4.3. Dimensions of space

Transitions of emotional state should be described quantitatively and continuously in the space. Differences in the route taken by an emotional transition have to be describable for each person without restriction, and it should be possible to describe a personal scenario of an emotion transition by observing the path of progress through the space.

In the case of a 2D emotion space [16], the locus of a transition between two emotions may well cross the locus of another emotion. That is to say, a 2D space cannot describe an emotional change without crossing another emotion factor, so at the very least three dimensions are necessary. In the field of psychology, the dimensionality of emotion has been discussed and both 2D and 3D factors have been found [7,17]. The emotion model in this system is therefore constructed by setting the number of units in the middle layer to three.

How many dimensions are needed to best represent all kinds of human faces is a problem that must be solved if true face image compression is to be achieved. Currently, the target facial expressions are limited to the six basic expressions, which are independent of race.
4.4. Training to convergence

Six basic expressions described by Action Unit parameters are used as the training data. By changing the intensity of the degree of emotion in steps of 25% up to 100%, the amount of training data comes to 25 expressions including neutral (four for each expression). The intensity of the degree of emotion is defined at the parameter level, not at the impression level. Table 2 shows the combinations of AU parameters used to describe the six basic expressions; the numbers in parentheses indicate the intensities of the individual AU parameters. These values were determined with our Action Unit editor on a trial and error basis and were subjectively evaluated by several people.

Table 2
Combination of AUs for the basic emotions

Basic emotion   Combination of AU parameters
Surprise        AU1-(40), 2-(30), 5-(60), 15-(20), 16-(25), 20-(10), 26-(60)
Fear            AU1-(50), 2-(10), 4-(80), 5-(60), 15-(30), 20-(10), 26-(30)
Disgust         AU2-(60), 4-(40), 9-(20), 15-(60), 17-(30)
Anger           AU2-(30), 4-(60), 7-(50), 9-(20), 10-(10), 20-(15), 26-(30)
Happiness       AU1-(65), 6-(70), 12-(10), 14-(10)
Sadness         AU1-(40), 4-(50), 15-(40), 23-(20)

The training proceeds gradually, as indicated in the previous section, until the 100% degree of emotion is appended, and convergence is completed by the backpropagation algorithm. As a result, the 3D emotion space shown in Fig. 19 is constructed in the middle layer. By giving the intensity of the degree of emotion continuously from 0 to 100% to the input layer and regarding the middle layer's 3D outputs as (x, y, z), the locus of each basic expression can be plotted in the 3D space.

The basic emotion locations in the middle layer extend in all directions like the spokes of a wheel in the 3D space. Therefore, there are many routes for emotional changes between basic emotions that do not cross other loci, and more delicate expressions can be synthesized by interpolating or combining the six basic expressions through the generalization effect of the network. A specific expression is described solely by a 3D coordinate value in the emotion space.

Fig. 19. 3D emotion space.
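The 17-19-3-19-17 network and its identity-mapping training could be sketched in plain NumPy as follows. This is not the author's implementation: the initialization, learning rate and plain batch gradient descent are assumptions, and the gradual curriculum of adding 25% intensity steps is omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))

class EmotionNet:
    """Minimal sketch of the 17-19-3-19-17 identity-mapping network of
    Section 4 (linear 1st/3rd/5th layers, sigmoid 2nd/4th layers)."""
    def __init__(self):
        sizes = [17, 19, 3, 19, 17]
        self.W = [rng.normal(0, 0.3, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        h1 = sig(x @ self.W[0] + self.b[0])    # hidden layer (sigmoid)
        z  = h1 @ self.W[1] + self.b[1]        # middle layer = emotion space (linear)
        h2 = sig(z @ self.W[2] + self.b[2])    # hidden layer (sigmoid)
        y  = h2 @ self.W[3] + self.b[3]        # output layer (linear)
        return h1, z, h2, y

    def encode(self, au):                      # AU vector -> (x, y, z) emotion-space point
        return self.forward(au)[1]

    def decode(self, z):                       # emotion-space point -> AU vector
        h2 = sig(z @ self.W[2] + self.b[2])
        return h2 @ self.W[3] + self.b[3]

    def train(self, X, epochs=20000, lr=0.05):
        """Identity-mapping training (target = input) by backpropagation."""
        for _ in range(epochs):
            h1, z, h2, y = self.forward(X)
            d4 = (y - X) / len(X)                        # gradient of squared error at output
            d3 = (d4 @ self.W[3].T) * h2 * (1 - h2)      # back through sigmoid layer
            d2 = d3 @ self.W[2].T                        # linear middle layer
            d1 = (d2 @ self.W[1].T) * h1 * (1 - h1)
            grads = [(X.T @ d1, d1), (h1.T @ d2, d2), (z.T @ d3, d3), (h2.T @ d4, d4)]
            for k, (gW, gb) in enumerate(grads):
                self.W[k] -= lr * gW
                self.b[k] -= lr * gb.sum(axis=0)

# X: 25 training vectors (six basic expressions at 25/50/75/100% plus neutral),
# each a 17-dimensional AU-intensity vector built from Table 2.
# net = EmotionNet(); net.train(X); locus = net.encode(X)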

5. Psychological emotion space [17,18]


In psychological research, some experiments have tried to locate the basic expressions in such a space. Here, the 3D emotion space constructed in the middle layer is compared with two facial expression spaces that have been proposed in the field of psychology. Both of them were constructed using schematic human faces; one is 2D and the other is 3D.

5.1. 2D space

Fig. 20. Schematic face for analysis.
Fig. 21. Basic emotions located in 2D space.

In the first experiment, the human schematic faces shown in Fig. 20 were used. First, subjects were asked to deform a schematic face until they found it to be expressing one of the six basic expressions. Second, the displacements of the facial feature points from the neutral face were measured for each face deformed by the subjects and analysed by factor analysis. The results produced two major factors. By putting these variables in rectangular coordinates, the areas of the basic expressions were specified in 2D space, as shown in Fig. 21.

The two factors obtained from the experiment were named 'curvedness/openness' and 'slantedness'. By overlaying this psychological space with our 3D emotion space projected onto a specific 2D plane, an interesting correlation was found: Anger and Happiness, Surprise and Sadness were in opposite positions in both spaces. This shows that the 'curvedness/openness' and 'slantedness' axes also exist in our 3D emotion space.

Fig. 22. Location of basic emotions (discriminant analysis).
Fig. 23. Locations of basic emotions (3D emotion space).
Fig. 24. Impression in emotion space between Sadness and Surprise (panels show the response counts for Anger, Disgust, Fear, Happiness, Sadness and Surprise).

5.2. 3D space
In this experiment, another group of subjects was asked to classify the schematic expression faces produced in the former experiment into the basic emotion categories. A canonical discriminant analysis of the relationship between the displacements of the feature points of each expression face and the distributions of the categorical judgements revealed three major canonical variables. By putting these variables in rectangular coordinates, the basic expressions could be placed in a 3D space as shown in Fig. 22. Comparing this with the locations of the basic emotions obtained in the middle layer of the network (Fig. 23), many of the positions agree, as they did in the previous 2D experiment.

6. Evaluation of emotion space

6.1. Continuity of impression


The experiments mentioned in the previous chapter were mainly concerned with evaluating the mapping from expressions to the emotion space. Our 3D emotion space allows both mapping and inverse mapping. This chapter therefore describes an evaluation of the inverse mapping.
All expressions described by AU parameters are transformed to 3D coordinates by mapping from the input
layer to the middle layer of the network. On the other
hand, all points in the emotion space have a specific
facial expression by inverse mapping from the middle
layer to the output layer. To evaluate this inverse mapping performance, expression images determined by 3D coordinates were synthesized and printed as pictures. Seventy-six subjects (psychology students) categorized each image into one of eight groups: the six basic expressions, neutral and other.
One example of the experimental results is shown in
Fig. 24. The line between Sadness and Surprise is divided
into 11 points separated with equal intervals in the 3D
emotion space. Expression images were synthesized with
these 11 (x, y, z) coordinates and evaluated. In Fig. 24,
the horizontal axis from left to right means the change of
emotion from Sadness to Surprise and the vertical axis is
the number of positive answers. The six graphs correspond to the six basic expressions.
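Reusing the hypothetical EmotionNet sketched in Section 4, the 11-point interpolation used in this test could look like the following; the AU vectors for Sadness and Surprise would be built from Table 2, and the decoded AU vectors would then drive the synthesis of Section 3.

import numpy as np

def interpolate_expressions(net, au_sadness, au_surprise, steps=11):
    """Sample 11 equally spaced points on the line between the Sadness and
    Surprise codes in the emotion space and decode each back to AU values."""
    z_a = net.encode(au_sadness)                 # (x, y, z) of Sadness
    z_b = net.encode(au_surprise)                # (x, y, z) of Surprise
    ts = np.linspace(0.0, 1.0, steps)
    codes = [(1 - t) * z_a + t * z_b for t in ts]
    return [net.decode(z) for z in codes]        # 11 AU vectors to synthesize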


Fig. 26. Original image.


Fig. 25. New locus from subjective test: (a) locus drawn in 3D space; (b) locus mapped onto 2D space.

Following the change from left to right along the horizontal axis in Fig. 24, one can see that support for Sadness is gradually reduced, Disgust appears briefly for a few subjects, Happiness then appears for many, and finally support for Surprise gradually increases. This result confirms the interpolation effect of the 5-layered network: all of the expressions in this emotion space are located continuously at the impression level. The appearance of Happiness is explained by the fact that the line between Surprise and Sadness passes close to the point of Happiness in the space shown in Fig. 23; the line is far from the point of Disgust, so the influence of Disgust is weak. Similar effects appear for other pairs of emotions.

Fig. 27. Synthesized anger.

6.2. Peak emotions

On the basis of this evaluation test, the locus of each basic expression can be redrawn from the neutral position toward the peak point of positive answers. The new loci obtained from the subjective test are shown in Fig. 25, and images synthesized at the new peak points are shown in Figs. 27 to 32. The original image is given in Fig. 26.

Fig. 28. Synthesized fear.


Fig. 29. Synthesized surprise.

Fig. 30. Synthesized disgust.

Fig. 32. Synthesized sadness.

Fig. 31. Synthesized happiness.

7. Conclusion

In this paper, three kinds of modeling method are presented. For face modeling, a 3D model constructed from range data, together with a mouth model, is introduced to synthesize a realistic face image at reduced calculation cost. For expression modeling, a generic model is introduced to provide the hierarchical control that enables FACS-based parameters to produce realistic expressions. Lastly, an emotion model based on a neural network is introduced; many intermediate faces can be expressed and synthesized through the 3D emotion space.

From the evaluation results of the emotion space, the interpolation effect and the continuity of impressions could be confirmed. The emotion space therefore satisfies some of the criteria for recognition, synthesis and compression of face images. As this research progresses, the 3D emotion space should enable a very natural communication environment between humans and machines.

Based on the 3D emotion space, the description, synthesis and recognition of facial expressions are possible. In practice this depends on the Action Unit parameters, so AU recognition from real face images is necessary to realize an interface or communication system. In the case of an interface in particular, mapping rules between the user's face and the system's face are additionally required. These rules can be expressed as transformations in the 3D emotion space.

Acknowledgements

Special thanks to Prof. Demetri Terzopoulos, University of Toronto, and Mr. Shoichiro Iwasawa, Seikei University, for making the face model and assisting in the experiments. I would also like to thank Prof. Hiroshi Yamada, Kawamura College, and Mr. Fumio Kawakami, Seikei University, for performing the psychological evaluation of the emotion space.


References
[1] S. Morishima and H. Harashima, A media conversion from speech to facial image for intelligent man-machine interface, IEEE Journal on Selected Areas in Communications, 9(4) (1991).
[2] T. Sakaguchi, M. Ueno, S. Morishima and H. Harashima, Analysis and synthesis of facial expression using high-definition wire frame model, Proc. 2nd IEEE International Workshop on Robot and Human Communication, 1993, 194-199.
[3] I. Essa, T. Darrell and A. Pentland, Tracking facial motion, Proc. Workshop on Motion of Non-rigid and Articulated Objects, 1994, 36-42.
[4] H. Kobayashi and F. Hara, Analysis of the neural network recognition characteristics of 6 basic facial expressions, Proc. 3rd IEEE International Workshop on Robot and Human Communication, 1994, 222-227.
[5] M. Rosenblum, Y. Yacoob and L. Davis, Human emotion recognition from motion using a radial basis function network architecture, Proc. Workshop on Motion of Non-rigid and Articulated Objects, 1994, 43-49.
[6] Y. Lee, D. Terzopoulos and K. Waters, Constructing physics-based facial models of individuals, Graphics Interface '93, 1993, 1-8.
[7] H. Schlosberg, Three dimensions of emotion, The Psychological Review, 61(2) (1954) 81-88.
[8] H. Schlosberg, A scale for the judgment of facial expressions, Journal of Experimental Psychology, 29 (1941) 497-510.
[9] C.A. Smith and P.C. Ellsworth, Patterns of cognitive appraisal in emotion, Journal of Personality and Social Psychology, 48(4) (1985) 813-838.
[10] J.A. Russell and M. Bullock, Multidimensional scaling of emotional facial expressions: similarity from preschoolers to adults, Journal of Personality and Social Psychology, 48(5) (1985) 1290-1298.
[11] P. Ekman and W. Friesen, Facial Action Coding System, Consulting Psychologists Press, 1977.
[12] K. Waters, A muscle model for animating three dimensional facial expression, Computer Graphics, 22(4) (1987) 17-24.
[13] D. Terzopoulos and K. Waters, Physically-based facial modeling, analysis and animation, Journal of Visualization and Computer Animation, 1 (1990) 73-80.
[14] Y. Katayama and K. Ohyama, Some characteristics of self-organizing back propagation neural networks, Spring National Convention Record, IEICE-J, SD-I-14, 199, 7-309.
[15] N. Ueki, Y. Katayama and S. Morishima, A topological feature of multi-layer neural networks, Spring National Convention Record, IEICE-J, D-21, 1991, 6-21.
[16] S. Morishima and H. Harashima, Emotion space for analysis and synthesis of facial expression, IEEE International Workshop on Robot and Human Communication, 1993, 188-193.
[17] H. Yamada, Dimensions of visual information for categorizing facial expressions of emotion, Japanese Psychological Research, 35(4) (1993).
[18] H. Yamada, Visual information for categorizing facial expression of emotions, Applied Cognitive Psychology, 7 (1993) 257-270.
