NonLinear Analysis of Emotional Voice

INES 2013 IEEE 17th International Conference on Intelligent Engineering Systems June 19-21, 2013, Costa Rica
Automatic Analysis of Emotional Response based on non-linear speech modeling oriented to Alzheimer Disease diagnosis
K. Lopez-de-Ipina*, J.B. Alonso**, C.M. Travieso**, H. Egiraun*, M. Ecay***, A. Ezeiza*, N. Barroso* and P. Martinez-Lage***
* University of the Basque Country, Donostia, Spain
** University of Las Palmas de Gran Canaria, Las Palmas, Spain
*** CITA-Alzheimer Foundation, Donostia, Spain
e-mail: karmele.ipina@ehu.es
978-1-4799-0830-1/13/$31.00 ©2013 IEEE

Abstract: Alzheimer's disease (AD) is the most prevalent form of progressive degenerative dementia. Its diagnosis is made by analyzing many biomarkers and tests, but nowadays a definitive confirmation requires a post-mortem examination of the patient's brain tissue. The purpose of this paper is to examine the potential of applying intelligent algorithms to the results obtained from non-invasive analysis methods on suspected patients, in order to contribute to the improvement of both the early diagnosis of AD and the assessment of its degree of severity. This work deals with Emotional Response Automatic Analysis (ERAA) based on classical and new speech features: Emotional Temperature (ET) and Higuchi Fractal Dimension (FD). The method has the great advantage of being, in addition to non-invasive, low cost and free of side effects. This is a pre-clinical study oriented to validating future diagnosis tests and biomarkers. ERAA showed very satisfactory and promising results for the definition of features oriented to early diagnosis of AD.

I. INTRODUCTION

Alzheimer's Disease (AD) is the most common type of dementia among elderly people, with a large socioeconomic cost to society that is expected to increase. It is characterized by progressive and irreversible cognitive deterioration with memory loss, impaired judgment and language, and other cognitive deficits and behavioural symptoms that end up becoming severe enough to limit the ability of an individual to perform professional, social or family activities of daily life. As the disease progresses, patients develop increasingly severe disabilities and finally become completely dependent. An early and accurate diagnosis of AD would be of much help for patients and their families, both to plan for the future and to start an early treatment of the symptoms of the disease. According to current criteria, the diagnosis is expressed with different degrees of certainty as possible or probable AD when dementia is present and other possible causes have been ruled out, but an unambiguous diagnosis of AD requires the demonstration of the typical AD pathological changes in brain tissue by autopsy [1,2,3,4]. The clinical hallmark and earliest manifestation of AD is episodic memory impairment. At the time of clinical presentation, other cognitive deficits are usually already present in language, executive functions, orientation, perceptual abilities and constructional skills. Associated behavioural and psychological symptoms include apathy, irritability, depression, anxiety, delusions, hallucinations, disinhibition, aggression, aberrant motor behaviour as well as eating or sleep behaviour changes [5,6]. All these symptoms lead to impaired performance in family, social or professional activities of daily life as the disease progresses from mild to moderate and to severe. The diagnosis of AD is made on clinical grounds and requires, on the one hand, the confirmation of a progressive dementia syndrome and, on the other, the exclusion of other potential causes of dementia by clinical history and examination, complete blood workup tests and brain-imaging analysis tests such as computer tomography (CT) or magnetic resonance imaging (MRI). In this setting, the development of non-invasive intelligent diagnosis techniques would be very valuable for the early detection and classification of different types of dementia, particularly because they do not require specialized personnel or laboratory equipment, so that anyone in the habitual environment of the patient could, after proper training, apply them without altering or blocking the patient's abilities [7,8]. Emotional Speech Analysis (ESA) has that potential: emotions are cognitive processes related to the architecture of the human mind, such as decision making, memory or attention, closely linked to learning and understanding, that arise in intelligent natural or artificial systems when they become necessary to survive in a changing and partially unpredictable world [9,10]. Human interaction includes emotional information about partners that is transmitted explicitly through language and implicitly through nonverbal communication. The nonverbal information, which often includes body language, attitudes, modulations of voice, facial expressions, etc., is essential in human communication as it has a high effect on the communication provision of the partners and on the intelligibility of speech [9,10]. Human emotions are affected by the environment and the direct interaction with the outside world, but also by the emotional memory that emerges from the experience of the individual and the cultural environment, the so-called socialized emotion. Emotions use the same components (subjective, cultural, physiological and behavioural) that the individual's perception expresses with regard to the mental state, the body and how it interacts with the environment. We have focused our work on non-invasive diagnostic techniques based on the analysis of emotional response in speech. Moreover, the emotional response in Alzheimer's patients becomes impaired and also seems to go through different stages. In the early stages, social and even sexual disinhibition appears and behavioural changes are also
observed (for example, being angry and not being able to perform common tasks, express themselves or remember) [9,10]. However, the emotional memory remains, and they cry more easily and gratefully acknowledge caresses, smiles and hugs. The Alzheimer's patient reacts aggressively to things that are harmless for healthy people and perceives a threat or danger where none exists. In more advanced stages they may often seem shy and apathetic, symptoms often attributed to memory loss and/or difficulty in finding the right words, and some responses are likely to be magnified due to an alteration in perception. Alternatively, it has been suggested that the reduced ability to feel emotions is due to memory loss, which may in turn induce the appearance of apathy and depression [5,6]. The work presented here is part of a larger study to identify novel technologies and biomarkers or features for early AD detection. The objective of that study is the identification of pre-clinical AD (prior to developing the first symptoms) and prodromal AD (early symptoms that might indicate the beginning of AD but before the onset of dementia). The purpose of this work is to evaluate the suitability of a new approach for early AD diagnosis based on the analysis of classical parameters together with Emotional Temperature and a non-linear parameter, the Fractal Dimension, whose results can be used for the automatic classification of tested individuals.

The rest of this paper is organized as follows: Section II explains the materials and the methodology of the experiments, Section III shows the experimental results and, finally, conclusions are presented in Section IV.

II. MATERIALS AND METHODS

A. Materials

All the work was performed strictly following the ethical considerations of the organizations involved in the project. Participants in this study included 20 AD patients (68-96 years of age, 12 women, 8 men), within the three stages of AD: First Stage (FS), Second Stage (SS) and Third Stage (TS) (TS=6, SS=10, FS=4). The uneven number of patients in the different stages is due to the fact that people visit their physician when there is already advanced cognitive impairment (corresponding to SS or TS levels). The reference control group was made up of 50 healthy participants from 20 to 98 years of age, of whom 20 (10 men and 10 women) were selected for the experiments reported in Section III. The database for the experimentation was obtained after recording the control and AD groups for 12 hours and 8 hours respectively. The recordings consisted of videos of Spontaneous Speech where people tell pleasant personal stories or feelings and interact with each other in a friendly conversation. The recording atmosphere is relaxed and non-invasive. The shorter recording times for the AD group are due to the fact that AD patients speak more slowly, with long pauses, sometimes take a long time looking for the correct word, utter speech disfluencies or broken messages and, in the advanced stage of the disease, feel tired and usually want to stop the recording. We complied with their requests. After audio processing of the video, about 20% and 50% of the material from the control (CR) and AD groups respectively is lost. The full database consisted of about 60 minutes for the AD group and of about 9 hours for the control. The speech is divided into segments of 60 seconds. Finally, a database of about 600 segments of Spontaneous Speech is obtained. The database is multicultural and multilingual (English, French, Spanish, Basque, Chinese, Arabic and Portuguese) and covers a wide range of ages, in order to develop a new methodology that is independent of the cultural, social and language environment.

B. Methods

B.1 Feature extraction

B.1.1 Emotional Speech Analysis

Some authors affirm that emotions arise in intelligent natural or artificial systems when they become necessary for survival in a changing and partially unpredictable world [9,10]. Emotions are cognitive processes related to the architecture of the human mind (such as decision-making, memory, or attention) and are closely linked to learning and understanding. Human interaction includes emotional information about communication partners that is transmitted explicitly through language and implicitly through nonverbal communication. Nonverbal information, which includes e.g. body language, attitudes, modulations of voice, and facial expressions, is essential in human communication as it has a substantial impact on the communication provision of the partners and on the intelligibility of speech [9,10]. Human emotions are affected by the environment and the direct interaction with the outside world, but also by the emotional memory arising from the experience of the individual and the cultural environment, the so-called socialized emotion. Emotions consist of the same components (subjective, cultural, physiological and behavioral) that influence the individual's perception of mental state, the body, and its interaction with the environment. Emotions, far from being an obstacle in understanding the social universe, describe it clearly. In this study, we aim to accomplish the automatic selection of emotional speech by analyzing three families of features in speech:

1. Acoustic features: pitch, standard deviation of pitch, max and min pitch, intensity, standard deviation of intensity, max and min intensity, period mean, period standard deviation, and Root Mean Square amplitude (RMS);
2. Voice quality features: shimmer, local jitter, Noise-to-Harmonics Ratio (NHR), Harmonics-to-Noise Ratio (HNR) and autocorrelation;
3. Duration features: fraction of locally unvoiced frames, degree of voice breaks.

Together, the three feature families form the EF set used in the experimentation.

B.1.2 Emotional Temperature

We wished to apply a non-invasive method to estimate the severity of Alzheimer's disease in the patient. For that, we developed a method, described here for the first time, based on the analysis of a few prosodic and paralinguistic feature sets obtained from a temporal segmentation of the speech signal.
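
A minimal sketch of such a temporal segmentation is given here, using the values the paper reports for ET (Hamming windows of 0.5 seconds with 50% overlap, DC removal and z-normalization per frame). The function name `frame_signal` and its parameters are illustrative assumptions, not the authors' implementation, and the signal is assumed to be a plain list of samples.

```python
import math

def frame_signal(x, sample_rate, win_seconds=0.5, overlap=0.5):
    """Split x into Hamming-windowed frames, then remove DC and z-normalize each.

    Hypothetical helper: the name and defaults are assumptions based on the
    0.5 s / 50% overlap values stated in the paper.
    """
    n = int(round(win_seconds * sample_rate))        # samples per frame
    if n < 2:
        raise ValueError("window must contain at least two samples")
    hop = max(1, int(round(n * (1.0 - overlap))))    # frame shift in samples
    # Hamming window coefficients
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        windowed = [w * v for w, v in zip(window, x[start:start + n])]
        mean = sum(windowed) / n
        centered = [v - mean for v in windowed]      # remove the DC component
        std = math.sqrt(sum(v * v for v in centered) / n)
        if std > 0.0:                                # z-normalization
            centered = [v / std for v in centered]
        frames.append(centered)
    return frames
```

Each returned frame has zero mean and, unless it is constant, unit variance, so per-frame pitch and energy features are comparable across speakers recorded at different levels.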
The "Emotional Temperature" (ET) was calculated as follows. First, the speech signal is windowed by a Hamming window of 0.5 seconds with 50% overlap [11]. In each frame x(n) the DC component is removed and a z-normalization is applied. Two prosodic and four paralinguistic features, related to the pitch and the energy respectively, were estimated from each frame. These features were chosen because their robustness in emotion recognition has been proven, they are quickly and easily calculated, and they are independent of linguistic segmentation, which helps to avoid problems in real-time applications in real environments. For the prosodic features, a voiced/unvoiced decision is made for each frame and two linear regression coefficients of the pitch contour p(n) [12, 13, 14] are obtained. For the paralinguistic features, voice spectral energy balances [15] are calculated from each frame and quantified using 4 percentages of energy concentration in 4 frequency bands. Support Vector Machines (SVM) have been used to quantify the discriminative ability of the proposed measures. We have used a freely available implementation named LIBSVM [16] with a radial basis kernel function.

B.1.3 Higuchi Fractal Dimension

When appropriate corpora are available, linear systems can be implemented fairly rapidly, as they rely on well-known Machine Learning techniques to achieve their goals, avoiding complex adjustments to the system. These latter types of tasks often require experimentation with alternative techniques, which can lead to improved systems. One such alternative technique of particular interest is nonlinear analysis, and some works show that combining nonlinear features with linear ones can produce higher recognition accuracies without substituting the whole linear system with novel nonlinear approaches. This is especially promising for solving non-typical tasks, since it would be very demanding to design a complete nonlinear system from scratch for solving a task already made difficult by the scarcity of resources. The Fractal Dimension is one of the most popular features that describe the complexity of a system. Most if not all fractal systems have a characteristic called self-similarity. An object is self-similar if a close-up examination of the object reveals that it is composed of smaller versions of itself. Self-similarity can be quantified as a relative measure of the number of basic building blocks that form a pattern, and this measure is defined as the Fractal Dimension. This work focuses on the alternatives that do not need previous modeling of the system, in particular the one proposed by Higuchi [17]. Higuchi proposed an algorithm for measuring the Fractal Dimension of discrete time sequences directly from the time series, and the experiments follow the method described in [17].

Figure 1. Higuchi Fractal Dimension for a signal of a person with AD with different window sizes

B.1.4 Feature sets

In the experimentation four feature sets will be used: 1) EF: the set described in B.1.1; 2) EF+HFD1: the EF set and the Higuchi Fractal Dimension (HFD); 3) EF+HFD2: the EF set, HFD, maximum HFD, minimum HFD, variance of HFD and standard deviation of HFD; 4) the EF+HFD2 set plus Emotional Temperature (ET), reported as SSF+HFD2+ET in Table 1.

B.2 Automatic Classification

The automatic classification of emotional speech is based on the Multi Layer Perceptron (MLP). WEKA software has been used in carrying out the experiments. The results are evaluated with the Recognition Error Rate (%RER). For the training and validation phases, we used k-fold cross-validation with k=10 in order to ensure solid results; cross-validation is a robust validation method for variable selection. These features will define the CR control group and the three AD levels.

III. RESULTS

The task was Automatic Classification, with the classification targets being healthy speakers without neurological pathologies and speakers diagnosed with AD. The experimentation is carried out with 20 selected members of the control group (CR) and 20 AD sufferers at different stages of the disease (FS=4, SS=10, TS=6). The control group members (10 females and 10 males) were middle-aged (M) or elderly (E), while all AD sufferers (10 females and 10 males) were elderly. Engineers and health specialists have analyzed %RER for the Automatic Analysis of Emotional Response results with regard to global results and also AD level results. Table 1 shows the results obtained with different Neuron Number in Hidden Layer (NNHL) and Training Step (TS) values. The best results are obtained for the SSF+HFD2+ET set with 150 NNHL. In some cases there is model saturation.

Table 1. Global Accuracy (%) with MLP for Emotional Response Analysis and different Neuron Number in Hidden Layer (NNHL) and Training Step (TS)

NNHL  TS    EF     SSF+HFD1  SSF+HFD2  SSF+HFD2+ET
50    500   89.92  86.82     93        96.9
50    1000  90.69  89.14     93        96.9
50    1500  90.69  89.9      93        96.9
100   500   90.69  87.59     93.8      96.9
100   1000  90.69  89.14     93.8      96.1
100   1500  90.69  89.14     93.8      96.9
150   500   90.69  86.04     93.8      97.7
150   1000  90.69  86.82     93.8      97.7
150   1500  90.69  86.82     93.8      97.7
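
The Higuchi estimate used in Section B.1.3 can be sketched as follows: for each delay k, the mean normalized curve length L(k) is computed over the k possible starting offsets, and the Fractal Dimension is the least-squares slope of log L(k) against log(1/k). The value of `kmax`, the function names and the idea of summarizing HFD over sliding windows (for the maximum, minimum, variance and standard deviation listed in B.1.4) are assumptions made for illustration; the paper does not state these details.

```python
import math
import statistics

def higuchi_fd(x, kmax=8):
    """Estimate the Higuchi Fractal Dimension of the sequence x (kmax is an assumed default)."""
    n = len(x)
    if kmax < 2 or n < 2 * kmax:
        raise ValueError("need kmax >= 2 and at least 2 * kmax samples")
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                          # starting offsets 0..k-1
            steps = (n - 1 - m) // k                # number of increments of size k
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Higuchi's normalization of the curve length
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / k
        if mean_len <= 0.0:
            raise ValueError("curve length is zero (e.g. constant sequence)")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    # Least-squares slope of log L(k) against log(1/k)
    mx = sum(log_inv_k) / kmax
    my = sum(log_len) / kmax
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den

def hfd_window_stats(x, win, hop, kmax=8):
    """Hypothetical summary of HFD over sliding windows (window scheme is an assumption)."""
    values = [higuchi_fd(x[s:s + win], kmax)
              for s in range(0, len(x) - win + 1, hop)]
    if not values:
        raise ValueError("signal shorter than one window")
    return {"max": max(values), "min": min(values),
            "variance": statistics.pvariance(values),
            "std": statistics.pstdev(values)}
```

As a sanity check on the estimator, a straight line gives a dimension of 1, while uncorrelated noise gives a value close to 2.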
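
The evaluation protocol of Section B.2 (10-fold cross-validation scored with %RER) can be illustrated with the sketch below. The authors ran an MLP in WEKA; here a nearest-centroid classifier is swapped in purely as a stand-in so that the example stays self-contained, and %RER is assumed to be the percentage of misclassified samples. All function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def recognition_error_rate(y_true, y_pred):
    """%RER, assumed here to be the percentage of misclassified samples."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * errors / len(y_true)

def train_centroids(xs, ys):
    """Stand-in classifier (not the paper's MLP): one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

def cross_validate(features, labels, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = train_centroids([features[i] for i in train],
                                [labels[i] for i in train])
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(predict_centroid(model, features[i]))
    return recognition_error_rate(y_true, y_pred)
```

Pooling the held-out predictions from all folds before computing the error gives a single %RER over the whole data set; accuracy as reported in Table 1 would then be 100 minus this value under the same assumption.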
Figure 2. %Recognition Error Rate (%RER) with MLP for different classes: CR, FS, SS and TS

1. Global system results: The results are satisfactory for this study. The new fractal features improve the system, with SSF+HFD2+ET being the best option. This feature set includes HFD, its detailed variations and Emotional Temperature, which are able to model non-linear signal features (Table 1).
2. Classes results: The SSF+HFD2+ET set obtains the best results for all classes (Figure 2). This set also improves the classification with regard to early detection (FS class). SS also has a better rate for discriminating the middle AD level. The model is also able to discriminate pathological and non-pathological segments in each patient.

Health specialists note the relevance of the system's ability to carry out both the analysis of independent biomarkers such as Emotional Response and/or the integral analysis of several biomarkers.

IV. CONCLUDING REMARKS

The main goal of the present project is feature search in Emotional Response oriented to pre-clinical evaluation for the definition of tests for AD diagnosis. These features are of great relevance for health specialists to distinguish healthy people and the three AD levels. The approach of this work is to improve the previous modelling based on Emotional Response features with Fractal Dimensions; more precisely, Higuchi's algorithm is implemented in order to add this new feature to the set that feeds the training process of the model. This work describes a first approach to the inclusion of nonlinear features. This straightforward approach might be robust in terms of capturing the dynamics of the whole waveform, it offers many advantages in terms of computability, and it also makes it easier to compare the power of the new features against the previous ones. In future work we will introduce new features related to Emotional Response and speech modelling, and we will also model the Fractal Dimension with other algorithms.

REFERENCES

[1] Sociedad Española de Neurología, http://www.sen.es/
[2] McKhann G, et al. Clinical diagnosis of AD: report of the NINCDS-ADRDA Workgroup on AD. (1984); 24:939-944.
[3] McKhann GM et al. The diagnosis of dementia due to Alzheimer's disease: Recommendations from the NIAA Association workgroups on diagnostic guidelines for AD. Alzheimers Dement. (2011) May; 7(3):263-9.
[4] Van de Pole, L.A., et al., The effects of age and Alzheimer's disease on hippocampal volumes, a MRI study. Alzheimer's and Dementia, 2005. 1(1, Supplement 1): p. 51.
[5] Morris JC, The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology, (1993). 43: p. 2412b-2414b.
[6] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 4th Edition Text Revision. Washington DC. (2000).
[7] Marcos Faundez-Zanuy et al. Biometric Applications Related to Human Beings: There Is Life beyond Security, Cognitive Computation, (2012), DOI 10.1007/s12559-012-9169-9
[8] K. López de Ipiña et al. Alzheimer Disease Diagnosis based on Automatic Spontaneous Speech Analysis, Proc. of NCTA, ICNNT.SS in Challenges in Neuroengineering. Barcelona, (2012).
[9] M. L. Knapp. Essentials of nonverbal communication. Holt, Rinehart & Winston (1980).
[10] Cowie, E. et al. Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, Vol 18(1). Pp. 32-80 (2001).
[11] Petrushin V.A. Emotion in speech: recognition and application to call centers. Proceedings, CANNE (ANNIE). 1999; 7-10.
[12] Lee C.M., Narayanan S. Emotion recognition using a data-driven fuzzy interference system. Pro., 8th ECSCT. 2003; 157-160.
[13] Kwon OW., Chan K., Hao J., Lee TW. Emotion recognition by speech signals. 8th European Conference on Speech Communication and Technology. 2003; 125-128.
[14] De Cheveigné A., Kawahara H. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 2002; 111(4):1917-1930.
[15] Alonso J., De León J., Alonso I., Ferrer MA. Automatic detection of pathologies in the voice by HOS base parameters. Journal on Applied Signal Processing. 2001; 4:275-284.
[16] Chang CC, Lin CJ. LIBSVM: a library for support vector machines; 2001.
[17] Higuchi T. Approach to an irregular time series on the basis of the fractal theory. Physica D (1988). 31:277-283.