
ARTICLE IN PRESS

Phasegram Analysis of Vocal Fold Vibration Documented With Laryngeal High-speed Video Endoscopy
*Christian T. Herbst, †Jakob Unger, ‡Hanspeter Herzel, *Jan G. Švec, and §Jörg Lohscheller, *Olomouc, Czech Republic; †Aachen, Germany; ‡Berlin, Germany; and §Trier, Germany

Summary: Introduction. In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this nonlinear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathological phonations.
Methods. HSV data from a total of 73 females diagnosed as healthy (n = 42), with functional dysphonia (n = 15), or with unilateral vocal fold paralysis (n = 16) were quantitatively analyzed. Glottal area waveforms (GAW) and left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space-embedded signals, two novel quantitative parameters were computed: the phasegram entropy (PE) and the phasegram complexity estimate (PCE), inspired by signal entropy and correlation dimension computation, respectively.
Results. Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological as compared with the healthy participants, thus significantly discriminating the healthy group from the paralysis group (P = 0.02 for both PE and PCE). Comparisons of individual PE or PCE data for the left and the right hGAW within each subject resulted in asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy group and the paralysis group (P = 0.03).
Conclusions. Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.
Keywords: phasegram–nonlinear analysis–periodicity–high-speed video endoscopy–glottal area waveform.
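
As a reading aid for the phasegram entropy (PE) and the PE-based asymmetry measure summarized above (defined in Eqs. 1 to 4 of the Methods), the following is a minimal sketch in Python, the language in which the authors implemented their analysis. It is not the authors' implementation (their phasegram source code is available at www.phasegram.org). The function names are hypothetical, the Hilbert transform uses a naive DFT so that only the standard library is needed, and the rules for locating section crossings (counterclockwise crossings only, linear interpolation between samples) and the histogram range ([0, max |z|] of the window) are assumptions of this sketch. The automated search for the optimal section angle and the PCE are not reproduced; a fixed angle θ is used instead.

```python
import cmath
import math


def analytic_signal(x):
    """Complex analytic signal of a real sequence (Hilbert transform via DFT).

    A naive O(n^2) DFT keeps the sketch stdlib-only; it follows the usual
    FFT recipe: keep DC (and Nyquist for even n), double the positive
    frequencies, zero the negative ones, then invert.
    """
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
                for f in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0
    for f in range(1, (n + 1) // 2):
        weights[f] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    kept = [(f, weights[f] * spectrum[f]) for f in range(n) if weights[f]]
    return [sum(c * cmath.exp(2j * math.pi * f * k / n) for f, c in kept) / n
            for k in range(n)]


def poincare_radii(z, theta):
    """Radii where the 2-D trajectory z (real vs. imaginary part) crosses the
    ray leaving the origin at angle theta.

    Assumption: only counterclockwise crossings are counted, and each one is
    located by linear interpolation between consecutive samples.
    """
    rot = cmath.exp(-1j * theta)
    w = [v * rot for v in z]  # rotate so the section lies on the positive real axis
    radii = []
    for a, b in zip(w, w[1:]):
        if a.imag < 0 <= b.imag:
            t = -a.imag / (b.imag - a.imag)
            crossing = a.real + t * (b.real - a.real)
            if crossing > 0:  # ignore crossings of the opposite half-line
                radii.append(crossing)
    return radii


def histogram(values, num_bins, upper):
    """Counts of values in num_bins equal-width bins spanning [0, upper]."""
    counts = [0] * num_bins
    for v in values:
        counts[min(int(v / upper * num_bins), num_bins - 1)] += 1
    return counts


def entropy_bits(counts):
    """Eq. (1): Shannon entropy H in bits, with p_i = count_i / total."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return max(0.0, -sum(c / total * math.log2(c / total) for c in counts if c > 0))


def phasegram_entropy(z, theta, num_bins=128):
    """Eq. (2): PE = (H_theta + H_(theta+pi)) / 2 for one analysis window.

    Assumption: both histograms span [0, max |z|] of the window.
    """
    upper = max(abs(v) for v in z)
    h_a = entropy_bits(histogram(poincare_radii(z, theta), num_bins, upper))
    h_b = entropy_bits(histogram(poincare_radii(z, theta + math.pi), num_bins, upper))
    return (h_a + h_b) / 2


def asymmetry_pe(pe_left, pe_right, num_bins=128):
    """Eq. (4): |PE(hGAW_L) - PE(hGAW_R)| / PE_MAX, with PE_MAX = log2(num_bins) (Eq. 3)."""
    return abs(pe_left - pe_right) / math.log2(num_bins)


# Usage with hypothetical toy signals standing in for hGAW_L and hGAW_R.
n = 200  # 200 samples = 50 ms at 4000 frames per second
periodic = [math.cos(2 * math.pi * 10 * k / n) for k in range(n)]
period2 = [math.cos(2 * math.pi * 10 * k / n) + 0.5 * math.cos(2 * math.pi * 5 * k / n)
           for k in range(n)]
theta = 0.3  # fixed here; the study selects the section angle automatically
pe_periodic = phasegram_entropy(analytic_signal(periodic), theta)
pe_period2 = phasegram_entropy(analytic_signal(period2), theta)
print(pe_periodic, pe_period2, asymmetry_pe(pe_periodic, pe_period2))
```

For the single-frequency toy signal, every crossing of a given section lands at the same radius, so each histogram has one occupied bin and PE should come out as 0; the period-2 toy signal splits the crossings evenly over two bins per section, which should give a PE close to 1 bit, in line with the value described for period doubling in Figure 2C. The 200-sample window is shorter than the 100-ms window used in the study only to keep the naive DFT fast.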

Accepted for publication November 12, 2015.
From the *Voice Research Laboratory, Department of Biophysics, Faculty of Science, Palacký University Olomouc, Tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic; †Institute of Imaging & Computer Vision, RWTH Aachen University, Kopernikusstr. 16, 52074 Aachen, Germany; ‡Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, 10115 Berlin, Germany; and the §Department of Computer Science, University of Applied Sciences, Schneidershof, 54293 Trier, Germany.
Address correspondence and reprint requests to Christian T. Herbst, Voice Research Laboratory, Department of Biophysics, Faculty of Science, Palacký University Olomouc, Tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic. E-mail: herbst@ccrma.stanford.edu
Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■
0892-1997
© 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvoice.2015.11.006

INTRODUCTION

The behavior of a vibratory system is periodic if the observed oscillatory pattern continuously repeats itself after a constant time interval. Periodicity in this strict sense is hardly ever observed in empirical data from biomechanical systems such as the voice. Rather, voice production is at best a nearly periodic1 phenomenon under nonpathological conditions. In the presence of a voice disorder, vocal fold vibration, and thus the generated acoustical output, is likely to be more or less perturbed,2 often because of highly irregular vibratory regimes of the vocal folds.3–6

Deviations from periodicity can be quantified in a variety of ways. Apart from time domain-based and frequency domain-based approaches such as the calculation of jitter7 or the harmonics-to-noise ratio,8 methods from nonlinear systems analysis have received growing interest in the past decades.9–12 In nonlinear dynamics methods, the voice is considered to be a dynamical system13 that is able to exhibit a wide variety of oscillatory behavior “on the way to chaos.”14,15

Several quantitative methods for assessing the complexity of the temporal behavior of nonlinear systems have been introduced in the domain of mathematics and physics, for instance the correlation dimension,16 Lyapunov exponents,17 or Tokuda et al’s low-dimensional nonlinearity measure.18 These methods have been successfully applied in the analysis of biosignals from both healthy and pathological voices, such as the acoustical waveform,12,19–22 electroglottography,23–25 or data derived from high-speed video (HSV) recordings of vocal fold vibration.26–29

The detailed interpretation of available quantitative methods for analyzing the dynamics of irregular voice often requires expert background knowledge in mathematics and physics. In contrast, visualization methods are often easier for nonexperts to understand. Such visualization methods, applied to nonperiodic voice production, include, for example, spectrograms12,30,31 or local maxima displays.32

Recently, a novel visualization method of system dynamics has been introduced: the phasegram.33 In a phasegram, time is mapped onto the x-axis, and various vibratory regimes, such as periodic oscillation, subharmonics, or chaos, are identified within the generated graph by the number and the stability of horizontal lines. Phasegrams can be interpreted as bifurcation diagrams in time. They are particularly suited for nonstationary signals, because they combine the benefits of sliding window analysis with the visualization potential of phase space embedding.34,35 In contrast to other nonlinear analysis techniques (eg, bifurcation maps), phasegrams can be constructed automatically from a time domain signal alone; no additional system parameter needs to be known. In contrast to conventional voice perturbation measures (eg, jitter), no information about glottal cycle duration or fundamental frequency needs to be known.

Phasegrams have thus far been utilized for the visualization33 and the manual classification36 of electroglottographic voice
2 Journal of Voice, Vol. ■■, No. ■■, 2015

signals. Here, their application to the analysis of time series data derived from HSV recordings is introduced using the example of vocal fold vibrations simulated with a lumped element biomechanical model. The concept is further extended to healthy and pathological phonations, considering both stationary and nonstationary signals. The analysis is complemented by spatiotemporal visualization37,38 and by Fourier analysis of vocal fold vibration and of simultaneously acquired acoustical signals. It will be shown that sequences of aberrant vocal fold vibratory behavior can be easily located in phasegrams, thus earmarking the method as a promising candidate for the detection of clinically relevant passages within HSV recordings. To facilitate automated objective analysis of vocal fold vibratory behavior (as seen in HSV recordings), two novel quantitative analysis parameters derived from the phasegram visualization are introduced in this paper. The performance of these quantitative parameters is assessed through analysis of a database containing HSV recordings of healthy and pathological phonations.

METHODS

Participants and phonatory tasks
A total of 73 female participants were included in the study. Before data acquisition, all participants underwent a standard clinical evaluation. Forty-two of these were considered to be normophonic (ie, healthy) speakers. Another 15 participants were diagnosed with functional dysphonia, and the remaining 16 were diagnosed with unilateral vocal fold paralysis. The average (±standard deviation) age of these clinical groups was 40.2 ± 15.8 years (healthy), 46.2 ± 16.1 years (functional dysphonia), and 53.7 ± 23.2 years (unilateral vocal fold paralysis), respectively. All participants were asked to steadily phonate the sustained vowel /ae/ at habitual pitch and loudness during endoscopy. These phonations constituted the corpus of stationary data that was utilized for statistical analysis in this study. One glissando produced by a female with left vocal fold paralysis is also discussed in this paper, illustrating the phasegram’s potential to visualize nonstationary phonation (Figure 4).

Data acquisition
HSV data of vocal fold vibration during phonation were acquired using an HS Endocam 5562 high-speed camera system (Richard Wolf GmbH, Knittlingen, Germany) operated at 4000 frames per second with a spatial resolution of 256 × 256 pixels. The system was equipped with a 9-mm rigid endoscope with 90° optics (Richard Wolf GmbH, Knittlingen, Germany), and a Wolf 5123 Auto LP 250 Watt Xenon-light lamp (Richard Wolf GmbH, Knittlingen, Germany) served as the light source. Each participant’s phonation was recorded for 250 ms, resulting in 1000 video frames per participant, and a total of 83,000 analyzed video frames.

HSV analysis
To enable quantitative analysis of the vibratory patterns along the entire length of the vocal folds, a clinically evaluated image-processing procedure was utilized, which is described in detail in Reference 39. When applying this method to all acquired HSV recordings, the time-varying glottal edges enveloping the glottal area were segmented within each video frame, thus resulting in the glottal area waveform (GAW) in pixels. This segmentation process also provides information on the deflection of the medial edges of both vocal folds, describing their mediolateral vibration patterns. The segmentation information can also be utilized to compute the left and the right hemi-GAW (hGAWL and hGAWR, respectively). These quantities are defined as the time-varying area (expressed in pixels) of the left and the right half of the glottis, respectively, as segmented along the anteroposterior midline,40 satisfying hGAWL + hGAWR = GAW. In this study, both the GAW and the hGAWL,R are analyzed.

For visual inspection, information on the extracted time-varying glottal edges was used to create phonovibrograms (PVG)38 and glottovibrograms (GVG),29,41 two visualization techniques that transfer information on the time-varying lateral deflection of the vocal folds (as color information) along the anterior-posterior (A-P) dimension into a single graph. These two visualization techniques were utilized in Figure 3 (GVG) and in Figure 4 (PVG) in this text.

TABLE 1.
Two-mass Model42 Parameters Used for Generating a Vocal Fold Vibration Sequence With Three Stereotypical Modes of Vibration (Periodic, Period Doubling, Chaos)

Parameter  Description                              Value
q          Asymmetry parameter                      0.644
l          Vocal fold length [cm]                   1.739
m1         Lower mass [g]                           1.032
m2         Upper mass [g]                           0.353
k1         Spring stiffness of lower mass [cm/ms²]  0.161
k2         Spring stiffness of upper mass [cm/ms²]  0.031
kc         Coupling stiffness                       0.152
d1         Thickness of lower mass [cm]             1.687
d2         Thickness of upper mass [cm]             1.636

Vocal fold vibration simulation
As a proof of concept for the phasegram analysis of clinical HSV data from human phonation described later, a synthesized HSV containing three stereotypical modes of vibration (periodic, period doubling, and chaos) was generated using a simplified two-mass model simulation approach.42 This model was included in this paper for didactic purposes: to illustrate the interrelation between biomechanical vibratory regimes of the vocal folds and their appearance in the phasegram, and to verify the systematic variations of the new measures introduced in this work in a controlled environment with known input data. The model was configured with the parameters specified in Table 1. The simulation was run for 5 seconds at a sampling frequency of 4000 Hz (corresponding to the video frame rate of the HSV recordings analyzed in this study), driven by a subglottal pressure that gradually varied from 0 to 2 kPa and back, reaching a maximum plateau from t = 2 s to t = 3 s. For both vocal folds, the
ChristianT. Herbst et al Phasegram analysis of laryngeal high-speed video recordings 3

time-varying displacement of either the upper or the lower mass (always considering the smaller of the two respective mass deflections, to arrive at the projected glottal area) was interpreted as the simulated maximum oscillatory deflection A[i] of a simple string model of vocal fold vibration with one vibratory mode (Supplementary Movie M1).

Phasegram generation
The phasegram generation process is described in detail in a previous publication.33 Here, an outline of the procedure is recapitulated for the analysis of HSV data. Initially, the GAW signals extracted from all analyzed HSV recordings were converted into complex analytic signals by means of a Hilbert transform.43 For creating phasegrams, consecutive portions of each analytic signal were embedded in two-dimensional phase space,34,35 resulting in a number of temporally consecutive phase portraits equally distributed over the duration of each analyzed signal (see Figure 1D for three examples extracted at t = 0.5 s, t = 1.5 s, and t = 2 s). Each two-dimensional phase portrait was intersected by two so-called Poincaré sections (cf. Reference 44, figure IV.1) starting at the phase space origin and going radially outward at angles π radians (or 180°) apart (see equation S1 in the supplementary materials to Herbst et al33). The two connected Poincaré sections thus form a straight line through the origin of the phase space, resulting in a certain number of intersection points between the embedded phase space trajectory and the intersection line.

The angle of the Poincaré sections through the corresponding phase portraits crucially influences the final appearance of the generated phasegram. The optimal Poincaré section angle, ideally capturing the full complexity of the underlying vibratory phenomenon, is determined by an automated algorithm that is described in detail in the supplementary materials of Herbst et al33 and in Appendix 1. This algorithm divides all two-dimensional phase spaces derived from each analyzed signal (recall Figure 1D) into M equally spaced angular sections and computes the complexity of all M resulting one-dimensional Poincaré section vectors with an approach inspired by Grassberger and Procaccia.16 For each of these vectors, the complexity estimate is averaged over all extracted phase portraits, and the Poincaré section angle that results in the highest averaged complexity estimate is chosen for the generation of the phasegram.

Figure 1D shows three horizontally aligned intersection lines within the two-dimensional phase space. The occurrence of intersection points along these lines was assessed by means of histograms with a given number of histogram bins (Figure 1E). Each histogram was grayscale coded and thus converted into a so-called “trajectory strip” (Figure 1F). The resulting trajectory strips were then in turn rotated by 90 degrees and plotted consecutively onto a graph at equally spaced time intervals to form the phasegram (Figure 1G). As a by-product of phasegram generation, so-called “phase portrait movies” were created, which show the evolution of the phase space trajectory over time (Supplementary Movie M1).

In the phasegram, time is mapped onto the x-axis. Based on the underlying attractor in two-dimensional phase space, the number and the stability of lines perpendicular to the y-axis typically indicate the system state at a particular point in time:

• one line: no oscillation (stasis)
• two lines: periodic oscillation (Figure 1G, t = 0.5 s)
• more than two stable lines: subharmonic vibration (Figure 1G, t = 2 s)
• absence of continuous lines, rugged appearance: irregular oscillatory patterns (Figure 1G, t = 1.5 s)

Note that these results can be expected only if the analyzed time series has exactly two zero-crossings per cycle (one positive and one negative), as is, for example, the case in electroglottographic signals or in the GAW data analyzed in this study. If, however, the waveform contains more than two zero-crossings per cycle because of prominent harmonics (as is typically seen in the acoustic signal), the underlying Poincaré section will result in proportionately more intersection points with the phase space trajectory. In such a case, more stable lines than those listed above will be seen for periodic and subharmonic oscillatory regimes, and thus the derived quantitative measures phasegram entropy (PE) and phasegram complexity estimate (PCE) (below) will be slightly inflated.

All phasegrams utilized for statistical evaluation were calculated with a window duration of 100 ms, a read progress of 5 ms (allowing for a considerable overlap of consecutive windows, thus increasing temporal accuracy), 100 angular bins (parameter M in Herbst et al33), and 128 histogram bins for assessing intersections between phase space trajectories and Poincaré sections. All data analysis algorithms used in this study were implemented by author C.T.H. in the programming language Python. The source code for creating phasegrams is available online (www.phasegram.org).

Quantitative analysis
Phasegram Entropy (PE)
As illustrated in Figure 1, the phasegram is composed of a series of intensity-coded histograms of Poincaré sections through consecutive two-dimensional phase space embeddings of portions of the analyzed signal. As the underlying signal gets more complex, the intersections between the phase space trajectory and the respective Poincaré section are distributed over a greater number of histogram bins (see Figure 1E for stereotypical cases of periodic, subharmonic, and irregular vibration). Based on Shannon’s work,45 the complexity of the analyzed signal can thus be assessed by calculating the entropy of the Poincaré section histograms. The entropy H, measured in bits, is a measure of the information content associated with the outcome of a random variable. It is calculated as

$$H = -\sum_{i=1}^{n} p_i \log_2(p_i) \tag{1}$$

where $p_i$ is the probability of occurrence (between 0 and 1) of a bin in the histogram. In the case of phasegrams, $p_i$ is the number of trajectory intersections with the Poincaré section found

FIGURE 1. Illustration of phasegram generation, exemplified by analysis of vocal fold vibration simulated with a simplified two-mass model
(nonvibrating intervals not shown). (A) Narrowband spectrogram (FFT window duration 185.8 ms) of a GAW signal generated with the Steinecke-
Herzel two-mass model driven in a pressure sweep. (B) GAW signal derived from the model. (C) Three portions of the GAW signal, extracted at
temporal offsets t = 0.5 s (periodic), t = 1.5 s (irregular), and t = 2 s (subharmonic). (D) Two-dimensional phase space embedding of the above signals,
created by Hilbert transformation and attractor reconstruction. An intersection line (Poincaré section) was created along the x-axis (orange/light
gray), yielding intersection points with the trajectory (red/gray dots). (E) Histograms of trajectory intersection points with intersection lines
for all three extracted signal portions. (F) “Trajectory strips”: color-coded histograms of intersection lines through phase portraits; (G) phasegram
from the signal displayed in (A) and (B). The markers at t = 0.5 s, t = 1.5 s, and t = 2 s represent the temporal offset of the three trajectory
strips from (F) within the graph. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this
article.)

in a given histogram bin (recall Figure 1E), divided by the total number of intersections (found in all bins) of the analyzed histogram.

As indicated above, for phasegram generation, the definition of a histogram follows the convention that a Poincaré section of an oscillating phase space trajectory is typically made through its positive semi-orbit. Consequently, histograms of Poincaré sections are taken only from the phase space origin outward at a certain angle, and the resulting phasegram representation is constituted by two Poincaré section histograms taken at opposite angles. For entropy calculation, the average of the entropies of these two histograms is calculated, and the resulting quantity is introduced as the PE, measured in bits, which is determined for every point in time t where a phase space was extracted from the analyzed signal:

$$PE[t] = \frac{H_{\theta} + H_{\theta+\pi}}{2}, \tag{2}$$

where $H_{\theta}$ and $H_{\theta+\pi}$ are the entropies of the two Poincaré section histograms, taken at angles of θ and θ + π radians, respectively, within the phase space at time index t. The time index t is the progressive count of phase space embeddings within the analyzed signal.

The PE for a synthesized signal with known oscillatory regimes is illustrated in Figure 2C. The PE for perfectly sinusoidal vibration is zero, because all intersections of a histogram fall into a single bin (Figure 2C, t ≈ 0–3 s). Note, though, that even in this perfectly periodic segment PE[t] was nonzero at a few instances, caused by the gradually changing amplitude of the synthesized signal, which distributed the respective trajectory intersections over more than one histogram bin. For a period 2 subharmonic oscillation (period doubling), PE = 1 bit (t ≈ 3–6.9 s), and for a period 4 subharmonic oscillation (period quadrupling), PE = 2 bits. For the chaotic signal portion in this example, the PE reaches a value of about 4 bits. For sufficiently large window sizes, the maximum entropy $PE_{MAX}$ is reached for a uniform histogram representing highly irregular vibrations as

$$PE_{MAX} = \log_2(\mathrm{numHistBins}) \tag{3}$$

Finite sample effects may, however, lead to an underestimation of the entropy.46

As mentioned above, the PE has a tendency to be inflated during gradual or abrupt changes in the amplitude or in the quality of the analyzed signal. Two such extreme cases occur in Figure 2C at t = 2.94 s and t = 6.9 s, where bifurcations cause abrupt changes in the oscillatory regime. Because the created phase portrait always relies on a time window of a certain duration (20 ms in the case of Figure 2), both the previous and the newly emerging oscillatory regimes can be simultaneously visible in the phase portrait at a bifurcation, and abrupt amplitude changes cause a gradually changing trajectory in phase space. This is illustrated in Figure 2A by the phase portrait extracted at t = 2.94 s. Creating a Poincaré section through such a phase portrait will necessarily result in a more complex Poincaré section histogram with increased entropy, because a larger number of histogram bins have nonzero values.

Phasegram Complexity Estimate (PCE)
As indicated earlier, phasegrams are generated by creating a (one-dimensional) Poincaré section through two-dimensional phase space. The optimal angle of that Poincaré section is determined by a complexity estimate described in Appendix 1 and in the supplementary materials of Herbst et al.33 This complexity measure, hereafter termed PCE, is a dimensionless quantity that is theoretically expected to be in the range of zero to one. As a proof of concept, the PCE for a signal synthesized from the logistic map equation, which has well-known oscillatory properties, is shown in Figure 2D. For a nonperturbed, purely periodic oscillatory regime, the PCE is zero. Subharmonic vibratory regimes result in values of about 0.3 or greater (Appendix 1).

As suggested in Appendix 2, a signal subjected to PCE calculation should ideally contain 20 vibratory cycles or more for the full complexity of the underlying phase space trajectory to emerge. If, on the other hand, the analysis window (and thus the duration of the segment extracted from the analyzed time series, based on which the consecutive phase space embeddings are constructed) is too long, abrupt transitions, bifurcations, or other perturbations of the input signal would be “smeared out” over time, decreasing the temporal accuracy of the analysis (just as in the spectrogram) and potentially inflating the respective PCE readings. As a compromise, the window duration in this analysis was chosen to be 100 ms. Such a window duration would encompass 20 glottal cycles of phonation at the average speaking fundamental frequency of females, that is, 200 Hz.2 It would furthermore result in reasonably reliable PCE values even for potentially pathological female phonations at fundamental frequencies as low as 100 Hz, resulting in 10 cycles per analysis window (Figure 8 suggests that a window duration that encompasses 10 cycles of phonation would still result in a PCE of about 0.6 for a completely irregular oscillatory regime).

In analogy to the PE, the PCE is computed as a time-varying quantity for every point in time t where a phase space was extracted from the analyzed signal. For convenience, the time-varying quantities PE[t] and PCE[t] and the related asymmetry measures (AM) (Eqs. 4 and 5 below) are indicated without the trailing index [t] throughout this manuscript, unless their average over an entire phonation was calculated, in which case they are explicitly indicated as averaged values (eg, “mean PE”), or they are decorated with an overline (eg, $\overline{PE}$).

Asymmetry Measures (AM)
In this study, phasegrams were created from both the GAW and the hGAWL,R time series data of all analyzed phonations. Accordingly, the novel PE and PCE measures were available for the GAW and the hGAWL,R signals. When comparing either the PE or the PCE of hGAWL,R, the difference between these measures constitutes an indicator of the time-varying symmetry of the vibratory complexity of the left and the right vocal folds. The respective AM were defined as

$$AM(PE)[t] = \frac{\operatorname{abs}\left(PE(hGAW_L)[t] - PE(hGAW_R)[t]\right)}{\log_2(\mathrm{numHistBins})}, \tag{4}$$

and

FIGURE 2. Phasegram entropy (PE) and phasegram complexity estimate (PCE) for three stereotypical vibratory regimes (periodic, subharmonic, irregular/chaotic), illustrated by synthetic data derived from the logistic map equation, where the parameter a was varied gradually from 3.425 to 3.625 (see Herbst et al33 for details). (A) Phase portraits, extracted at t = 2 s, 2.94 s, 4 s, and 8.5 s; Poincaré section angle = 0.22π radians (dashed red lines); (B) phasegram of the synthetic sequence generated with the logistic map equation; (C) and (D) time-varying PE [bits] and PCE, respectively, derived from the individual Poincaré section histograms of the phasegram in (B) (see text). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

$$AM(PCE)[t] = \operatorname{abs}\left(PCE(hGAW_L)[t] - PCE(hGAW_R)[t]\right). \tag{5}$$

Both these measures are dimensionless quantities in the range [0..1]: the difference between the PE of hGAWL and the PE of hGAWR is divided by its theoretical maximum (which depends on the number of histogram bins; see Equation 3), and the theoretically expected maximum PCE value is 1 to begin with (Appendix 1), so no normalization is required. Low AM values (of either PE or PCE) are to be expected when both vocal folds exhibit the same periodicity of vibration, that is, either when both vocal folds vibrate nearly periodically (regardless of their individual fundamental frequency of vibration) or when both vocal folds have equally irregular vibratory patterns. Higher AM values are expected when one vocal fold vibrates nearly periodically, whereas the other exhibits an irregular vibratory pattern. Note that in Equations 4 and 5, the modulus (ie, the absolute value) of the difference is computed. In future investigations, nonabsolute values (without the abs[. . .] function) might be calculated, resulting in a parameter in the range [−1..1] that could potentially be useful for assessing which vocal fold (left or right) exhibits more complex vibratory behavior relative to the other.

Statistical analysis
To test the performance of the newly introduced measures PE, PCE, AM(PE), and AM(PCE), these measures were computed for the phonations of all 73 female participants described in a previous section. As these phonations were stationary, the respective measures were averaged over each investigated time window, thus resulting in the mean quantities $\overline{PE}$, $\overline{PCE}$, $\overline{AM(PE)}$, and $\overline{AM(PCE)}$. The discriminative power of these measures per diagnosis group (normal, functional dysphonia, paralysis) was assessed with inferential statistics. Before testing, normality and homogeneity of variance were assessed with

Shapiro-Wilk and Levene’s tests, respectively. The PE measures were subjected to an ANOVA and post hoc pairwise comparisons with Bonferroni-Holm-adjusted t tests.47 The PCE measures did not meet the normality criterion in the “healthy” group, and the asymmetry measures $\overline{AM(PE)}$ and $\overline{AM(PCE)}$ were not normally distributed in the “healthy” group and in the “paralysis” group. Consequently, these three measures were subjected to nonparametric analysis consisting of Kruskal-Wallis tests, followed by post hoc Mann-Whitney U tests with Bonferroni-Holm correction. All statistical processing was performed using the free software R.48

RESULTS

Qualitative analysis: typical examples
Three stereotypical vocal fold vibratory regimes (periodic, subharmonic, and irregular), generated through attempts at stable phonation by a normophonic female and two females with vocal fold paralysis, respectively, are illustrated in Figure 3.

Periodic vocal fold vibration
The periodic case (Figure 3, left panels) was characterized by both a regular acoustic waveform and a nearly periodic GAW, having a period of about 4.1 ms and thus a fundamental frequency of about 243.9 Hz. The GVG suggests a steady and repeatable vibratory pattern with a posterior-anterior (P-A) phase delay in the opening phase and an A-P phase delay in the closing phase. Because of the regular nature of both the acoustic waveform and the GAW, the power spectra derived from these signals consisted of a harmonic series with harmonics at integer multiples of the fundamental frequency. Two-dimensional phase space embedding of the GAW resulted in a typical limit cycle. Consequently, the phasegram derived from the GAW consisted of two stable lines, corroborating the periodic nature of the vocal fold vibration.

Subharmonic vocal fold vibration
In the subharmonic case (Figure 3, middle panels), both the acoustic waveform and the GAW had alternating cycle amplitudes, resulting in a period of about 10.4 ms and a cycle duration of about 5.2 ms (because each period consists of two glottal cycles). The subharmonic nature of this case was also borne out by the GVG, which demonstrated alternating vibratory patterns of vocal fold vibration. Every even cycle had a more pronounced A-P phase delay in the opening phase than the odd cycles, as well as a slight anterior opening that was maintained until the very end of the open phase. As in the periodic case, the power spectrum of both

Irregular vocal fold vibration
In the irregular case (Figure 3, right panels), neither the acoustic waveform nor the GAW exhibited a periodic structure. The same was true for the GVG, where no apparent regularity in the vibratory pattern of the vocal folds could be discerned. Consequently, the power spectra of both the GAW and the acoustic waveform did not consist of a harmonic series, and the GAW trajectory in two-dimensional phase space had no apparent repetitive pattern. The phasegram did not exhibit stable horizontal lines, thus also suggesting an irregular vibratory pattern.

Glissando
In contrast to the examples presented in Figure 3, which were derived from the respective participants’ attempts to produce stationary phonation, a nonstationary case stemming from a glissando phonation by a 50-year-old female diagnosed with left vocal fold paralysis is illustrated in Figure 4. A perfunctory analysis of the spectrogram (Figure 4A) would create the impression of a harmonic series with nonuniform amplitudes of the individual harmonics in the initial two thirds of the phonation, suggesting a subharmonic pattern. Closer inspection reveals, however, that the individual partials from t = 0 ms to about t = 400 ms are not all integer multiples of a common fundamental frequency, but combinations of two independent frequencies, as seen in biphonation.49,50 Between about t = 400 and t = 600 ms, the spectrogram suggests a more regular distribution of the individual energy components. At around t = 600 ms, a true harmonic series emerged. The PVG (Figure 4B) and the hGAWs (Figure 4C and D) selected at t = 200 ms, 480 ms, and 660 ms, respectively, suggest three different vibratory states. Around t = 200 ms, the left vocal fold had a heavily perturbed vibratory pattern, and the right vocal fold vibrated with an amplitude-modulated pattern. Around t = 480 ms, both vocal folds had perturbed vibratory patterns resembling nearly periodic oscillation (the vibration of the right vocal fold was amplitude modulated), and their respective frequencies (about 207 and 277 Hz) were most likely locked at a ratio of 3:4. The data for the sequence at t = 660 ms suggest that both vocal folds exhibited slightly perturbed and/or amplitude-modulated nearly periodic vibration at the same frequency of about 235 Hz (1:1 frequency ratio). All these vibratory patterns are illustrated by the hGAW phasegrams (Figure 4E and F). The heavily perturbed sequence around t = 200 ms was characterized by the absence of uniform lines in the phasegram; the 3:4 pattern at t = 480 ms resulted in a more concentrated cluster of vertical lines, at least in the hGAWL; and the 1:1 synchro-
the GAW and the acoustic waveform exhibited a harmonic series, nized sequence at t = 660 ms produced two almost-stable
but with a considerably lower fundamental at about 95.9 Hz, horizontal lines, as was expected for the nearly periodic case
caused by the period doubling. The second harmonic was con- (Methods and Figure 1).
siderably stronger than the others, constituted by the vibratory The vibratory analysis suggests that the apparent harmonic
period of each cycle. Phase space embedding of the GAW re- series observed in the GAW spectrogram (Figure 4A) in the range
sulted in a typical subharmonic pattern, where the trajectory had of t = 0–560 ms was most likely the result of biphonation,51,52
to undergo two revolutions before repeating itself. The phasegram that is, of the left-right asymmetry of the vocal fold vibration.
derived from the GAW shows four almost-stable lines (reveal- A (single) fundamental frequency could only be determined in
ing the subharmonic pattern of the vibration). The small the range of t = 560–820 ms, with values in the range of about
fluctuations of the phasegram lines were caused by slight overall 220–250 Hz. The PE (Figure 4G) of both the hGAWL and the
amplitude drifts of the GAW. hGAWR decreased slightly when periodic vibration with a 1:1
ARTICLE IN PRESS
8 Journal of Voice, Vol. ■■, No. ■■, 2015

FIGURE 3. Glottovibrogram (GVG) and phasegram analysis of three stereotypical oscillatory regimes from in vivo phonation: periodic (healthy
female, left panels); subharmonic (female, vocal fold paralysis, middle panels); and irregular (female, vocal fold paralysis, right panels) – see
supplementary Movies M2, M3 and M4. (A) Acoustic signal. Because the mouth-to-microphone distance was not measured during the recordings,
the acoustic waveform was not time shifted in relation to the other time series signals to compensate for its delay; (B) glottal area waveform (GAW);
(C) GVG; (D) left: narrowband power spectrum of GAW (black) and acoustic waveform (gray), window duration = 50 ms; right: two-dimensional
phase portrait with Poincaré section; (E) phasegram of GAW.

left-right frequency ratio ensued, but it was less sensitive to the with the left vocal fold. In the AM(PE), these trends were also
change from heavily perturbed vibration to the sequence with found, but less pronounced.
3:4 frequency locking. These trends were also borne by the PCE
of the hGAWR (Figure 4H). The PCE of the hGAWL, on the other Statistical analysis
hand, showed a clear decrease from a maximum of about 0.6 The distributions of the PE and the PCE per diagnosis group are
at the beginning of the phonation, to a minimum of about 0.2 shown in Figure 5. The average (±standard deviation) GAW-
reached after 500 ms. Consequently, the AM(PCE) (Figure 4I) based phasegram entropy PE was 2.85 (±0.46), 3.02 (±0.52),
suggested differences between the regularity of the vibratory and 3.23 (±0.41) for the three diagnosis groups “healthy”, “func-
regimes of the left vs. the right vocal fold, starting around tional dysphonia”, and “paralysis”, respectively. The mean GAW-
t = 280 ms, with a more perturbed vibration of the right vocal based phasegram complexity estimate PCE for these groups was
fold (which in overall had the higher PCE values) as compared 0.21 (±0.11), 0.24 (±0.10), and 0.32 (±0.15), respectively.
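The post hoc comparisons in this section rely on the Bonferroni-Holm correction (Methods; see also Reference 47). The minimal Python sketch below illustrates that general step-down rule. It is not the authors' R code; the function name holm_adjust and the p values in the example are hypothetical. In R, p.adjust(p, method = "holm") applies the same adjustment.

```python
def holm_adjust(p_values):
    """Return Holm (step-down Bonferroni) adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the raw p values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - k + 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Keep the adjusted values monotone: an adjusted p value may not be
        # smaller than that of a smaller raw p value
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p values for three pairwise group comparisons
    raw = [0.01, 0.04, 0.03]
    print(holm_adjust(raw))
```

The running maximum is what distinguishes Holm's procedure from multiplying every p value by m. With it, the adjusted values preserve the ordering of the raw p values.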

FIGURE 4. Phasegram and PVG analysis of a nonstationary (glissando) phonation produced by a female with left vocal fold paralysis. (A) Spectrogram of the GAW, window duration 128 ms, 40-dB dynamic range; (B) PVG of three extracted 50-ms segments starting at 200 ms, 480 ms, and 660 ms, respectively; (C) and (D) left and right hemi-GAW of three 50-ms segments extracted at 220 ms, 390 ms, and 750 ms, respectively; (E) and (F) phasegrams of the left and right hemi-GAW; (G) time-varying phasegram entropy (PE, see Methods) for the left and right hemi-GAW; (H) time-varying phasegram complexity estimate (PCE, see Methods) for the left and right hemi-GAW; (I) PCE and PE asymmetry measures.
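Two kinds of values in the qualitative results come from simple arithmetic: the period-to-frequency conversions (Figure 3) and the frequency ratios read from the glissando (Figure 4). The Python sketch below reproduces them from the rounded values quoted in the text. The helper names are hypothetical, and limiting the ratio denominator to 5 is an illustrative choice. Because the 10.4-ms period is itself rounded, it yields about 96.2 Hz rather than the reported 95.9 Hz.

```python
from fractions import Fraction


def f0_hz(period_ms: float) -> float:
    """Fundamental frequency in Hz from a period given in milliseconds."""
    return 1000.0 / period_ms


def locking_ratio(f_a: float, f_b: float, max_den: int = 5) -> Fraction:
    """Closest simple ratio f_a:f_b whose denominator is at most max_den."""
    return (Fraction(f_a) / Fraction(f_b)).limit_denominator(max_den)


if __name__ == "__main__":
    print(round(f0_hz(4.1), 1))     # periodic case
    print(round(f0_hz(10.4), 1))    # subharmonic case: full period of two cycles
    print(round(f0_hz(5.2), 1))     # one glottal cycle within that period
    print(locking_ratio(207, 277))  # glissando, t = 480 ms
    print(locking_ratio(235, 235))  # glissando, t = 660 ms
```

Fraction.limit_denominator returns the closest fraction whose denominator does not exceed the given bound. This is why 207/277 (about 0.747) maps to 3/4.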

FIGURE 5. Distributions of PE and PCE (averaged per phonation) per diagnosis group, shown as box plots of the interquartile range. The whiskers indicate the 5th and 95th percentiles, and the superimposed stars indicate the group means.

One-way analysis of variance revealed significant effects of the diagnosis on the mean phasegram entropy PE, F(2, 70) = 4.03, P = 0.02. Pairwise post hoc t tests revealed significant differences in mean PE between the healthy group and the paralysis group (Bonferroni-Holm adjusted P = 0.02). A Kruskal-Wallis test of PCE per diagnosis group revealed significant effects (H(2) = 7.60, P = 0.02). Bonferroni-adjusted post hoc Mann-Whitney U tests revealed a significant difference between the “healthy” group and the “paralysis” group (P = 0.02).

The box plots of the AM per diagnosis group (Figure 6) reveal a general trend for both AM(PE) and AM(PCE) to be higher in the pathological groups than in the healthy participants. These trends are strongest in the “paralysis” group. The mean AM(PE) values were 0.028 (±0.014), 0.031 (±0.016), and 0.040 (±0.029) for the three diagnosis groups “healthy”, “functional dysphonia”, and “paralysis”. The mean AM(PCE) values for these groups were 0.071 (±0.042), 0.096 (±0.056), and 0.139 (±0.103). The diagnosis had no significant effect on the AM(PE) (H(2) = 1.53, P = 0.47). For the AM(PCE), on the other hand, the diagnosis had a significant effect (H(2) = 7.18, P = 0.03). In particular, a post hoc test revealed significant differences between the “healthy” group and the “paralysis” group (Bonferroni-Holm adjusted P = 0.03).

FIGURE 6. Distributions of the asymmetry measures AM(PE) and AM(PCE) (averaged per phonation) per diagnosis group, shown as box plots of the interquartile range. The whiskers indicate the 5th and 95th percentiles, and the superimposed stars indicate the group means.

DISCUSSION
In a recent publication, the phasegram was introduced as an intuitive visualization tool for various oscillatory phenomena in physics and in biology, demonstrated with analysis of the human voice.33 Here, a more specialized investigation was performed. It shows that phasegrams are useful for analyzing signals derived from HSV recordings of healthy and pathological phonations. The feasibility of the approach was demonstrated by creating a phasegram of the GAW from a synthesized vocal fold vibration movie based on the output of a simple two-mass model.42 The model produced three stereotypic vibratory patterns (periodic, subharmonic, and irregular; Supplementary Movie M1), which can be clearly identified in the respective phasegram (recall Figure 1G).

Qualitative analysis
Figures 3 and 4 then documented the applicability of phasegram visualization to actual in vivo data derived from HSV recordings of vocal fold vibration. Figure 3 is composed of data from three attempts to produce stable phonation (ie, a stationary voice output). It clearly shows the causal relation between the vocal fold vibratory regime (periodic, subharmonic, or irregular) and the generated acoustic output.

The glissando case illustrated in Figure 4 documents the phasegram's potential to also visualize nonstationary signals. Assessed by a spectrogram alone, the pseudo-harmonic structure of the GAW signal (Figure 4A, t = 0–560 ms) could have been mistaken for a periodic vibratory regime with an uncharacteristically low fundamental frequency. In contrast, the system dynamics approach of the phasegram clearly earmarks the respective sequence as complex. This is an insight that might be overlooked when considering the spectrogram only. By and large, the analyses performed in Figures 3 and 4 thus suggest two things: the phasegram is a useful complement to existing analysis methods, and it provides a reliable and intuitive basis for periodicity assessment of vocal fold vibration.

Statistical analysis
In this study, the phasegram approach was augmented by two quantitative analysis parameters directly derived from phasegram visualizations: the PCE and the PE. The sensitivity of these parameters to changes in vibratory regime was demonstrated with synthesized GAW data (Figure 2) and in vivo GAW data (Figure 4G and H). Statistical analysis of a corpus of HSV data from 73 adult females demonstrates that both the PE and the PCE can indicate trends of deviation from the periodic vocal fold vibration paradigm, as documented by HSV. Mean PE and PCE values were significantly greater for the participants with paralysis than for normophonic (healthy) participants. On the other hand, neither the PE nor the PCE could significantly discriminate the “functional dysphonia” group from either the healthy participants or the participants with unilateral vocal fold paralysis. A potential explanation is the great variety of vocal fold vibratory patterns within the “functional dysphonia” group, ranging from nearly periodic to heavily disturbed. Overall, however, these results suggest that quantitative phasegram measures are promising new tools for assessing the quality of vocal fold vibration in a clinical context. As these measures can be calculated with unsupervised algorithms, they may be useful for the automatic analysis of HSV data. For example, they could aid in the detection of clinically interesting HSV sequences.

Phasegram generation relies on two-dimensional phase space embedding of the analyzed signal. As pointed out previously,33 phasegrams are therefore unable to distinguish between low-dimensional (deterministic) chaos and high-dimensional (stochastic) noise. An excessively low signal-to-noise ratio in the analyzed time series might thus artificially inflate the PCE and the PE measures. For the “healthy” group, both measures (average PCE = 0.21 and average PE = 2.85) were overall higher than the theoretically expected values (ie, zero for purely periodic vibration; compare Figure 5 with Figure 2B and C). This could in part be attributed to such noise effects. On the other hand, vocal fold vibration is never purely periodic but rather an inherently perturbed phenomenon.1 Slightly elevated mean PCE and PE values are therefore to be expected for the normophonic participants.

Figure 2 suggests that PCE and PE are highly correlated, as further illustrated in Supplementary Figure S1. However, the PE measure seems to be more sensitive to both quantization noise and gradual and abrupt changes of signal amplitude and vibratory regime (as indicated in the Methods). This sensitivity potentially results in a somewhat inflated measure in these cases. This notion is corroborated by the statistical results for the two asymmetry measures. The PE-based asymmetry measure AM(PE) did not significantly discriminate the diagnosis groups, whereas the AM(PCE) revealed significant differences between the “healthy” group and the “paralysis” group. In this light, the PCE and its related asymmetry measure, AM(PCE), can be considered the more robust of the two measures and might therefore be preferred over the PE in future studies.

Based on the periodicity assessment provided by the quantitative measures PE and PCE, asymmetry measures automatically derived from hemi-GAW phasegrams were introduced. To the best of our knowledge, this is the first application of hGAW analysis. PE and PCE provide information about the regularity of vibration. Comparing the results of either measure for the left versus the right vocal fold therefore yields an asymmetry coefficient for the regularity of vocal fold vibration. In a normal healthy voice, both vocal folds are expected to vibrate regularly at identical fundamental frequencies. In more severe voice disorders, one vocal fold (or both) may vibrate irregularly. AM(PE) and AM(PCE) were designed to be sensitive to differences in regularity between the vibratory regimes of the left and the right vocal fold. They assume higher values when one vocal fold vibrates nearly periodically while the other exhibits a more perturbed or irregular vibratory pattern, as is not uncommon in vocal fold paralysis.5,53 It is thus not surprising that the mean AM(PCE) detected significant differences between the “healthy” group and the “paralysis” group. This suggests that the AM(PCE) is a reliable indicator of asymmetries in vibratory regularity. In contrast to the AM(PCE), the AM(PE) did not reach significance. This is attributed to the fact that PE values of the analyzed hGAWs were overall greater than those of the GAWs extracted from the same HSV recordings, caused by increased quantization noise due to limited spatial resolution. Nevertheless, the AM(PCE) based on hGAWs demonstrated discriminative power. It is therefore a promising new indicator of vibratory asymmetries that may prove useful in future research and in clinical practice.

CONCLUSION
In this work, the phasegram visualization method has been extended to the analysis of GAW data derived from HSV recordings of both normophonic and pathological voice production. Qualitative analysis showed that the phasegram is a valuable complement to existing analysis methods, as it provides direct insights into the time-dependent complexity of vocal fold vibration. The phasegram can condense information about the vocal fold dynamics of an entire phonation into a single graph. It is therefore a time-effective alternative to studying raw video data when looking for abnormal vibratory sequences in a clinical or research setting. Two newly introduced quantitative analysis parameters, the PCE and the PE, were found to significantly distinguish between a group of healthy

females and a group of females with vocal fold paralysis. Of these parameters, the PCE can be considered the more robust one. Applying the PCE to the hGAWs of the left and the right vocal folds yielded a measure of the (a)symmetry of vocal fold vibration regularity, the AM(PCE). This measure was also able to discriminate between the “healthy” group and the “paralysis” group of participants. In conclusion, these findings indicate that the extended phasegram approach is a promising new tool for the automated quantitative analysis and classification of voice production biosignals, with potential applications in research and in clinical practice.

Acknowledgments
This research was supported by:
- the institutional fund of Palacký University Olomouc, Czech Republic (to C.T.H.);
- the Technology Agency of the Czech Republic, project no. TA04010877 (to C.T.H. and J.G.Š.);
- the state budget of the Czech Republic, OPVK CZ.1.07/2.3.00/20.0057 (to J.G.Š.);
- grant no. LO1413/2-2 from Deutsche Forschungsgemeinschaft (to J.U. and J.L.).

APPENDIX 1
The algorithm for computing the PCE is inspired by the correlation dimension approach16,54 (see also the supplementary materials of Herbst et al33). During phasegram generation, Poincaré sections are taken through two-dimensional phase space. Each section yields a set of n intersection points between the phase space trajectory and the Poincaré section. Each point $x_i$ is characterized by its offset from the phase space origin. For each point $x_i$ within each such set, the number of neighboring points N(ε, i) within a given radius ε is calculated as

$$N(\varepsilon, i) = \sum_{j=1,\, j \neq i}^{n} H\left(\varepsilon - \lvert x_i - x_j \rvert\right) \tag{A1}$$

where H is the Heaviside step function, defined as H(χ) = 1 if χ ≥ 0, and H(χ) = 0 if χ < 0. In analogy to correlation dimension computation, the average number of pairs $x_i$, $x_j$ with a distance $\lvert x_i - x_j \rvert < \varepsilon$ is found for each intersection point set. To reduce computation time, each point pair is compared only once, and the averaged count C(ε) is expressed as

$$C(\varepsilon) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} H\left(\varepsilon - \lvert x_i - x_j \rvert\right) \tag{A2}$$

Because of the nature of the analyzed GAW signal, the temporal delay between individual data points within the intersection point set equals the duration of the respective glottal cycle. Therefore, Theiler's correction54 was not deemed necessary.

During phasegram generation, the distance ε is typically varied in 40 equally spaced steps, from 1/100 to 1/2 of the maximum value of the intersection point set. As ε increases, the correlation integral C(ε) is expected to grow according to a power law

$$C[t](\varepsilon) \propto \varepsilon^{d[t]} \tag{A3}$$

where d[t] is an indicator of the intersection point set's dimension, which is expected to lie in the range [0, 1]. The dimension d[t] is determined as the slope of a log-log plot of C(ε) versus ε. This calculation is performed for every point in time t at which a phase space was extracted from the analyzed signal, and the resulting quantity is denoted PCE[t].

The performance of the PCE was assessed with a numerical test involving the Cantor ternary set (hereafter, Cantor set; see, eg, Reference 55, p. 93). The Cantor set is created by taking a line (more precisely, the interval [0, 1]) and removing its middle third. The middle third of each remaining segment is then removed, and so on ad infinitum. The Cantor set can be considered a special case of the phasegram histogram, with an infinite number of histogram bins. The dimension of the Cantor set is ln(2)/ln(3) ≈ 0.63.13 Because the algorithm for computing the PCE is inspired by the correlation dimension approach, the PCE of the Cantor set should equal its known dimension.

To test this assumption, a numerical Cantor set was created. On a computer, the removal of the middle thirds cannot be continued infinitely but must be stopped after a certain number of iterations; for the purpose at hand, eight iterations were found to be sufficient. After each iteration, up to 200 data points were randomly distributed over the value range defined by the Cantor set, and the PCE was calculated. These points represent intersection points between the phase space trajectory and the Poincaré sections made during phasegram generation. As expected, the PCE converged toward ≈0.63 with an increasing number of randomly distributed data points (Figure 7). This establishes that the PCE is a good proxy for the complexity of the underlying phase space trajectory in phasegram generation.

FIGURE 7. Phasegram complexity estimate (PCE) for vectors consisting of n data points (n = [10..200]) randomly distributed over the intervals of Cantor ternary sets generated by m iterations (m = [1..8]). PCE converges toward ln(2)/ln(3) ≈ 0.63 with increasing n.

APPENDIX 2
The dependence on the number of data points in the test scenario in Figure 7 corroborates observations from early phasegram analysis trial runs. These runs suggested that the PCE depends on both the number of sample points of the embedded signal (Figure 1C) and the (residual) fundamental frequency of the analyzed nearly periodic and irregular signals. This dependence was comprehensively quantified by analyzing synthesized signals with known vibratory regimes, using a range of analysis parameters.

The signals for the test cases were generated with the logistic map equation

$$x[i+1] = a\,x[i]\,\bigl(1 - x[i]\bigr) \tag{A4}$$

Three types of signals were generated, each with a constant parameter a:
- a = 3.2, resulting in a periodic signal;
- a = 3.5, resulting in a subharmonic (period-doubling) sequence;
- a = 3.8, producing an irregular signal that closely resembles deterministic chaos, because it contains some residual periodic energy (see Reference 33, figure 1c, left panel, for an example). This phenomenon is found even in the most disturbed voice signals.14

These three signal stereotypes were instantiated at five sampling frequencies (200, 400, 800, 1600, and 3200 Hz). The resulting (residual) fundamental frequencies were half the sampling frequency, simulating (residual) vocal fundamental frequencies in the range of 100–1600 Hz. The resulting signals were upsampled to 22,050 Hz using the Praat sinc interpolation algorithm56 and then stored as 16-bit WAV files.
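The correlation-sum estimate of Appendix 1 (Equations A1–A3) and its Cantor-set check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation:
- The intersection points are treated as a one-dimensional set of offsets.
- ε takes 40 linearly spaced values from 1/100 to 1/2 of the largest point value, which is our reading of Appendix 1.
- d is taken as the least-squares slope of ln C(ε) against ln ε.
- The helper names correlation_sum_slope and cantor_points are hypothetical.

```python
import bisect
import math
import random


def correlation_sum_slope(points, n_eps=40):
    """Least-squares slope of ln C(eps) versus ln eps (Equations A2 and A3).

    eps runs in n_eps equally spaced steps from 1/100 to 1/2 of the largest
    point value (an interpretation of the range given in Appendix 1).
    """
    n = len(points)
    # All n(n-1)/2 pairwise distances; each unordered pair is counted once
    dists = sorted(abs(points[i] - points[j])
                   for i in range(n) for j in range(i + 1, n))
    x_max = max(points)
    lo, hi = x_max / 100.0, x_max / 2.0
    log_eps, log_c = [], []
    for k in range(n_eps):
        eps = lo + k * (hi - lo) / (n_eps - 1)
        # Heaviside H(eps - |x_i - x_j|) = 1 for all distances <= eps
        count = bisect.bisect_right(dists, eps)
        c = 2.0 * count / (n * (n - 1))
        if c > 0:
            log_eps.append(math.log(eps))
            log_c.append(math.log(c))
    mx = sum(log_eps) / len(log_eps)
    my = sum(log_c) / len(log_c)
    sxy = sum((u - mx) * (v - my) for u, v in zip(log_eps, log_c))
    sxx = sum((u - mx) ** 2 for u in log_eps)
    return sxy / sxx


def cantor_points(n, m, rng):
    """n points drawn uniformly from the 2**m intervals left after m Cantor iterations."""
    pts = []
    for _ in range(n):
        # One ternary digit (0 or 2) per level selects one surviving interval
        left = sum(rng.choice((0, 2)) * 3.0 ** -(k + 1) for k in range(m))
        # Uniform position inside that interval of width 3**-m
        pts.append(left + rng.random() * 3.0 ** -m)
    return pts


if __name__ == "__main__":
    rng = random.Random(1)
    pts = cantor_points(200, 8, rng)
    print(round(correlation_sum_slope(pts), 2))
    print(round(math.log(2) / math.log(3), 2))  # 0.63
```

The estimate should lie near ln(2)/ln(3), in line with Figure 7. Small deviations are expected, because the correlation sum of the Cantor set oscillates slightly around a pure power law, and because only a finite random sample of points is used.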
The generated WAV files were subjected to phasegram analysis using various phasegram window durations (0.005, 0.01, 0.02, 0.04, 0.05, 0.06, 0.08, 0.1, 0.12, 0.14, and 0.16 s) and numbers of histogram bins (25, 50, 100, 200, and 400). To assess the effect of signal amplitude, each individual analysis was performed three times, scaling the amplitude of the analyzed signal by a factor of 1 (unchanged), 0.5, and 0.25, respectively.
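The test signals of Equation A4 and the size of the analysis grid just described can be checked with a short Python sketch. It does three things:
- It iterates the logistic map for the three values of a and reports the smallest repeat length in samples after a transient.
- It enumerates the grid of signal types, sampling frequencies, window durations, histogram bins, and amplitude factors.
- It counts the (residual) cycles per analysis window, assuming f0 = fs/2.

The start value, transient length, and tolerance are illustrative assumptions. The helper names are hypothetical. The Praat upsampling and the phasegram analysis itself are omitted.

```python
from itertools import product


def logistic_series(a, n, x0=0.2, transient=1000):
    """Iterate x[i+1] = a * x[i] * (1 - x[i]) and return n samples after a transient."""
    x = x0
    for _ in range(transient):
        x = a * x * (1.0 - x)
    out = []
    for _ in range(n):
        out.append(x)
        x = a * x * (1.0 - x)
    return out


def repeat_length(series, max_period=64, tol=1e-9):
    """Smallest p with series[i + p] == series[i] (within tol) for all i, else None."""
    for p in range(1, max_period + 1):
        if all(abs(series[i + p] - series[i]) < tol for i in range(len(series) - p)):
            return p
    return None


def cycles_per_window(f0, window_s):
    """Number of (residual) cycles that fit into one analysis window."""
    return f0 * window_s


a_values = {"periodic": 3.2, "period doubling": 3.5, "irregular": 3.8}
fs_values = [200, 400, 800, 1600, 3200]  # Hz
windows = [0.005, 0.01, 0.02, 0.04, 0.05, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16]  # s
bins = [25, 50, 100, 200, 400]
amplitudes = [1.0, 0.5, 0.25]

if __name__ == "__main__":
    for name, a in a_values.items():
        # Repeat length 2 means f0 = fs / 2; None means no period up to 64 samples
        print(name, repeat_length(logistic_series(a, 256)))
    grid = list(product(a_values, fs_values, windows, bins, amplitudes))
    print(len(grid))  # 3 * 5 * 11 * 5 * 3 = 2475
    # Extremes of the simulated range, assuming f0 = fs / 2
    print(cycles_per_window(200 / 2, max(windows)))
    print(cycles_per_window(3200 / 2, min(windows)))
```

A repeat length of 2 samples for a = 3.2 corresponds to a fundamental at half the sampling frequency, as stated above. The length of 4 for a = 3.5 reflects the period doubling. The same arithmetic shows that a 100-Hz signal needs a window of about 0.2 s to span the 20 cycles discussed for Figure 8. That is longer than the largest window duration in the list.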
In this manner, a total of 2475 PCEs were calculated. Analysis of these data suggested that the complexity estimate depended mainly on the simulated (residual) fundamental frequency and the phasegram window duration. Combining these two parameters gives the number of (residual) cycles per analysis window. Figure 8 shows the dependence of the complexity parameter on this number for the three signal stereotypes (periodic, period doubling, and chaos). The results suggest that in the irregular case, the phasegram window must be long enough to comprise 20 cycles of the analyzed signal. Shorter windows underestimate the respective Poincaré section complexity. The three data sets (periodic, subharmonic, and irregular) converge to about 0, 0.3, and 0.65, respectively. This shows that the reliability of the PCE is mainly affected by the number of cycles per analysis window. Results from additional tests suggest that other factors, such as the number of histogram bins or the amplitude of the signal, do not substantially influence the results.

FIGURE 8. Phasegram complexity estimate as a function of the number of (residual) cycles per analysis window for all 2475 simulated cases (x-axis limited to 70 cycles per window). In the case of deterministic chaos (red rectangles), the phasegram analysis window must comprise about 20 cycles of the analyzed signal to guarantee that the complexity estimate converges. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

SUPPLEMENTARY DATA
Supplementary data related to this article can be found online at doi:10.1016/j.jvoice.2015.11.006.

REFERENCES
1. Titze IR. Workshop on Acoustic Voice Analysis: Summary Statement. National Center for Voice and Speech; 1995.
2. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. 2nd ed. San Diego, CA: Singular Publishing, Thomson Learning; 2000.
3. Bohr C, Krack A, Dubrovskiy D, et al. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear Res. 2014;57:1148–1161.
4. Mehta DD, Deliyski DD, Zeitels SM, et al. Voice production mechanisms following phonosurgical treatment of early glottic cancer. Ann Otol Rhinol Laryngol. 2010;119:1–9.
5. Svec JG, Sram F, Schutte HK. Videokymography in voice disorders: what to look for? Ann Otol Rhinol Laryngol. 2007;116:172–180.
6. Berry DA, Herzel H, Titze IR, et al. Interpretation of biomechanical simulations of normal and chaotic vocal fold oscillations with empirical eigenfunctions. J Acoust Soc Am. 1994;95:3595–3604.
7. Hollien H, Michel J, Doherty ET. A method for analyzing vocal jitter in sustained phonation. J Phon. 1973;1:85–91.
8. Yumoto E, Gould WJ, Baer T. Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am. 1982;71:1544–1549.
9. Lauterborn W, Parlitz U. Methods of chaos physics and their application to acoustics. J Acoust Soc Am. 1988;84:1975–1993.
10. Herzel H. Bifurcations and chaos in voice signals. Appl Mech Rev. 1993;46:399–413.
11. Titze IR, Baken RJ, Herzel H. Evidence of chaos in vocal fold vibration. In: Titze IR, ed. Vocal Fold Physiology: Frontiers in Basic Science. San Diego, CA: Singular Publishing Group; 1993:143–188.
12. Jiang J, Zhang Y, McGilligan C. Chaos in voice, from modeling to measurement. J Voice. 2006;20:2–17.
13. Strogatz SH. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. First Indian Edition. Kolkata, India: Levant Books; 2007.
14. Fitch WT, Neubauer J, Herzel H. Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim Behav. 2002;63:407–418.
15. Lauterborn W, Cramer E. Subharmonic routes to chaos observed in acoustics. Phys Rev Lett. 1981;47:1445–1448.
16. Grassberger P, Procaccia I. Measuring the strangeness of strange attractors. Physica D. 1983;9:189–208.
17. Eckmann J, Kamphorst SO, Ruelle D, et al. Liapunov exponents from time series. Phys Rev A. 1986;34:4971–4979.
18. Tokuda I, Riede T, Neubauer J, et al. Nonlinear analysis of irregular animal vocalizations. J Acoust Soc Am. 2002;111:2908–2919.
19. Tokuda I, Miyano T, Aihara K. Surrogate analysis for detecting nonlinear dynamics in normal vowels. J Acoust Soc Am. 2001;110:3207–3217.
20. Herzel H, Holzfuss J, Kowalik Z, et al. Detecting bifurcations in voice signals. In: Kantz H, Kurths J, Mayer-Kress G, eds. Nonlinear Analysis of Physiological Data. Berlin: Springer Verlag; 1998:325–344.
21. Zhang Y, Jiang J. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice. 2008;22:1–9.
22. Zhang Y, Jiang J, Biazzo L, et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice. 2005;19:519–528.
23. Baken RJ. Irregularity of vocal period and amplitude: a first approach to the fractal analysis of voice. J Voice. 1990;4:185–197.
24. Behrman A, Baken R. Correlation dimension of electroglottographic data from healthy and pathologic subjects. J Acoust Soc Am. 1997;102:2371–2379.
25. Behrman A. Global and local dimensions of vocal dynamics. J Acoust Soc Am. 1999;105:432–443.
26. Mergell P, Herzel H, Titze IR. Irregular vocal-fold vibration: high-speed observation and modeling. J Acoust Soc Am. 2000;108:2996–3002.
27. Zhang Y, Krausert CR, Kelly MP, et al. Typing vocal fold vibratory patterns in excised larynx experiments via digital kymography. Ann Otol Rhinol Laryngol. 2009;118:598–605.
28. Zhang Y, Jiang JJ. Asymmetric spatiotemporal chaos induced by a polypoid mass in the excised larynx. Chaos. 2008;18.
29. Zhang Y, Jiang JJ, Tao C, et al. Quantifying the complexity of excised larynx vibrations from high-speed imaging using spatiotemporal and nonlinear dynamic analyses. Chaos. 2007;17.
30. Herzel H, Berry D, Titze I, et al. Nonlinear dynamics of the voice: signal analysis and biomechanical modeling. Chaos. 1995;5:30–34.
31. Svec JG, Schutte HK, Miller DG. On pitch jumps between chest and falsetto registers in voice: data from living and excised human larynges. J Acoust Soc Am. 1999;106(3 Pt 1):1523–1531.
32. Tokuda IT, Horacek J, Svec JG, et al. Bifurcations and chaos in register transitions of excised larynx experiments. Chaos. 2008;18.
33. Herbst CT, Herzel H, Svec JG, et al. Visualization of system dynamics using phasegrams. J R Soc Interface. 2013;10:1–14.
34. Packard NH, Crutchfield JP, Farmer JD, et al. Geometry from a time series. Phys Rev Lett. 1980;45:712–716.
35. Roux J-C, Simonyi RH, Swinney HL. Observation of a strange attractor. Physica D. 1983;8:257–266.
36. Herbst CT. Glottal efficiency of periodic and irregular in vitro red deer voice production. Acta Acoust United Acoust. 2014;100:724–733.
37. Lohscheller J, Eysholdt U. Phonovibrogram visualization of entire vocal fold dynamics. Laryngoscope. 2008;118:753–758.
38. Lohscheller J, Eysholdt U, Toy H, et al. Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE Trans Med Imaging. 2008;27:300–309.
39. Lohscheller J, Toy H, Rosanowski F, et al. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal. 2007;11:400–413.
40. Unger J, Lohscheller J, Reiter M, et al. A noninvasive procedure for early-stage discrimination of malignant and precancerous vocal fold lesions based on laryngeal dynamics analysis. Cancer Res. 2015;75:31–39.
41. Karakozoglou S-Z, Henrich N, d'Alessandro C, et al. Automatic glottal segmentation using local-based active contours and application to glottovibrography. Speech Commun. 2012;54:641–654.
42. Steinecke I, Herzel H. Bifurcations in an asymmetric vocal fold model. J Acoust Soc Am. 1995;97:1874–1884.
43. Smith JO. Mathematics of the Discrete Fourier Transform (DFT), with Audio Applications. 2nd ed. Online book; 2007. http://ccrma.stanford.edu/~jos/mdft/. Accessed 2015-12-30.
44. Bergé P, Pomeau Y, Vidal C. Order Within Chaos: Towards a Deterministic Approach to Turbulence. Paris: Hermann and John Wiley & Sons; 1984.
45. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423, 623–656.
46. Herzel H, Große I. Correlations in DNA sequences: the role of protein coding segments. Phys Rev E. 1997;55:800–810.
47. Abdi H. Holm's sequential Bonferroni procedure. In: Salkind NJ, ed. Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications, Inc.; 2010:574–578.
48. R Development Core Team, ed. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
49. Herzel H, Berry D, Titze IR, et al. Analysis of vocal disorders with methods from nonlinear dynamics. J Speech Hear Res. 1994;37:1008–1019.
50. Tigges M, Mergell P, Herzel H, et al. Observation and modelling of glottal biphonation. Acustica Acta Acustica. 1997;83:707–714.
51. Neubauer J, Mergell P, Eysholdt U, et al. Spatio-temporal analysis of irregular vocal fold oscillations: biphonation due to desynchronization of spatial modes. J Acoust Soc Am. 2001;110:3179–3192.
52. Herzel H, Reuter R. Biphonation in voice signals. Natl Cent Voice Speech Status Prog Rep. 1996;9:109–115.
53. Svec J, Sram F. Videokymographic examination of voice. In: Ma E, Yu E, eds. Handbook of Voice Assessments. San Diego, CA: Plural Publishing; 2011:129–146.
54. Theiler J. Spurious dimension from correlation algorithms applied to limited time-series data. Phys Rev A. 1986;34:2427–2432.
55. Gleick J. Chaos: The Amazing Science of the Unpredictable. London: Vintage; 1987.
56. Boersma P, Weenink D. Praat: Doing Phonetics by Computer. Amsterdam, The Netherlands: Institute of Phonetic Sciences, University of Amsterdam; 2014.