Documented With Laryngeal High-speed

Video Endoscopy

*Christian T. Herbst, †Jakob Unger, ‡Hanspeter Herzel, *Jan G. Švec, and §Jörg Lohscheller, *Olomouc, Czech

Republic; †Aachen, Germany; ‡Berlin, Germany; and §Trier, Germany

Summary: Introduction. In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this nonlinear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathological phonations.

Methods. HSV data from a total of 73 females diagnosed as healthy (n = 42), with functional dysphonia (n = 15), or with unilateral vocal fold paralysis (n = 16) were quantitatively analyzed. Glottal area waveforms (GAW) and left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space-embedded signals, two novel quantitative parameters were computed: the phasegram entropy (PE) and the phasegram complexity estimate (PCE), inspired by signal entropy and correlation dimension computation, respectively.

Results. Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological as compared with the healthy participants, thus significantly discriminating the healthy group from the paralysis group (P = 0.02 for both PE and PCE). Comparisons of individual PE or PCE data for the left and the right hGAW within each subject resulted in asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy group and the paralysis group (P = 0.03).

Conclusions. Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.

Keywords: phasegram–nonlinear analysis–periodicity–high-speed video endoscopy–glottal area waveform.

Accepted for publication November 12, 2015.

From the *Voice Research Laboratory, Department of Biophysics, Faculty of Science, Palacký University Olomouc, Tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic; †Institute of Imaging & Computer Vision, RWTH Aachen University, Kopernikusstr. 16, 52074 Aachen, Germany; ‡Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, 10115 Berlin, Germany; and the §Department of Computer Science, University of Applied Sciences, Schneidershof, 54293 Trier, Germany.

Address correspondence and reprint requests to Christian T. Herbst, Voice Research Laboratory, Department of Biophysics, Faculty of Science, Palacký University Olomouc, Tr. 17. listopadu 12, 771 46 Olomouc, Czech Republic. E-mail: herbst@ccrma.stanford.edu

Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■

0892-1997

© 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

http://dx.doi.org/10.1016/j.jvoice.2015.11.006

ARTICLE IN PRESS

The behavior of a vibratory system is periodic if the observed oscillatory pattern continuously repeats itself after a constant time interval. Periodicity abiding by this strict definition is hardly observed in empirical data of biomechanical systems such as the voice. Rather, voice production is at best a nearly periodic1 phenomenon under nonpathological conditions. In the presence of a voice disorder, vocal fold vibration and thus the generated acoustical output is likely to be more or less perturbed,2 often caused by highly irregular vibratory regimes of the vocal folds.3–6

Deviations from periodicity can be quantified in a variety of ways. Apart from time domain-based and frequency domain-based approaches such as calculation of jitter7 or the harmonics-to-noise ratio,8 methods from nonlinear systems analysis have received growing interest in the past decades.9–12 In nonlinear dynamics methods, the voice is considered to be a dynamical system13 that is able to exhibit a wide variety of oscillatory behavior “on the way to chaos.”14,15

Several quantitative methods for assessing the complexity of the temporal behavior of nonlinear systems have been introduced, such as the correlation dimension,16 Lyapunov exponents,17 or Tokuda et al’s low-dimensional nonlinearity measure.18 These methods have been successfully applied during the analysis of biosignals from both healthy and pathological voices, such as the acoustical waveform,12,19–22 electroglottography,23–25 or data derived from high-speed video (HSV) recordings of vocal fold vibration.26–29

The detailed interpretation of available quantitative methods for analyzing the dynamics of irregular voice often requires expert background knowledge in mathematics and physics. In contrast, visualization methods are often easier to understand for nonexperts. Such visualization methods, applied to nonperiodic voice production, include for example spectrograms12,30,31 or local maxima displays.32

Recently, a novel visualization method of system dynamics has been introduced: the phasegram.33 In a phasegram, time is mapped onto the x-axis, and various vibratory regimes, such as periodic oscillation, subharmonics, or chaos, are identified within the generated graph by the number and the stability of horizontal lines. Phasegrams can be interpreted as bifurcation diagrams in time. They are particularly suited for nonstationary signals. The benefits of sliding window analysis are combined with the visualization potential of phase space embedding.34,35 In contrast to other nonlinear analysis techniques (eg, bifurcation maps), phasegrams can be automatically constructed from a time domain signal alone; no additional system parameter needs to be known. In contrast to conventional voice perturbation measures (eg, jitter), no information about glottal cycle duration or fundamental frequency needs to be known.

Phasegrams have thus far been utilized for the visualization33 and the manual classification36 of electroglottographic voice

signals. Here, their application to the analysis of time series data derived from HSV recordings is introduced by example of simulated vocal fold vibrations using a lumped element biomechanical model. The concept is further extended to healthy and pathological phonations, considering both stationary and nonstationary signals. The analysis is complemented by spatiotemporal visualization37,38 and by Fourier analysis of vocal fold vibration and of simultaneously acquired acoustical signals. It will be shown that sequences of aberrant vocal fold vibratory behavior can be easily located in phasegrams, thus earmarking the method as a promising candidate for detection of clinically relevant passages within HSV recordings. To facilitate automated objective analysis of vocal fold vibratory behavior (as seen in HSV recordings), two novel quantitative analysis parameters derived from the phasegram visualization are introduced in this paper. The performance of these quantitative parameters is assessed through analysis of a database containing HSV recordings of healthy and pathological phonations.

METHODS

Participants and phonatory tasks

A total of 73 female participants were included in the study. Before data acquisition, all participants underwent a standard clinical evaluation. Forty-two of these were considered to be normophonic (ie, healthy) speakers. Another 15 participants were diagnosed with functional dysphonia, and the remaining 16 were diagnosed with unilateral vocal fold paralysis. The average (±standard deviation) age of these clinical groups was 40.2 ± 15.8 years (healthy), 46.2 ± 16.1 years (functional dysphonia), and 53.7 ± 23.2 years (unilateral vocal fold paralysis), respectively. All participants were asked to steadily phonate the sustained vowel /ae/ at habitual pitch and loudness during endoscopy. These phonations constituted the corpus of stationary data that was utilized for statistical analysis in this study. One glissando produced by a female with left vocal fold paralysis is also discussed in this paper, illustrating the phasegram’s potential to visualize nonstationary phonation (Figure 4).

Data acquisition

HSV data of vocal fold vibration during phonation were acquired using an HS Endocam 5562 high-speed camera system (Richard Wolf GmbH, Knittlingen, Germany) operated at 4000 frames per second with a spatial resolution of 256 × 256 pixels. The system was equipped with a 9-mm rigid endoscope with 90° optics (Richard Wolf GmbH, Knittlingen, Germany), and a Wolf 5123 Auto LP 250 Watt Xenon-light lamp (Richard Wolf GmbH, Knittlingen, Germany) served as the light source. Each participant’s phonation was recorded for 250 ms, resulting in 1000 video frames per participant, and a total of 73,000 analyzed video frames.

HSV analysis

To enable quantitative analysis of the vibrating patterns along the entire length of the vocal folds, a clinically evaluated image-processing procedure was utilized, which is described in detail in Reference 39. When applying this method to all acquired HSV recordings, the time-varying glottal edges enveloping the glottal area were segmented within each video frame, thus resulting in the glottal area waveform (GAW) in pixels. This segmentation process also provides information on the deflection of the medial edges of both vocal folds, describing their mediolateral vibration patterns. The segmentation information can also be utilized to compute the left and the right hemi-GAW (hGAW_L and hGAW_R, respectively). These quantities are defined as the time-varying area (expressed in pixels) of the left and the right half of the glottis, respectively, as segmented along the anteroposterior midline,40 satisfying hGAW_L + hGAW_R = GAW. In this study, both the GAW and the hGAW_L,R are analyzed.

For visual inspection, information on the extracted time-varying glottal edges was used to create phonovibrograms (PVG)38 and glottovibrograms (GVG),29,41 two visualization techniques that transfer information on the time-varying lateral deflection of the vocal folds (as color information) along the anterior-posterior (A-P) dimension into a single graph. These two visualization techniques were utilized in Figure 3 (GVG) and in Figure 4 (PVG) in this text.

Vocal fold vibration simulation

As a proof of concept for the later described phasegram analysis of clinical HSV data from human phonation, a synthesized HSV containing three stereotypical modes of vibration (periodic, period doubling, and chaos) was generated using a simplified two-mass model simulation approach.42 This model was included in this paper for didactic purposes, to illustrate the interrelation between biomechanical vibratory regimes of the vocal folds and their appearance in the phasegram, and to verify the systematic variations of the new measures introduced in this work in a controlled environment with known input data. The model was configured with the parameters specified in Table 1. The simulation was run for 5 seconds at a sampling frequency of 4000 Hz (corresponding to the video frame rate of the HSV recordings analyzed in this study), driven by subglottal pressure that gradually varied from 0 to 2 kPa and back, reaching a maximum plateau at t = 2 s to t = 3 s. For both vocal folds, the

TABLE 1.
Two-mass Model42 Parameters Used for Generating a Vocal Fold Vibration Sequence With Three Stereotypical Modes of Vibration (Periodic, Period Doubling, Chaos)

Parameter   Description                                  Value
q           Asymmetry parameter                          0.644
l           Vocal fold length [cm]                       1.739
m1          Lower mass [g]                               1.032
m2          Upper mass [g]                               0.353
k1          Spring stiffness of lower mass [cm/ms2]      0.161
k2          Spring stiffness of upper mass [cm/ms2]      0.031
kc          Coupling stiffness                           0.152
d1          Thickness of lower mass [cm]                 1.687
d2          Thickness of upper mass [cm]                 1.636
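The driving pressure described for the simulation (0 to 2 kPa and back over 5 s, with a plateau from t = 2 s to t = 3 s, sampled at 4000 Hz) can be sketched as follows. This is only an illustration under assumptions: the text says the pressure "gradually varied" but does not state the ramp shape, so linear ramps are assumed here, and the function name `subglottal_pressure` and its parameters are hypothetical rather than taken from the authors' code.

```python
def subglottal_pressure(t, p_max=2.0, t_up=2.0, t_down=3.0, t_end=5.0):
    """Subglottal pressure in kPa at time t (seconds).

    Assumed shape: linear rise from 0 to p_max until t_up, plateau until
    t_down, linear fall back to 0 at t_end. Only the end points and the
    plateau interval are stated in the text; linearity is an assumption.
    """
    if t < 0.0 or t > t_end:
        return 0.0
    if t < t_up:
        return p_max * t / t_up
    if t <= t_down:
        return p_max
    return p_max * (t_end - t) / (t_end - t_down)


FS = 4000          # sampling frequency quoted in the text [Hz]
DURATION = 5.0     # simulation length quoted in the text [s]

# One pressure value per simulated sample (20,000 samples in total).
pressure = [subglottal_pressure(k / FS) for k in range(int(FS * DURATION))]
```

Such an envelope would be fed sample by sample into a two-mass model configured with the Table 1 parameters; the model itself is not sketched here.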

time-varying displacement of either the upper or the lower mass (always considering the smaller of the two respective mass deflections, to arrive at the projected glottal area) was interpreted as the simulated maximum oscillatory deflection A[i] of a simple string model of vocal fold vibration with one vibratory mode (Supplementary Movie M1).

Phasegram generation

The phasegram generation process is described in detail in a previous publication.33 Here, an outline of the procedure is recapitulated for the analysis of HSV data. Initially, the GAW signals extracted from all analyzed HSV recordings were converted into complex analytic signals by means of a Hilbert transform.43 For creating phasegrams, consecutive portions of each analytic signal were embedded in two-dimensional phase space,34,35 resulting in a number of temporally consecutive phase portraits equally distributed over the duration of each analyzed signal (see Figure 1D for three examples extracted at t = 0.5 s, t = 1.5 s, and t = 2 s). Each two-dimensional phase portrait was intersected by two so-called Poincaré sections (cf., Reference 44 figure IV.1) starting at the phase space origin and going radially outward at opposite angles π radians (or 180°) apart; see equation S1 in supplementary materials to Herbst et al.33 Two connected Poincaré sections thus form a straight line through the origin of the phase space, resulting in a certain number of intersection points between the embedded phase space trajectory and the intersection line.

The angle of the Poincaré sections through the corresponding phase portraits crucially influences the final appearance of the generated phasegram. The optimal Poincaré section angle, ideally capturing the full complexity of the underlying vibratory phenomenon, is determined by an automated algorithm that is described in detail in the supplementary materials of Herbst et al33 and in Appendix 1. This algorithm divides all two-dimensional phase spaces derived from each analyzed signal (recall Figure 1D) into M equally spaced angular sections, and computes the complexity of all the M resulting one-dimensional Poincaré section vectors with an approach inspired by Grassberger and Procaccia.16 For each of these vectors, the complexity estimate is averaged over all extracted phase portraits, and the Poincaré section angle that results in the highest averaged complexity estimate is chosen for the generation of the phasegram.

Figure 1D shows three horizontally aligned intersection lines within the two-dimensional phase space. The occurrence of intersection points along these lines was assessed by means of histograms with a given number of histogram bins (Figure 1E). Each histogram was grayscale coded and thus converted into a so-called “trajectory strip” (Figure 1F). The resulting trajectory strips were then in turn rotated by 90 degrees and consecutively plotted onto a graph at equally spaced time intervals to form the phasegram (Figure 1G). As a by-product of phasegram generation, so-called “phase portrait movies” were created, which show the evolution of the phase space trajectory over time (Supplementary Movie M1).

In the phasegram, time is mapped onto the x-axis. Based on the underlying attractor in two-dimensional phase space, the number and the stability of lines perpendicular to the y-axis typically indicate the system state at a particular point in time:

• one line: no oscillation (stasis)
• two lines: periodic oscillation (Figure 1G, t = 0.5 s)
• more than two stable lines: subharmonic vibration (Figure 1G, t = 2 s)
• absence of continuous lines, rugged appearance: irregular oscillatory patterns (Figure 1G, t = 1.5 s)

Note that these results can be expected only if the analyzed time series has exactly two zero-crossings per cycle (one positive and one negative), as is for example the case in electroglottographic signals or in the GAW data analyzed in this study. If, however, the waveform contains more than two zero-crossings per cycle (eg, typically seen in the acoustic signal) due to the presence of noteworthy harmonics, the underlying Poincaré section will result in proportionately more intersection points with the phase space trajectory. In such a case, more stable lines than listed above will be seen for periodic and subharmonic oscillatory regimes, and thus the derived quantitative measures phasegram entropy (PE) and phasegram complexity estimate (PCE) (below) will be slightly inflated.

All phasegrams utilized for statistical evaluation were calculated with a window duration of 100 ms, a read progress of 5 ms (allowing for a considerable overlap of consecutive windows, thus increasing temporal accuracy), 100 angular bins (parameter M in Herbst et al33), and 128 histogram bins for assessing intersections between phase space trajectories and Poincaré sections. All data analysis algorithms used in this study were implemented by author C.T.H. in the programming language Python. The source code for creating phasegrams is available online (www.phasegram.org).

Quantitative analysis

Phasegram Entropy (PE)

As illustrated in Figure 1, the phasegram is composed of a series of intensity-coded histograms of Poincaré sections through consecutive two-dimensional phase space embeddings of portions of the analyzed signal. As the underlying signal gets more complex, the intersections between the phase space trajectory and the respective Poincaré section are distributed over a greater number of histogram bins (see Figure 1E for stereotypical cases of periodic, subharmonic, and irregular vibration). Based on Shannon’s work,45 the complexity of the analyzed signal can thus be assessed by calculating the entropy of the Poincaré section histograms. The entropy H, measured in bits, is a measure of the information content associated with the outcome of a random variable. It is calculated as

$$H = -\sum_{i=1}^{n} p_i \log_2(p_i) \qquad (1)$$

where $p_i$ is the probability of the occurrence (between 0 and 1) of a bin in the histogram. In the case of phasegrams, $p_i$ is the number of trajectory intersections with the Poincaré section found

FIGURE 1. Illustration of phasegram generation, exemplified by analysis of vocal fold vibration simulated with a simplified two-mass model (nonvibrating intervals not shown). (A) Narrowband spectrogram (FFT window duration 185.8 ms) of a GAW signal generated with the Steinecke-Herzel two-mass model driven in a pressure sweep. (B) GAW signal derived from the model. (C) Three portions of the GAW signal, extracted at temporal offsets t = 0.5 s (periodic), t = 1.5 s (irregular), and t = 2 s (subharmonic). (D) Two-dimensional phase space embedding of the above signals, created by Hilbert transformation and attractor reconstruction. An intersection line (Poincaré section) was created along the x-axis (orange/light gray), yielding intersection points with the trajectory (red/gray dots). (E) Histograms of trajectory intersection points with intersection lines for all three extracted signal portions. (F) “Trajectory strips”: color-coded histograms of intersection lines through phase portraits. (G) Phasegram from the signal displayed in (A) and (B). The markers at t = 0.5 s, t = 1.5 s, and t = 2 s represent the temporal offset of the three trajectory strips from (F) within the graph. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
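The processing chain illustrated in Figure 1 (analytic signal, phase portrait, Poincaré section, histogram) can be sketched for a single analysis window as shown below. This is a minimal didactic sketch, not the authors' implementation (their source code is referenced at www.phasegram.org). The function names `analytic_signal` and `poincare_histograms` are hypothetical, the section angle is fixed instead of being optimized, the histogram range is normalized by the largest radius in the window (an assumption), and the analytic signal is obtained with a naive O(N^2) discrete Fourier transform to keep the example dependency-free; in practice a library routine such as `scipy.signal.hilbert` would normally be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(N^2) DFT.

    Didactic stand-in for a library Hilbert transform; adequate for
    short windows only.
    """
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
                for f in range(n)]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # discard negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for f in range(1, (n + 1) // 2):
        weights[f] = 2.0
    spectrum = [s * w for s, w in zip(spectrum, weights)]
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def poincare_histograms(z, theta, num_bins):
    """Count trajectory crossings along the rays at theta and theta + pi.

    Returns two histograms (one per ray) of the crossing distance from the
    phase space origin, scaled by the largest radius in the window.
    """
    rot = cmath.exp(-1j * theta)
    w = [p * rot for p in z]                  # section line becomes the real axis
    r_max = max(abs(p) for p in w)
    hists = ([0] * num_bins, [0] * num_bins)
    for a, b in zip(w, w[1:]):
        if (a.imag < 0) != (b.imag < 0):      # segment crosses the section line
            t = a.imag / (a.imag - b.imag)    # linear interpolation of the crossing
            x = a.real + t * (b.real - a.real)
            side = 0 if x >= 0 else 1         # ray at theta or at theta + pi
            idx = min(int(abs(x) / r_max * num_bins), num_bins - 1)
            hists[side][idx] += 1
    return hists


# One 100 ms window of a 200 Hz sinusoid sampled at 4000 Hz (values from the text).
FS, N, F0 = 4000, 400, 200.0
x = [math.cos(2 * math.pi * F0 * k / FS + 0.3) for k in range(N)]
z = analytic_signal(x)
h_theta, h_opposite = poincare_histograms(z, theta=0.0, num_bins=128)
occupied = [sum(1 for c in h if c) for h in (h_theta, h_opposite)]
```

For this purely periodic input each of the two histograms has a single occupied bin, which corresponds to the two stable lines that the text associates with periodic oscillation. Rotating each histogram into a grayscale strip and stacking the strips over successive windows would give the phasegram itself.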


in a given histogram bin (recall Figure 1E), divided by the total number of intersections (found in all bins) of the analyzed histogram.

As indicated above, for phasegram generation, the definition of a histogram follows the convention that a Poincaré section of an oscillating phase space trajectory is typically made through its positive semi-orbit. Consequently, histograms of Poincaré sections are taken only from the phase space origin outward at a certain angle, and the resulting phasegram representation is constituted by two Poincaré section histograms taken at opposite angles. For entropy calculation, the average entropy from both these respective histograms is calculated, and the resulting quantity is introduced as the PE, measured in bits, which is determined for every point in time t where a phase space was extracted from the analyzed signal:

$$\mathrm{PE}[t] = \frac{H_{\theta} + H_{\theta+\pi}}{2}, \qquad (2)$$

where $H_{\theta}$ and $H_{\theta+\pi}$ are the entropies of the two Poincaré section histograms, taken at angles of θ and θ + π radians, respectively, within the phase space at time index t. The time index t is the progressive count of phase space embeddings within the analyzed signal.

The PE for a synthesized signal with known oscillatory regimes is illustrated in Figure 2C. The PE for perfectly sinusoidal vibration is zero, because only one intersection per histogram is found (Figure 2C, t ≈ 0–3 s). Note though that even in this perfectly periodic segment PE[t] was nonzero at a few instances, caused by the gradually changing amplitude of the synthesized signal, distributing the respective trajectory intersections over more than one histogram bin. For a period 2 subharmonic oscillation (period doubling), PE = 1 bit (t ≈ 3–6.9 s), and for a period 4 subharmonic oscillation (period quadrupling), PE = 2 bits. For the chaotic signal portion in this example, the PE reaches a value of about 4 bits. For sufficiently large window sizes, maximum entropy PE_MAX is reached for a uniform histogram representing highly irregular vibrations as

$$\mathrm{PE}_{\mathrm{MAX}} = \log_2(\mathit{numHistBins}) \qquad (3)$$

Finite sample effects may however lead to an underestimation of the entropy.46

As mentioned above, the PE has a tendency to be inflated during gradual or abrupt changes in the amplitude or in the quality of the analyzed signal. Two such extreme cases occur in Figure 2C at t = 2.94 s and t = 6.9 s, where bifurcations cause abrupt changes in the oscillatory regime. As the created phase portrait always relies on a time window with a certain duration (20 ms in the case of Figure 2), at a bifurcation both the previous and the newly emerging oscillatory regimes can be simultaneously visible in the phase portrait, and abrupt amplitude changes cause a gradually changing trajectory in phase space. This is illustrated in Figure 2A by the phase portrait extracted at t = 2.94 s. Creating a Poincaré section through such a phase portrait will necessarily result in a more complex Poincaré section histogram with increased entropy, because an increased number of histogram bins has values other than zero.

Phasegram Complexity Estimate (PCE)

As indicated earlier, phasegrams are generated by creating a (one-dimensional) Poincaré section through two-dimensional phase space. The optimal angle of that Poincaré section is determined by a complexity estimate described in Appendix 1 and in the supplementary materials of Herbst et al.33 This complexity measure, hereafter termed PCE, is a dimensionless quantity that is theoretically expected to be in the range of zero to one. As a proof of concept, the PCE for a signal synthesized from the logistic map equation, having well-known oscillatory properties, is shown in Figure 2D. For a nonperturbed, purely periodic oscillatory regime, the PCE is zero. Subharmonic vibratory regimes result in values of about 0.3 or greater (Appendix 1).

As suggested in Appendix 2, a signal subjected to PCE calculation should ideally contain 20 vibratory cycles or more, for the full complexity of the underlying phase space trajectory to emerge. If, on the other hand, the analysis window (and thus the duration of the segment extracted from the analyzed time series based on which the consecutive phase space embeddings are constructed) is too long, abrupt transitions, bifurcations, or other perturbations of the input signal would be “smeared out” over time, decreasing the temporal accuracy of the analysis (just as in the spectrogram), and potentially inflating the respective PCE readings. As a compromise, the window duration in this analysis was chosen to be 100 ms. Such a window duration would encompass 20 glottal cycles of phonation at the average speaking fundamental frequency of females, that is, 200 Hz.2 It would furthermore result in reasonably reliable PCE values even for potentially pathological female phonations at fundamental frequencies as low as 100 Hz, resulting in 10 cycles per analysis window (Figure 8 suggests that a window duration that encompasses 10 cycles of phonation would still result in a PCE of about 0.6 at a completely irregular oscillatory regime).

In analogy to the PE, the PCE is computed as a time-varying quantity for every point in time t where a phase space was extracted from the analyzed signal. For convenience, the time-varying quantities PE[t] and PCE[t] and the related asymmetry measures (AM) (Eqs. 4 and 5 below) are indicated without the trailing index [t] throughout this manuscript, unless their average over an entire phonation was calculated, in which case they are explicitly indicated as averaged values (eg, “mean PE”), or they are decorated with an overline (eg, $\overline{\mathrm{PE}}$).

Asymmetry Measures (AM)

In this study, phasegrams were created from both the GAW and the hGAW_L,R time series data of all analyzed phonations. Accordingly, the novel PE and PCE measures were available for the GAW and the hGAW_L,R signals. When comparing either the PE or the PCE of hGAW_L,R, the difference between these measures constitutes an indicator of the time-varying symmetry of the vibratory complexity of the left and the right vocal folds. The respective AM were defined as

$$\mathrm{AM}(\mathrm{PE})[t] = \frac{\mathrm{abs}\left(\mathrm{PE}(\mathrm{hGAW}_L)[t] - \mathrm{PE}(\mathrm{hGAW}_R)[t]\right)}{\log_2(\mathit{numHistBins})}, \qquad (4)$$

and

FIGURE 2. Phasegram entropy (PE) and phasegram complexity estimate (PCE) for three stereotypical vibratory regimes (periodic, subharmonic, irregular/chaotic), illustrated by synthetic data derived from the logistic map equation, where the parameter a was varied gradually from 3.425 to 3.625; see Herbst et al33 for details. (A) Phase portraits, extracted at 2 s, 2.94 s, 4 s, and 8.5 s. Poincaré section angle = 0.22 π radians (dashed red lines). (B) Phasegram of synthetic sequence generated with the logistic map equation. (C) and (D) Time-varying PE [bits] and PCE, respectively, derived from the individual Poincaré section histograms of the phasegram in (B); see text. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
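The PE values quoted in the text for the regimes shown in Figure 2 (0 bits for periodic, 1 bit for period doubling, 2 bits for period quadrupling) follow directly from the entropy definition, which a short sketch can confirm. The function names below are hypothetical and the toy histograms are invented for illustration; returning 0 for an empty histogram is an assumption, since the text does not say how windows without intersections are treated. The PCE is not sketched because its definition resides in Appendix 1, which is not reproduced here.

```python
import math


def histogram_entropy(counts):
    """Shannon entropy in bits of a histogram of intersection counts (Equation 1)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty histogram carries no information
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)


def phasegram_entropy(hist_theta, hist_opposite):
    """PE of one window: mean entropy of the two opposite sections (Equation 2)."""
    return 0.5 * (histogram_entropy(hist_theta) + histogram_entropy(hist_opposite))


def pe_max(num_hist_bins):
    """Entropy of a uniform histogram, the upper bound of PE (Equation 3)."""
    return math.log2(num_hist_bins)


def asymmetry_pe(pe_left, pe_right, num_hist_bins):
    """AM(PE) of one window: absolute left/right difference over PE_MAX (Equation 4)."""
    return abs(pe_left - pe_right) / pe_max(num_hist_bins)


# Toy histograms (invented): intersections fall into 1, 2, or 4 equally used bins.
periodic = [0, 0, 19, 0]
period_doubling = [0, 10, 0, 10]
period_quadrupling = [5, 5, 5, 5]

pe_values = [phasegram_entropy(h, h)
             for h in (periodic, period_doubling, period_quadrupling)]
```

With the 128 histogram bins used for the statistical evaluation, `pe_max(128)` is 7 bits, so a left/right PE difference of 1 bit would correspond to an AM(PE) of 1/7.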

$$\mathrm{AM}(\mathrm{PCE})[t] = \mathrm{abs}\left(\mathrm{PCE}(\mathrm{hGAW}_L)[t] - \mathrm{PCE}(\mathrm{hGAW}_R)[t]\right), \qquad (5)$$

Both these measures are dimensionless quantities in the range of [0..1]: The difference between the PE of hGAW_L and the PE of hGAW_R is divided by the theoretical maximum (dependent on the number of histogram bins, see Equation 3), and the theoretically expected maximum PCE value is 1 to begin with (Appendix 1), so no normalization is required. Low AM values (of either PE or PCE) are to be expected when both vocal folds exhibit the same periodicity of vibration, that is, when both vocal folds either vibrate nearly periodically (regardless of their individual fundamental frequency of vibration) or have equally irregular vibratory patterns. Higher AM values are expected when one vocal fold vibrates nearly periodically, whereas the other exhibits an irregular vibratory pattern. Note that in Equations 4 and 5, the modulus (ie, the absolute value) of the difference is computed. In future investigations, nonabsolute values (without the abs[. . .] function) might be calculated, resulting in a parameter in the range of [−1..1] that could be potentially useful for assessing which vocal fold (left or right) exhibits a more complex vibratory behavior in relation to the other.

Statistical analysis

To test the performance of the newly introduced measures PE, PCE, AM(PE), and AM(PCE), these measures were computed for the phonations of all 73 female participants described in a previous section. As these phonations were stationary, the respective measures were averaged over each investigated time window, thus resulting in the mean quantities $\overline{\mathrm{PE}}$, $\overline{\mathrm{PCE}}$, $\overline{\mathrm{AM}(\mathrm{PE})}$, and $\overline{\mathrm{AM}(\mathrm{PCE})}$. The discriminative power of these measures per diagnosis group (normal, functional dysphonia, paralysis) was assessed with inferential statistics. Before testing, normality and homogeneity of variance were assessed with

Shapiro-Wilk and Levene’s tests, respectively. The PE mea- Irregular vocal fold vibration

sures were subjected to an ANOVA and post hoc pairwise In the irregular case (Figure 3, right panels), both the acoustic

comparisons with Bonferroni-Holm-adjusted t tests.47 The PCE waveform and the GAW did not exhibit a periodic structure. The

measures did not meet the normality criterion in the “healthy” same was true for the GVG, where no apparent regularity in the

group, and the asymmetry measures AM ( PE ) and AM ( PCE ) vibratory pattern of the vocal folds could be discerned. Conse-

were not normally distributed in the “healthy” group and in the quently, the power spectrum analysis of both the GAW and the

“paralysis” group. Consequently, these three measures were sub- acoustic waveform was not constituted by a harmonic series, and

jected to nonparametric analysis constituted by Kruskal-Wallis the GAW trajectory in the two-dimensional phase space had

tests, followed by post hoc Mann-Whitney U tests with no apparent repetitive pattern. The phasegram did not exhibit

Bonferroni-Holm correction. All statistical processing was per- stable horizontal lines, thus also suggesting an irregular vibra-

formed using the free software R.48 tory pattern.

RESULTS

Qualitative analysis: typical examples

Three stereotypical vocal fold vibratory regimes (periodic, subharmonic, and irregular), generated through attempts at stable phonation by a normophonic female and two females with vocal fold paralysis, respectively, are illustrated in Figure 3.

Periodic vocal fold vibration

The periodic case (Figure 3, left panels) was characterized by both a regular acoustic waveform and a nearly periodic GAW, having a period of about 4.1 ms and thus a fundamental frequency of about 243.9 Hz. The GVG suggests a steady and repeatable vibratory pattern with a posterior-anterior (P-A) phase delay in the opening phase and an A-P phase delay in the closing phase. Because of the regular nature of both the acoustic waveform and the GAW, the power spectra derived from these signals consisted of a harmonic series with harmonics at integer multiples of the fundamental frequency. Two-dimensional phase space embedding of the GAW resulted in a typical limit cycle. Consequently, the phasegram derived from the GAW consisted of two stable lines, corroborating the periodic nature of the vocal fold vibration.

Subharmonic vocal fold vibration

In the subharmonic case (Figure 3, middle panels), both the acoustic waveform and the GAW had alternating cycle amplitudes, resulting in a period of about 10.4 ms and a cycle duration of about 5.2 ms (because each period consists of two glottal cycles). The subharmonic nature of this case was also borne out by the GVG, which demonstrated alternating patterns of vocal fold vibration. Every even cycle had a more pronounced A-P phase delay in the opening phase than the odd cycles, as well as a slight anterior opening that was maintained until the very end of the open phase. As in the periodic case, the power spectrum of both the GAW and the acoustic waveform exhibited a harmonic series, but with a considerably lower fundamental at about 95.9 Hz, caused by the period doubling. The second harmonic was considerably stronger than the others, as it corresponds to the vibratory period of each individual cycle. Phase space embedding of the GAW resulted in a typical subharmonic pattern, where the trajectory had to undergo two revolutions before repeating itself. The phasegram derived from the GAW shows four almost-stable lines (revealing the subharmonic pattern of the vibration). The small fluctuations of the phasegram lines were caused by slight overall amplitude drifts of the GAW.

Glissando

In contrast to the examples presented in Figure 3, which were derived from the respective participants’ attempts to create stationary phonation, a nonstationary case stemming from a glissando phonation by a 50-year-old female diagnosed with left vocal fold paralysis is illustrated in Figure 4. A perfunctory analysis of the spectrogram (Figure 4A) would create the impression of a harmonic series with nonuniform amplitudes of the individual harmonics at the initial two thirds of the phonation, suggesting a subharmonic pattern. Closer inspection reveals, however, that the individual partials from t = 0 ms to about t = 400 ms are not all integer multiples of a common fundamental frequency, but combinations of two independent frequencies, as seen in biphonation.49,50 Between about t = 400 and t = 600 ms, the spectrogram suggests a more regular distribution of the individual energy components. At around t = 600 ms, a true harmonic series emerged. The PVG (Figure 4B) and the hGAWs (Figure 4C and D) selected at t = 200 ms, 480 ms, and 660 ms, respectively, suggest three different vibratory states. Around t = 200 ms, the left vocal fold had a heavily perturbed vibratory pattern, and the right vocal fold vibrated with an amplitude-modulated pattern. Around t = 480 ms, both vocal folds had perturbed vibratory patterns resembling nearly periodic oscillation (the vibration of the right vocal fold was amplitude modulated), and their respective frequencies (about 207 and 277 Hz) were most likely locked at a ratio of 3:4. The data for the sequence at t = 660 ms suggest that both vocal folds exhibited slightly perturbed and/or amplitude-modulated nearly periodic vibration at the same frequency of about 235 Hz (1:1 frequency ratio). All these vibratory patterns are illustrated by the hGAW phasegrams (Figure 4E and F). The heavily perturbed sequence around t = 200 ms was characterized by the absence of uniform lines in the phasegram; the 3:4 pattern at t = 480 ms resulted in a more concentrated cluster of vertical lines, at least in the hGAWL; and the 1:1 synchronized sequence at t = 660 ms produced two almost-stable horizontal lines, as was expected for the nearly periodic case (Methods and Figure 1).

The vibratory analysis suggests that the apparent harmonic series observed in the GAW spectrogram (Figure 4A) in the range of t = 0–560 ms was most likely the result of biphonation,51,52 that is, of the left-right asymmetry of the vocal fold vibration. A (single) fundamental frequency could only be determined in the range of t = 560–820 ms, with values in the range of about 220–250 Hz. The PE (Figure 4G) of both the hGAWL and the hGAWR decreased slightly when periodic vibration with a 1:1

ARTICLE IN PRESS

8 Journal of Voice, Vol. ■■, No. ■■, 2015

FIGURE 3. Glottovibrogram (GVG) and phasegram analysis of three stereotypical oscillatory regimes from in vivo phonation: periodic (healthy

female, left panels); subharmonic (female, vocal fold paralysis, middle panels); and irregular (female, vocal fold paralysis, right panels) – see

supplementary Movies M2, M3 and M4. (A) Acoustic signal. Because the mouth-to-microphone distance was not measured during the recordings,

the acoustic waveform was not time shifted in relation to the other time series signals to compensate for its delay; (B) glottal area waveform (GAW);

(C) GVG; (D) left: narrowband power spectrum of GAW (black) and acoustic waveform (gray), window duration = 50 ms; right: two-dimensional

phase portrait with Poincaré section; (E) phasegram of GAW.

left-right frequency ratio ensued, but it was less sensitive to the change from heavily perturbed vibration to the sequence with 3:4 frequency locking. These trends were also borne out by the PCE of the hGAWR (Figure 4H). The PCE of the hGAWL, on the other hand, showed a clear decrease from a maximum of about 0.6 at the beginning of the phonation to a minimum of about 0.2 reached after 500 ms. Consequently, the AM(PCE) (Figure 4I) suggested differences between the regularity of the vibratory regimes of the left vs. the right vocal fold, starting around t = 280 ms, with a more perturbed vibration of the right vocal fold (which overall had the higher PCE values) as compared with the left vocal fold. In the AM(PE), these trends were also found, but less pronounced.

Statistical analysis

The distributions of the PE and the PCE per diagnosis group are shown in Figure 5. The average (±standard deviation) GAW-based phasegram entropy PE was 2.85 (±0.46), 3.02 (±0.52), and 3.23 (±0.41) for the three diagnosis groups “healthy”, “functional dysphonia”, and “paralysis”, respectively. The mean GAW-based phasegram complexity estimate PCE for these groups was 0.21 (±0.11), 0.24 (±0.10), and 0.32 (±0.15), respectively.


ChristianT. Herbst et al Phasegram analysis of laryngeal high-speed video recordings 9

FIGURE 4. Phasegram and PVG analysis of a nonstationary (glissando) phonation produced by a female with left vocal fold paralysis. (A) Spec-

trogram of GAW, window duration 128 ms, 40-dB dynamic range; (B) PVG of three extracted 50-ms segments starting at 200 ms, 480 ms, and

660 ms, respectively; (C) and (D) left and right hemi-GAW of three 50-ms segments extracted at 220 ms, 390 ms, and 750 ms, respectively; (E)

and (F) phasegram of left and right of hemi-GAW; (G) time-varying phasegram entropy (PE, see Methods) for left and right hemi-GAW; (H) time-

varying phasegram complexity estimate (PCE, see Methods) for left and right hemi-GAW; (I) PCE and PE asymmetry measures.


FIGURE 5. The interquartile range distributions of PE and PCE (averaged per phonation) per diagnosis group. The whiskers indicate the fifth

and the 95th percentile, respectively, and the superimposed stars indicate the group means.
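The Bonferroni-Holm correction applied to the post hoc group comparisons can be sketched in a few lines. The authors report using R for all statistics, so the Python function below is only an illustrative stand-in, and the raw p-values in the example are hypothetical placeholders rather than results from this study.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values, in the order they were given."""
    m = len(p_values)
    # Visit the raw p-values from smallest to largest.
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for the three pairwise group comparisons.
    raw = {
        "healthy vs paralysis": 0.007,
        "healthy vs functional dysphonia": 0.20,
        "functional dysphonia vs paralysis": 0.15,
    }
    for label, p_adj in zip(raw, holm_adjust(list(raw.values()))):
        print(f"{label}: adjusted P = {p_adj:.3f}")
```

The step-down scaling is less conservative than a plain Bonferroni correction, which would multiply every raw p-value by the number of comparisons.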

effects of the diagnosis on the calculated entropy PE, F(2, 70) = 4.03, P = 0.02. Pairwise post hoc t tests revealed significant differences between the healthy group and the paralysis group for mean phasegram entropy PE (Bonferroni-Holm adjusted P = 0.02). A Kruskal-Wallis test of PCE per diagnosis group revealed significant effects (H(2) = 7.60, P = 0.02), and a Bonferroni-adjusted post hoc Mann-Whitney U test revealed a significant difference between the “healthy” group and the “paralysis” group (P = 0.02).

The boxplots of the AM per diagnosis group (Figure 6) reveal general trends for both AM(PE) and AM(PCE) to be higher in the pathological groups than in the healthy participants. These trends are strongest in the “paralysis” group. The mean AM(PE) values were 0.028 (±0.014), 0.031 (±0.016), and 0.040 (±0.029) for the three diagnosis groups “healthy”, “functional dysphonia”, and “paralysis”. The mean AM(PCE) values for these groups were 0.071 (±0.042), 0.96 (±0.056), and 0.139 (±0.103). No significant effects of the diagnosis on the AM(PE) were found (H(2) = 1.53, P = 0.47). Calculation of the AM(PCE), on the other hand, showed significant effects of diagnosis (H(2) = 7.18, P = 0.03). In particular, a post hoc test revealed significant differences between the “healthy” group and the “paralysis” group (Bonferroni-Holm adjusted P = 0.03).

In a recent publication, the phasegram has been introduced as an intuitive visualization tool for various oscillatory phenomena in physics and in biology, demonstrated with analysis of the human voice.33 Here, a more specialized investigation is performed, showing that phasegrams are useful in analyzing signals derived from HSV recordings documenting healthy and pathological phonations. The feasibility of the approach was demonstrated by creating a phasegram of the GAW from a synthesized vocal fold vibration movie based on the output of a simple two-mass model.42 The model produced three stereotypic vibratory patterns (periodic, subharmonic, irregular; Supplementary Movie M1), which can be clearly identified in the respective phasegram (recall Figure 1G).

Qualitative analysis

The applicability of phasegram visualization to proper in vivo data derived from HSV recordings of vocal fold vibration was then documented in Figures 3 and 4. In Figure 3, which is composed of data from three attempts to produce stable phonation (ie, a stationary voice output), the causal relation between the vocal fold vibratory regime (periodic, subharmonic, and irregular) and the generated acoustical output is clearly seen. The glissando case illustrated in Figure 4 documents the phasegram’s

FIGURE 6. The interquartile range distributions of asymmetry measures AM(PE) and AM(PCE) (averaged per phonation) per diagnosis group.

The whiskers indicate the fifth and the 95th percentile, respectively, and the superimposed stars indicate the group means.


potential to also visualize nonstationary signals. The pseudo-harmonic structure of the GAW signal (Figure 4A, t = 0–560 ms) could have been mistaken for a periodic vibratory regime with an uncharacteristically low fundamental frequency, when assessed by a spectrogram alone. In contrast, the system dynamics approach of the phasegram gives additional insights (which might be overlooked by just considering the spectrogram), by clearly earmarking the respective sequence as complex. By and large, the quantitative analyses performed in Figures 3 and 4 thus suggest that the phasegram is a useful complement to existing analysis methods, and that it provides a reliable and intuitive basis for periodicity assessment of vocal fold vibration.

Statistical analysis

In this study, the phasegram approach has been augmented by introducing two quantitative analysis parameters directly derived from phasegram visualizations: the PCE and the PE. The sensitivity of these parameters to changes in vibratory regime was demonstrated with synthesized (Figure 2) and in vivo GAW data (Figure 4F and G). Statistical analysis of a corpus of HSV data from 73 adult females demonstrates that both the PE and the PCE are able to indicate trends of aberrations from the periodic vocal fold vibration paradigm, as documented by HSV. Significantly greater mean PE and PCE values were found for the participants with paralysis as compared with normophonic (healthy) participants. On the other hand, neither the PE nor the PCE measure could significantly discriminate the “functional dysphonia” group from either the healthy participants or the participants with unilateral vocal fold paralysis. This can potentially be explained by the great variety of vocal fold vibratory patterns found within the “functional dysphonia” group, ranging from nearly periodic to heavily disturbed. Overall, however, these results suggest that quantitative phasegram measures are promising new tools to assess the quality of vocal fold vibration in a clinical context. As these measures can be calculated with unsupervised algorithms, they may be useful for the automatic analysis of HSV data, potentially aiding in the detection of clinically interesting HSV sequences.

Phasegram generation relies on two-dimensional phase space embedding of the analyzed signal. As has been pointed out previously,33 phasegrams are therefore unable to distinguish between low-dimensional (deterministic) chaos and high-dimensional (stochastic) noise. An excessively low signal-to-noise ratio in the analyzed time series might thus lead to an artificial inflation of the PCE and the PE measures. The fact that for the “healthy” group both these measures (avg. PCE = 0.21 and avg. PE = 2.85) were overall higher than the theoretically expected values (ie, zero for purely periodic vibration; compare Figure 5 with Figure 2B and C) could in part be attributed to this notion. On the other hand, because vocal fold vibration is never a purely periodic but rather an inherently perturbed phenomenon,1 slightly elevated mean PCE and PE values for the normophonic participants are to be expected.

Figure 2 suggests that PCE and PE are highly correlated, which is further illustrated in Supplementary Figure S1. However, the PE measure seems to be more sensitive to both quantization noise and gradual and abrupt changes of signal amplitude and vibratory regime (as indicated in the Methods), potentially resulting in a somewhat inflated measure in these cases. This notion is corroborated by the lack of significance when attempting to discriminate the diagnosis groups based on the PE-based asymmetry measure AM(PE), whereas the AM(PCE) revealed significant differences between the “healthy” group and the “paralysis” group. In this light, the PCE and its related asymmetry measure, the AM(PCE), can be considered the more robust of the two measures, and they might as such be preferred over the PE in future studies.

Based on the periodicity assessment provided by the quantitative measures PE and PCE, asymmetry measures, automatically derived from hemi-GAW phasegrams, were introduced. To the best of our knowledge, this is the first application of hGAW analysis. As PE and PCE provide information on the regularity of vibration, comparing the results of either of these measures for the left versus the right vocal fold results in an asymmetry coefficient as regards the regularity of vocal fold vibration. Whereas in a normal healthy voice, both vocal folds are expected to vibrate regularly at identical fundamental frequencies, in more severe voice disorders, one vocal fold (or both) may vibrate irregularly. AM(PE) and AM(PCE) were designed to be sensitive to different degrees of regularity of the respective vibratory regimes within the left and the right vocal fold. AM(PE) and AM(PCE) would assume higher values if one vocal fold vibrated nearly periodically, whereas the other exhibited a more perturbed/irregular vibratory pattern, as is often the case in a vocal fold paralysis.5,53 It is thus not surprising that the mean AM(PCE) could detect significant differences between the “healthy” group and the “paralysis” group of participants, suggesting that the AM(PCE) is a reliable indicator of vibratory regularity asymmetries. In contrast to the AM(PCE), the AM(PE) did not reach significance, which is attributed to the fact that PE values in the analyzed hGAWs were overall greater than the PE values of the GAWs extracted from the same respective HSV recordings, caused by increased quantization noise due to limited spatial resolution. Nevertheless, owing to the demonstrated discriminative power of the AM(PCE) based on hGAWs, that measure is a promising new indicator of vibratory asymmetries that may prove to be useful in future research and in clinical practice.

CONCLUSION

In this work, the phasegram visualization method has been extended to the analysis of GAW data derived from HSV recordings of both normophonic and pathological voice production. Qualitative analysis showed that the phasegram is a valuable complement to existing analysis methods, as it provides direct insights into the time-dependent complexity of vocal fold vibration. Because of the phasegram’s potential to condense information about the vocal fold dynamics of an entire phonation into a single graph, the approach is a time-effective alternative to studying raw video data when looking for abnormal vibratory sequences in a clinical or in a research setting. Two newly introduced quantitative analysis parameters, the PCE and the PE, were found to significantly distinguish between a group of healthy


females and a group of females with vocal fold paralysis. Of these parameters, the PCE can be considered the more robust one. When applying the PCE to the hGAWs of the left and the right vocal folds, respectively, a measure of the (a)symmetry of vocal fold vibration regularity, the AM(PCE), was derived. This measure was also able to discriminate between the “healthy” group and the “paralysis” group of participants. In conclusion, these findings indicate that the extended phasegram approach is a promising new tool for the automated quantitative analysis and the classification of voice production biosignals, with potential applications in research and in clinical practice.

Acknowledgments

This research was supported by the institutional fund of Palacký University Olomouc, Czech Republic (to C.T.H.), by the Technology Agency of the Czech Republic project no. TA04010877 (to C.T.H. and J.G.Š.), by the state budget of the Czech Republic OPVK CZ.1.07/2.3.00/20.0057 (to J.G.Š.), and by grant no. LO1413/2-2 from Deutsche Forschungsgemeinschaft (to J.U. and J.L.).

APPENDIX 1

The algorithm for computing the PCE (see also Herbst et al’s33 supplementary materials) is inspired by the correlation dimension approach.16,54 During phasegram generation, Poincaré sections are generated through two-dimensional phase space, resulting in a set of n intersection points between the phase space trajectory and the Poincaré section, where each point x_i is characterized by its offset from the phase space origin. For each point x_i within each such set, the number of neighboring points N(ε, i) within a given radius ε is calculated as

$$N(\varepsilon, i) = \sum_{j=1,\, j \neq i}^{n} H\left(\varepsilon - |x_i - x_j|\right) \tag{A1}$$

where H is the Heaviside step function, defined as H(χ) = 1 if χ ≥ 0, and H(χ) = 0 if χ < 0. In analogy to correlation dimension computation, the average number of pairs x_i, x_j that has a distance of |x_i − x_j| < ε is found for each intersection point set. As dual comparisons of each point pair are avoided to speed up computation time, the averaged count C(ε) is expressed as

$$C(\varepsilon) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} H\left(\varepsilon - |x_i - x_j|\right) \tag{A2}$$

Because of the nature of the analyzed GAW signal, the temporal delay between individual data points within the intersection point set equals the duration of the respective glottal cycle. Therefore, Theiler’s correction54 was not deemed necessary.

During phasegram generation, the distance ε is typically varied in 40 equally spaced steps from 1/100 to half the maximum value of the intersection point set. As ε increases, the correlation integral C(ε) is expected to grow by a power law

$$C[t](\varepsilon) \propto \varepsilon^{d[t]} \tag{A3}$$

where d[t] is an indicator of the intersection point set’s dimension, which is expected to be in the range [0..1]. The dimension d[t] is determined as the slope of a log-log plot of ε versus C(ε). This calculation is performed for every point in time t where a phase space was extracted from the analyzed signal, and the resulting quantity is denoted as PCE[t].

For the purpose of assessing the performance of the PCE, a numerical test was performed involving the Cantor ternary set (Cantor set hereafter; see, eg, Reference 55, p. 93). The Cantor set is created by taking a line (or, more precisely, the interval [0,1]), removing the middle third, then removing the middle third of the remaining segments, continued ad infinitum.

The Cantor set can be considered to be a special case of the phasegram histogram, with an infinite number of histogram bins. The dimension of the Cantor set is defined as lg(2)/lg(3) ≈ 0.63.13 Because the algorithm for computing the PCE is inspired by the correlation dimension approach, the PCE of the Cantor set should be equal to its known dimension.

To test this assumption, a numerical Cantor set was created digitally. Naturally, when implementing a Cantor set on a computer, the removal of the middle thirds of each segment cannot be continued infinitely, but must be stopped after a certain number of iterations. For the purpose at hand, eight iterations were found to be sufficient. After each iteration, up to 200 data points (representing intersection points between the phase space trajectory and the Poincaré sections through that phase space made during phasegram generation) were randomly distributed over the value range defined by the Cantor set, and the PCE was calculated. As was expected, PCE converged toward ≈0.63 with an increasing number of randomly distributed data points (Figure 7), thus establishing that PCE is a good proxy of the complexity of the underlying phase space trajectory in phasegram generation.

APPENDIX 2

The dependence on the number of data points in the test scenario in Figure 7 corroborates observations from early phasegram analysis trial runs, which suggested that the PCE is dependent on both the number of sample points of the embedded signal (Figure 1C) and the (residual) fundamental frequency of the analyzed nearly periodic and irregular signals. This dependence was comprehensively quantified by analyzing synthesized signals (with known vibratory regimes) with a number of varying analysis parameters.

The signals for the test cases were generated with the logistic map equation

$$x[i+1] = a\,x[i]\,(1 - x[i]) \tag{A4}$$

Three types of signals were generated, each with a stable parameter a: a = 3.2, resulting in a periodic signal; a = 3.5, resulting in a subharmonic (period doubling) sequence; and a = 3.8, producing an irregular signal that closely resembles deterministic chaos, because it contains some residual periodic energy (see Reference 33 figure 1c, left panel, for an example), a phenomenon which is found even in the most disturbed voice


FIGURE 7. Phasegram complexity estimate (PCE) for vectors consisting of n data points (n = [10..200]) randomly distributed over the intervals

of Cantor ternary sets generated by m iterations (m = [1..8]). PCE converges toward ln(2)/ln(3) ≈ 0.63 with increasing n.
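The Cantor set test summarized in Figure 7 can be approximated with the following minimal sketch. It is not the authors' implementation: the sampling scheme (pick one of the remaining intervals at random, then a uniform position inside it), the pair normalization 2/(n(n − 1)), and the ordinary least-squares fit of the log-log slope are assumptions, and all function names are hypothetical.

```python
import math
import random


def cantor_intervals(m):
    """Intervals remaining after m middle-third removals applied to [0, 1]."""
    intervals = [(0.0, 1.0)]
    for _ in range(m):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3.0
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals


def sample_cantor_points(n, m, seed=0):
    """Draw n points, each uniform inside a randomly chosen level-m interval (assumed scheme)."""
    rng = random.Random(seed)
    intervals = cantor_intervals(m)
    points = []
    for _ in range(n):
        a, b = rng.choice(intervals)
        points.append(rng.uniform(a, b))
    return points


def correlation_sum(points, eps):
    """Fraction of point pairs with distance <= eps, each pair counted once."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if eps - abs(points[i] - points[j]) >= 0:  # Heaviside: H(chi) = 1 for chi >= 0
                count += 1
    return 2.0 * count / (n * (n - 1))


def complexity_estimate(points, steps=40):
    """Slope of log C(eps) versus log eps, eps in 40 equal steps from max/100 to max/2."""
    top = max(points)
    lo, hi = top / 100.0, top / 2.0
    xs, ys = [], []
    for k in range(steps):
        eps = lo + (hi - lo) * k / (steps - 1)
        c = correlation_sum(points, eps)
        if c > 0:  # log(0) is undefined, so empty neighborhoods are skipped
            xs.append(math.log(eps))
            ys.append(math.log(c))
    if len(xs) < 2:
        return float("nan")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    for n in (10, 50, 200):
        estimate = complexity_estimate(sample_cantor_points(n, m=8, seed=1))
        print(f"n = {n:3d}: slope estimate = {estimate:.2f}")
```

The estimate fluctuates with the random draw, especially for small n, so individual runs should be read as rough indications rather than as a reproduction of Figure 7.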

number of different sampling frequencies (200, 400, 800, 1600,

and 3200 Hz), resulting in (residual) fundamental frequencies

of half the sampling frequency, respectively, thus simulating (re-

sidual) vocal fundamental frequencies in the range of 100–

1600 Hz. The resulting signals were upsampled to 22,050 Hz

using the Praat sinc interpolation algorithm,56 and then stored

as 16-bit WAV files.
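As a minimal sketch of the signal generation step, the logistic map can be iterated directly. The start value, the number of discarded transient iterations, and the helper names are assumptions; the upsampling with Praat and the WAV export are not reproduced here.

```python
def logistic_map(a, n, x0=0.5, discard=1000):
    """Return n samples of x[i+1] = a * x[i] * (1 - x[i]) after dropping a transient."""
    x = x0
    for _ in range(discard):      # let the orbit settle onto its attractor
        x = a * x * (1.0 - x)
    samples = []
    for _ in range(n):
        x = a * x * (1.0 - x)
        samples.append(x)
    return samples


def count_distinct(samples, digits=6):
    """Number of distinct sample values after rounding, a crude periodicity check."""
    return len({round(v, digits) for v in samples})


def residual_f0(sampling_rate_hz):
    """An orbit repeating every two samples has a fundamental of half the sampling rate."""
    return sampling_rate_hz / 2.0


if __name__ == "__main__":
    for a in (3.2, 3.5, 3.8):
        signal = logistic_map(a, 200)
        print(f"a = {a}: {count_distinct(signal)} distinct values in 200 samples")
    for fs in (200, 400, 800, 1600, 3200):
        print(f"fs = {fs} Hz -> residual f0 = {residual_f0(fs):.0f} Hz")
```

Counting distinct values separates the periodic orbits (a small, fixed number of values) from the irregular one (nearly every sample differs).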

The generated WAV files were subjected to phasegram anal-

ysis, using various values for the phasegram window duration

(0.005, 0.01, 0.02, 0.04, 0.05, 0.06, 0.08, 0.1, 0.12, 0.14, and

0.16 s) and the number of histogram bins (25, 50, 100, 200, and

400). Each individual analysis was performed three times while

scaling the amplitude of the analyzed signal by a factor of 1 (un-

changed), 0.5, and 0.25, respectively, to assess the effect of signal

amplitude.
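The test design described above is a full factorial grid, which can be enumerated as a quick sanity check. The variable names are hypothetical; the values are those listed in the text, and the cycles-per-window figure assumes a (residual) fundamental of half the sampling frequency.

```python
from itertools import product

# Factor levels as listed in the text.
logistic_parameters = [3.2, 3.5, 3.8]                 # periodic, subharmonic, irregular
sampling_rates_hz = [200, 400, 800, 1600, 3200]
window_durations_s = [0.005, 0.01, 0.02, 0.04, 0.05, 0.06,
                      0.08, 0.1, 0.12, 0.14, 0.16]
histogram_bins = [25, 50, 100, 200, 400]
amplitude_scales = [1.0, 0.5, 0.25]

# One entry per individual phasegram analysis.
grid = list(product(logistic_parameters, sampling_rates_hz,
                    window_durations_s, histogram_bins, amplitude_scales))


def cycles_per_window(sampling_rate_hz, window_duration_s):
    """Approximate number of (residual) cycles inside one analysis window."""
    return (sampling_rate_hz / 2.0) * window_duration_s


if __name__ == "__main__":
    print(f"number of analyses: {len(grid)}")
    cycles = [cycles_per_window(fs, win) for _, fs, win, _, _ in grid]
    print(f"cycles per window range from {min(cycles)} to {max(cycles)}")
```

The product of the factor levels (3 × 5 × 11 × 5 × 3) gives the 2475 analyses reported for this test.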

In this manner, a total of 2475 PCEs were calculated. Anal-

ysis of these data suggested that the complexity estimate was

mainly dependent on the simulated (residual) fundamental fre-

quency and the phasegram window duration. By combining these

two parameters, the number of (residual) cycles per analysis

window was estimated. The dependence of the complexity parameter on the number of (residual) cycles per analysis window for the three signal stereotypes (periodic, period doubling, and chaos) is shown in Figure 8. The analysis results suggest that in the irregular case, the phasegram window duration parameter must be large enough to comprise 20 cycles of the analyzed signal, to avoid underestimation of the respective Poincaré section complexity.

FIGURE 8. Phasegram complexity estimate as a function of the number of (residual) cycles per analysis window for all 2475 simulated cases (x-axis limited to 70 cycles per window). In the case of deterministic chaos (red rectangles), the phasegram analysis window must be large enough to comprise about 20 cycles of the analyzed signal, to guarantee that the complexity estimate converges. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

The convergence of the three data sets (periodic,


subharmonic, and irregular) to about 0, 0.3, and 0.65, respectively, shows that the reliability of the PCE is mainly affected by the number of cycles per analysis window. Results from additional tests suggest that other factors, such as the number of histogram bins or the amplitude of the signal, do not substantially influence the results.

SUPPLEMENTARY DATA

Supplementary data related to this article can be found online at doi:10.1016/j.jvoice.2015.11.006.

REFERENCES

1. Titze IR. Workshop on acoustic voice analysis. Summary statement: National Center for Voice and Speech; 1995.

2. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. 2nd ed. San Diego, CA: Singular Publishing, Thompson Learning; 2000.

3. Bohr C, Krack A, Dubrovskiy D, et al. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear Res. 2014;57:1148–1161.

4. Mehta DD, Deliyski DD, Zeitels SM, et al. Voice production mechanisms following phonosurgical treatment of early glottic cancer. Ann Otol Rhinol Laryngol. 2010;119:1–9.

5. Svec JG, Sram F, Schutte HK. Videokymography in voice disorders: what to look for? Ann Otol Rhinol Laryngol. 2007;116:172–180.

6. Berry DA, Herzel H, Titze IR, et al. Interpretation of biomechanical simulations of normal and chaotic vocal fold oscillations with empirical eigenfunctions. J Acoust Soc Am. 1994;95:3595–3604.

7. Hollien H, Michel J, Doherty ET. A method for analyzing vocal jitter in sustained phonation. J Phon. 1973;1:85–91.

8. Yumoto E, Gould WJ, Baer T. Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am. 1982;71:1544–1549.

9. Lauterborn W, Parlitz U. Methods of chaos physics and their application to acoustics. J Acoust Soc Am. 1988;84:1975–1993.

10. Herzel H. Bifurcations and chaos in voice signals. Appl Mech Rev. 1993;46:399–413.

11. Titze IR, Baken RJ, Herzel H. Evidence of chaos in vocal fold vibration. In: Titze IR, ed. Vocal Fold Physiology: Frontiers in Basic Science. San Diego, CA: Singular Publishing Group; 1993:143–188.

12. Jiang J, Zhang Y, McGilligan C. Chaos in voice, from modeling to measurement. J Voice. 2006;20:2–17.

13. Strogatz SH. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. First Indian Edition. Kolkata, India: Levant Books; 2007.

14. Fitch WT, Neubauer J, Herzel H. Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim Behav. 2002;63:407–418.

15. Lauterborn W, Cramer E. Subharmonic routes to chaos observed in acoustics. Phys Rev Lett. 1981;47:1445–1448.

16. Grassberger P, Procaccia I. Measuring the strangeness of strange attractors. Physica D. 1983;9:189–208.

17. Eckmann J, Kamphorst SO, Ruelle D, et al. Liapunov exponents from time series. Phys Rev A. 1986;34:4971–4979.

18. Tokuda I, Riede T, Neubauer J, et al. Nonlinear analysis of irregular animal vocalizations. J Acoust Soc Am. 2002;111:2908–2919.

19. Tokuda I, Miyano T, Aihara K. Surrogate analysis for detecting nonlinear dynamics in normal vowels. J Acoust Soc Am. 2001;110:3207–3217.

20. Herzel H, Holzfuss J, Kowalik Z, et al. Detecting bifurcations in voice signals. In: Kantz H, Kurths J, Mayer-Kress G, eds. Nonlinear Analysis of Physiological Data. Berlin: Springer Verlag; 1998:325–344.

21. Zhang Y, Jiang J. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice. 2008;22:1–9.

22. Zhang Y, Jiang J, Biazzo L, et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice. 2005;19:519–528.

23. Baken RJ. Irregularity of vocal period and amplitude: a first approach to the fractal analysis of voice. J Voice. 1990;4:185–197.

24. Behrman A, Baken R. Correlation dimension of electroglottographic data from healthy and pathologic subjects. J Acoust Soc Am. 1997;102:2371–2379.

25. Behrman A. Global and local dimensions of vocal dynamics. J Acoust Soc Am. 1999;105:432–443.

26. Mergell P, Herzel H, Titze IR. Irregular vocal-fold vibration: high-speed observation and modeling. J Acoust Soc Am. 2000;108:2996–3002.

27. Zhang Y, Krausert CR, Kelly MP, et al. Typing vocal fold vibratory patterns in excised larynx experiments via digital kymography. Ann Otol Rhinol Laryngol. 2009;118:598–605.

28. Zhang Y, Jiang JJ. Asymmetric spatiotemporal chaos induced by a polypoid mass in the excised larynx. Chaos. 2008;18.

29. Zhang Y, Jiang JJ, Tao C, et al. Quantifying the complexity of excised larynx vibrations from high-speed imaging using spatiotemporal and nonlinear dynamic analyses. Chaos. 2007;17.

30. Herzel H, Berry D, Titze I, et al. Nonlinear dynamics of the voice: signal analysis and biomechanical modeling. Chaos. 1995;5:30–34.

31. Svec JG, Schutte HK, Miller DG. On pitch jumps between chest and falsetto registers in voice: data from living and excised human larynges. J Acoust Soc Am. 1999;106(3 I):1523–1531.

32. Tokuda IT, Horacek J, Svec JG, et al. Bifurcations and chaos in register transitions of excised larynx experiments. Chaos. 2008;18.

33. Herbst CT, Herzel H, Svec JG, et al. Visualization of system dynamics using phasegrams. J R Soc Interface. 2013;10:1–14.

34. Packard NH, Crutchfield JP, Farmer JD, et al. Geometry from a time series. Phys Rev Lett. 1980;45:712–716.

35. Roux J-C, Simonyi RH, Swinney HL. Observation of a strange attractor. Physica D. 1983;8:257–266.

36. Herbst CT. Glottal efficiency of periodic and irregular in vitro red deer voice production. Acta Acoust United Acoust. 2014;100:724–733.

37. Lohscheller J, Eysholdt U. Phonovibrogram visualization of entire vocal fold dynamics. Laryngoscope. 2008;118:753–758.

38. Lohscheller J, Eysholdt U, Toy H, et al. Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE Trans Med Imaging. 2008;27:300–309.

39. Lohscheller J, Toy H, Rosanowski F, et al. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal. 2007;11:400–413.

40. Unger J, Lohscheller J, Reiter M, et al. A noninvasive procedure for early-stage discrimination of malignant and precancerous vocal fold lesions based on laryngeal dynamics analysis. Cancer Res. 2015;75:31–39.

41. Karakozoglou S-Z, Henrich N, d’Alessandro C, et al. Automatic glottal segmentation using local-based active contours and application to glottovibrography. Speech Commun. 2012;54:641–654.

42. Steinecke I, Herzel H. Bifurcations in an asymmetric vocal fold model. J Acoust Soc Am. 1995;97:1874–1884.

43. Smith JO. Mathematics of the Discrete Fourier Transform (DFT), with Audio Applications, 2nd ed, http://ccrma.stanford.edu/~jos/mdft/, online book, 2007, accessed 2015-12-30.

44. Bergé P, Pomeau Y, Vidal C. Order Within Chaos: Towards a Deterministic Approach to Turbulence. Paris: Hermann and John Wiley & Sons; 1984.

45. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423, 623–56.

46. Herzel H, Große I. Correlations in DNA sequences: the role of protein coding segments. Phys Rev E. 1997;55:800–810.

47. Abdi H. Holm’s sequential Bonferroni procedure. In: Salkind NJ, ed. Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications, Inc.; 2010:574–578.

48. R Development Core Team, ed. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.

49. Herzel H, Berry D, Titze IR, et al. Analysis of vocal disorders with methods from nonlinear dynamics. J Speech Hear Res. 1994;37:1008–1019.

50. Tigges M, Mergell P, Herzel H, et al. Observation and modelling of glottal biphonation. Acustica Acta Acustica. 1997;83:707–714.

51. Neubauer J, Mergell P, Eysholdt U, et al. Spatio-temporal analysis of irregular vocal fold oscillations: biphonation due to desynchronization of spatial modes. J Acoust Soc Am. 2001;110:3179–3192.

52. Herzel H, Reuter R. Biphonation in voice signals. Natl Cent Voice Speech Status Prog Rep. 1996;9:109–115.

53. Svec J, Sram F. Videokymographic examination of voice. In: Ma E, Yu E, eds. Handbook of Voice Assessments. San Diego, CA: Plural Publishing; 2011:129–146.

54. Theiler J. Spurious dimension from correlation algorithms applied to limited time-series data. Phys Rev A. 1986;34:2427–2432.

55. Gleick J. Chaos. The Amazing Science of the Unpredictable. London: Vintage; 1987.

56. Boersma P, Weenink D. Praat: Doing Phonetics by Computer. Amsterdam, The Netherlands: Institute of Phonetic Sciences, University of Amsterdam; 2014.
