Effects of Vocal Training and Phonatory Task On Voice Onset Time


*Christopher R. McCrea and Richard J. Morris
*Johnson City, Tennessee, and Tallahassee, Florida
Summary: Objectives/Hypothesis: The purpose of this study was to examine the temporal-acoustic differences between trained singers and nonsingers
during speech and singing tasks. Methods: Thirty male participants were separated into two groups of 15 according to level of vocal training (ie, trained or
untrained). The participants spoke and sang carrier phrases containing English voiced and voiceless bilabial stops, and voice onset time (VOT) was
measured for the stop consonant productions. Results: Mixed analyses of variance revealed a significant main effect between speech and singing for /p/
and /b/, with VOT durations longer during speech than singing for /p/, and
the opposite true for /b/. Furthermore, a significant phonatory task by vocal
training interaction was observed for /p/ productions. Conclusions: The results indicated that the type of phonatory task influences VOT and that these
influences are most obvious in trained singers secondary to the articulatory
and phonatory adjustments learned during vocal training.
Key Words: Voice onset time, Phonatory task, Vocal training, Gender difference.
Accepted for publication May 18, 2005.
Presented at the 33rd Annual Symposium: Care of the Professional Voice, June 26, 2004, Philadelphia, Pennsylvania.
From the *Department of Communicative Disorders, East
Tennessee State University, Johnson City, Tennessee; and
Florida State University, Tallahassee, Florida.
Supported by a Dissertation Research Grant funded by the
Congress of Graduate Students, the Provost's Office, and the
Office of Research, Florida State University.
Address correspondence and reprint requests to Christopher
R. McCrea, Department of Communicative Disorders, East
Tennessee State University, Box 70643, Johnson City, TN
37614-1702. E-mail: mccrea@etsu.edu
Journal of Voice, Vol. 21, No. 1, pp. 54-63
0892-1997/$32.00
© 2007 The Voice Foundation
doi:10.1016/j.jvoice.2005.05.002

INTRODUCTION
It has been suggested that trained singers are perceived to sing better than nonsingers because
trained singers learn to perform a variety of
phonatory1-3 and articulatory/resonatory4-10 adjustments during singing that nonsingers do not. Although these articulatory and phonatory differences
allow listeners to perceptually distinguish the two
groups during singing, the acoustic cues that help
listeners to perceptually separate trained singers
and nonsingers have not been clearly identified.
Over the last 35 years, voice researchers have attempted to correlate the phonatory and articulatory
movements of trained singers with changes in the
acoustic voice signal.7-14 For example, Lindblom
and Sundberg7-10,12 correlated vocal tract adjustments with changes in the acoustic signal through
examination of long-term average spectra (LTAS),
lateral x-ray pictures, and mathematical models of
vocal tract function. Sundberg7 reported that increases in the width of the pyriform sinuses and
laryngeal ventricle resulted in an increase of energy
between 2500 Hz and 3000 Hz or in the "singer's
formant" in male singers. The singer's formant was
associated with a perceptual vocal "ring" or
"brightness,"15-17 and it has been used as a qualifier
of good quality singing.18,19
Whereas the examination of LTAS for the presence of the singer's formant has provided a means
that can frequently differentiate between trained
singers and nonsingers, other acoustic measures
have shown less promise. Brown et al11 compared
the speech and singing productions of "America the
Beautiful" for 20 trained singers and 20 nonsingers.
In addition to perceptual judgments, a series of
acoustic measures was conducted in an effort to
acoustically distinguish the trained singers and nonsingers. The acoustic measures included standard deviation of fundamental frequency, jitter, shimmer,
noise-to-harmonics ratio, and a series of duration
measures, including sentence, word, and syllable duration, as well as consonant-to-vowel duration ratio
for individual words. Only male standard deviation
of fundamental frequency and male perturbation
measures during speech displayed significant trained
singer and nonsinger differences. None of the acoustic duration measures displayed significant differences between the trained singers and nonsingers.
Despite Brown et al11 not finding any speech duration differences between trained singers and nonsingers, Rothman et al13 reported significant
differences in word length and alveolar stop closure
duration among perceptually identified singers, perceptually unidentified singers, and nonsingers. Naive listeners correctly identified 5 of 20 trained
singers based on standard passage readings from
a total of 20 singers and 20 age-matched nonsingers. The speech samples of the five correctly identified trained singers were acoustically analyzed for
mean speaking fundamental frequency, sentence
duration, word duration, and consonant-to-vowel
duration ratio. It was reported that the perceptually
identified trained singers displayed significantly
longer word durations for the word "white" than
the nonsingers or unidentified singers. In addition,
the perceptually identified singers displayed significantly shorter stop closure durations for /t/ taken
from the word "white" than the unidentified singers or the nonsingers. Although similar word and
stop closure duration differences were not observed
across other words or stops, the results indicated
that there may be temporal-acoustic differences
between trained singers and nonsingers during speech.
Whereas the previous study focused on consonant articulation during speech, the importance of
consonant articulation during singing should not
be overlooked. Vennard17 discussed the importance
of articulation in singing and reported that consonants are an important aspect of lyrical singing,
by which the linguistic meaning of a song is expressed. However, Vennard17 conceded that trained
singers may not be up to this task during vocally
challenging musical pieces and stated that, "...frequently upon such high notes or in such florid work
good pronunciation suffers." Titze20 further noted
that speech intelligibility is sometimes compromised in favor of musical phrasing. Unfortunately,
data are lacking with regard to the accuracy of specific consonant articulation during a singing task.
The task of examining the articulation of consonants during speech and singing can be accomplished through temporal-acoustic measures.
Voice onset time (VOT) has been established as
an important acoustic measure used to distinguish
voiced from voiceless stop consonants across a variety of languages.21,22 VOT is defined as the interval
between the release of an oral constriction of a stop
consonant and the start of vocal fold vibration for
the following vowel.21 When examining VOT, it
is important to realize that three VOT value ranges
may be observed, including negative VOT, zero
VOT, or positive VOT. Negative VOT scores represent vocal fold vibration before the release of the
oral constriction and are associated with the term
"pre-voicing." Zero VOT represents the initiation
of vocal fold vibration simultaneous with the release
of the oral constriction. Finally, positive VOT is associated with the onset of vocal fold vibration after
the release of the oral constriction. In English, all
three of these VOT ranges can be observed in
voiced stops, but only positive VOTs are observed
in voiceless stops. The effectiveness of VOT as
a tool to help researchers and clinicians distinguish
stop consonants according to voicing and place of
articulation has been thoroughly examined.21-27
Given that VOT is an effective indicator of subtle
articulatory-phonatory interaction differences in
speech production, it may prove an effective measure for acoustically representing previously
reported physiologic differences between trained
singers and nonsingers.1-10 As such, it may be a useful, noninvasive method for documenting the articulatory-phonatory aspects of vocal training during
both speech and singing.
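The three VOT ranges defined above (negative, zero, and positive) can be illustrated with a minimal sketch. The function name `classify_vot` and the `tolerance_ms` parameter are hypothetical and are not part of the study's procedure; the code only encodes the sign convention described in the text.

```python
def classify_vot(vot_ms, tolerance_ms=0.0):
    """Label a VOT value (in ms) by its sign.

    tolerance_ms is a hypothetical measurement tolerance: values whose
    magnitude does not exceed it are treated as zero VOT.
    """
    if abs(vot_ms) <= tolerance_ms:
        return "zero"
    if vot_ms < 0:
        return "negative (prevoiced)"  # voicing starts before the release
    return "positive"                  # voicing starts after the release


# Illustrative values only, not measurements from the study.
for value in (-20.0, 0.0, 30.0):
    print(f"{value:6.1f} ms -> {classify_vot(value)}")
```

Under this convention an English voiceless stop would be expected to fall only in the "positive" class, whereas a voiced stop could fall in any of the three.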
McCrea and Morris28 examined VOT for voiced
and voiceless stop consonants produced by 10 male
subjects, 5 trained singers and 5 nonsingers, during
speech and singing tasks. They reported significantly
longer VOTs for the trained singers when compared
with the nonsingers for both singing and speaking. It
was suggested that the relatively long VOTs of the
trained singers resulted from increased lingual control and lingual pressure. Their results indicated
that trained singers and nonsingers display different
acoustic-temporal patterns during speech and singing. However, the results from McCrea and Morris28
were based on only 10 male participants. A larger
sample size is required to generalize any findings.
Furthermore, McCrea and Morris28 noted that in an
effort to control for rate, frequency, and intensity
of production, the singing sample may not have
been natural. More research is needed in which truly
sung productions are measured to more clearly determine the effects of phonatory task (ie, speech versus
singing) on VOT.
The first purpose of this study was to determine
whether trained singers and nonsingers display different VOTs during speaking and/or singing. It was
hypothesized that the trained singers would display
significantly longer VOTs for bilabial stops than the
nonsingers, regardless of task. The second purpose
of this study was to examine whether the phonatory
task (speech vs singing) significantly affected VOT.
It was hypothesized that the mean VOTs for the
sung productions would be longer than those for
the spoken productions.
METHODS
Participants
The participants for this study included 30 men.
The participants were divided into groups of 15
trained singers and 15 nonsingers. The male trained
singers ranged in age from 21 to 35 years old
(mean, 27.8 years), and they reported receiving an
average of 9.37 years of private voice lessons.
The male nonsingers ranged in age from 21 to 24
years old (mean, 22 years). All nonsingers reported
that they had not received any form of vocal training since elementary school, including instruction
received from singing in a band, middle school,
high school, college, or church choir. All participants (1) were nonsmokers, (2) were between the ages
of 21 and 35, (3) were first-language General American
English speakers, and (4) reported no history of
neurological, vascular, or sensory-motor impairment that would affect articulation, phonation,
and/or respiration.
Equipment
All recordings occurred in a double-walled
sound-treated booth (IAC Model 4276, Huddleston,
Satterfield, Evans & Mauney, Architects and Engineers, Tallahassee, FL) with the participant standing. The voice signal recordings were made via
a stand-mounted microphone (Shure Model SM7)
positioned 1 m in front of the participant at chest
level connected to a Computerized Speech Lab
Model 4300B hardware/software system (CSL; Kay
Elemetrics Corporation, Lincoln Park, NJ). Voice
recordings were digitized at a sampling frequency
of 44.1 kHz, stored, and analyzed using the CSL
4300B system. The light signal from a quartz metronome (Matrix MR-500) was used to pace speaking rate.
Procedure
The VOT values were calculated for bilabial
American English stop consonants in word-initial
position embedded in the phrases "A peek at a peacock" and "A bee at a beehive." The VOT of /p/ in
"peek" and "peacock" and of /b/ in "bee" and
"beehive" was measured for each production.
This cognate pair of stop consonants was selected
because previous research had shown
similar patterns among different cognate pairs
when comparing singers and nonsingers.28 Each
participant performed at least one experimenter-supervised trial production to ensure appropriate rate,
fundamental frequency, and intensity. All participants could produce the phrases within an acceptable pitch or intensity range. The order of phrase
production and phonatory task was counterbalanced within the groups of participants.

Each participant was instructed to sing and speak
each phrase at a 5/4 rhythm on a comfortable, single note five consecutive times at an allegro rate of
160 beats per minute (approximately three syllables/second), as set by the flashing light of a metronome. Rate was controlled because it was reported
that rate of speech has a significant effect on VOT,
with slow rates associated with long VOT and fast
rates associated with short VOT.29,30 Furthermore,
a rate of approximately three syllables/second was
chosen because it represented a moderately fast
rate of speech and has been reported to be a common oral reading rate in healthy young adults.31-37
The first and last productions were excluded from
analysis, leaving the middle three productions for
analysis. A total of 720 tokens (30 participants ×
2 phrases × 2 phonemes per phrase × 3 repetitions
× 2 phonatory tasks) were recorded and measured
for VOT.
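The token count and the pacing rate above follow from simple arithmetic, sketched here. The dictionary keys are illustrative labels, and the conversion from beats to syllables assumes one syllable per metronome beat, which the text does not state explicitly.

```python
from math import prod

# Design factors as listed in the Procedure.
design = {
    "participants": 30,
    "phrases": 2,
    "phonemes_per_phrase": 2,
    "analyzed_repetitions": 3,  # middle three of the five productions
    "phonatory_tasks": 2,
}
total_tokens = prod(design.values())
tokens_per_participant = total_tokens // design["participants"]

# Assumption: one syllable per metronome beat.
beats_per_minute = 160
syllables_per_second = beats_per_minute / 60

print(total_tokens)                    # 720
print(tokens_per_participant)          # 24
print(round(syllables_per_second, 2))  # 2.67
```

The computed rate of about 2.67 syllables/second is consistent with the "approximately three syllables/second" quoted in the text.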
In an effort to capture a more realistic singing
sample, the participants were instructed to imagine
that they were performing in a large auditorium
filled with people throughout the experiment. The
cue was given to stimulate a singing sample that
represented how the participant would sound during
a performance. The influence of imagery and performance environment on temporal-acoustic measures was reported by Rothman et al.38 Despite the
nonsingers' lack of significantly different duration
measures across performance environment, results
indicated that cueing participants to visualize performing in front of a crowd influenced temporal-acoustic measurement and may provide researchers
with more realistic values.
Temporal analysis
In accordance with previous VOT-related studies,
VOT was measured by visually inspecting the
acoustic signal as both an oscillogram trace and
sound spectrogram via the CSL 4300B software.39,40 Using both the oscillogram and sound
spectrogram displays of each phrase, VOT was
measured by placing a time marker at the onset
of the noise burst of each stop and another marker
at the onset of steady-state vocal fold vibration.
Steady-state vocal fold vibration was determined
using the combined appearance of the first vertical
striation in the second formant on the sound
spectrogram and the first downward peak of the
complex vowel waveform on the oscillogram
trace.39,40 The oscillogram and spectrogram were
displayed in terms of time, denoted in milliseconds
along the horizontal axis. This allowed for direct
measurement of the time between the two markers
and, thus, VOT. All VOT measures were made by
the lead investigator.
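The two-marker measurement described above amounts to a subtraction, as in the following sketch. It assumes the markers are available as sample indices in a recording digitized at the 44.1 kHz rate given under Equipment; the function name and the marker values are hypothetical, and the study itself read times directly from the CSL display.

```python
SAMPLE_RATE_HZ = 44_100  # sampling frequency reported under Equipment


def vot_ms(burst_onset_sample, voicing_onset_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return VOT in ms: time from the noise burst to the onset of voicing.

    A negative result means voicing began before the burst (prevoicing).
    """
    return 1000 * (voicing_onset_sample - burst_onset_sample) / sample_rate_hz


# Hypothetical marker positions, not data from the study.
print(vot_ms(44_100, 45_423))  # 30.0  (voicing lags the burst)
print(vot_ms(44_100, 43_659))  # -10.0 (voicing leads the burst)
```

At this sampling rate one sample corresponds to about 0.023 ms, so marker placement by the rater, rather than digitization, would be the main source of measurement variability.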
Statistical analysis
Two separate 2 × 2 mixed analyses of variance
(ANOVAs), one for /p/ and one for /b/, were used to compare the participants'
average VOTs across the between-subject factor
of vocal training (ie, trained singers versus nonsingers)
and the within-subject factor of phonatory task
(ie, singing versus speaking). Two separate ANOVAs were used to simplify the statistical design
and because it was already well established that
the VOTs of voiced and voiceless stops are significantly different.21,22,24 An alpha (α) level of 0.05
was set as the level of significance. Relative power
(1 - β) and effect size (η²) were also reported.
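For readers unfamiliar with the design, the following is a minimal sketch of how the three F ratios of a 2 (group) × 2 (task) mixed ANOVA can be computed when both groups have the same size. It is not the software used in the study, which the text does not name, and the data are invented for illustration. With two levels per factor, each F ratio equals a squared t statistic: the group effect is tested on each participant's mean across tasks, and the task effect and interaction are tested on each participant's singing-minus-speech difference.

```python
from statistics import mean


def pooled_variance(a, b):
    """Pooled within-group variance of two samples."""
    mean_a, mean_b = mean(a), mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)


def mixed_anova_2x2(group_a, group_b):
    """F ratios for a 2 (group) x 2 (task) mixed design with equal group sizes.

    Each group is a list of (speech_ms, singing_ms) pairs, one per participant.
    Returns (F_group, F_task, F_interaction, df_error); each effect has 1 df.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equal group sizes")
    n = len(group_a)
    total = 2 * n

    # Between-subject effect: compare each participant's mean over both tasks.
    means_a = [(speech + singing) / 2 for speech, singing in group_a]
    means_b = [(speech + singing) / 2 for speech, singing in group_b]
    f_group = (mean(means_a) - mean(means_b)) ** 2 / (
        pooled_variance(means_a, means_b) * 2 / n
    )

    # Within-subject effects: use each participant's singing - speech difference.
    diffs_a = [singing - speech for speech, singing in group_a]
    diffs_b = [singing - speech for speech, singing in group_b]
    var_diff = pooled_variance(diffs_a, diffs_b)
    f_task = total * mean(diffs_a + diffs_b) ** 2 / var_diff
    f_interaction = (mean(diffs_a) - mean(diffs_b)) ** 2 / (var_diff * 2 / n)
    return f_group, f_task, f_interaction, total - 2


# Invented VOT values in ms (speech, singing); NOT data from the study.
trained = [(35, 23), (36, 26), (37, 29)]
untrained = [(32, 28), (33, 31), (34, 34)]
f_group, f_task, f_inter, df_error = mixed_anova_2x2(trained, untrained)
print(f"group F(1,{df_error}) = {f_group:.3f}")
print(f"task F(1,{df_error}) = {f_task:.3f}")
print(f"interaction F(1,{df_error}) = {f_inter:.3f}")
```

With 15 participants per group, as in the present study, the error degrees of freedom returned by this sketch would be 28.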
Reliability
One fifth of the 30 participants' productions
(20% of the data) were chosen at random and reanalyzed by the same investigator at least 3 weeks
postrecording to determine intrarater reliability. Interrater reliability was determined by having a research assistant (blind to the classification of the
participants) measure VOT for 20% of the data
that the lead investigator had previously measured.
Both intrarater and interrater reliability were indexed by the Pearson product moment correlation,
and both were high: interrater reliability was r =
0.85, and intrarater reliability was r = 0.95. The
mean VOT difference between the original and remeasured data was 1.41 ms for the intrarater reliability and 3.82 ms for the interrater reliability.
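The two reliability indices reported above can be computed as in the sketch below. The measurement lists are invented, and treating the "mean VOT difference" as a mean absolute difference is an assumption, since the text does not say whether signed or absolute differences were averaged.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product moment correlation between paired measurements."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def mean_abs_difference(x, y):
    """Mean absolute difference between paired measurements (assumed metric)."""
    return mean([abs(a - b) for a, b in zip(x, y)])


# Invented original and remeasured VOT values in ms; NOT data from the study.
original = [30.0, 25.0, -10.0, 35.0]
remeasured = [31.0, 24.0, -12.0, 36.0]
print(f"r = {pearson_r(original, remeasured):.3f}")
print(f"mean |difference| = {mean_abs_difference(original, remeasured):.2f} ms")
```

Reporting both indices is informative because a high correlation alone would not reveal a constant offset between raters, whereas the mean difference would.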
RESULTS
Trained singer and nonsinger main effect
As shown in Table 1, the VOTs were nearly identical for the two groups: the trained singers displayed an average VOT for /p/ of 30.9 ms
[standard deviation (SD) = 12.3 ms], and the nonsingers' average VOT for /p/ was 32.4 ms (SD =
10.5 ms). Similarly for /b/, the trained singers displayed an average VOT of -10.4 ms (SD = 20.7
ms), whereas the nonsingers' average VOT was
-5.8 ms (SD = 17.6 ms). The differences between
the two groups were not significant for either /p/
(F(1, 28) = 0.209; P = 0.65; η² = 0.01; 1 - β = 0.07)
or /b/ (F(1, 28) = 0.850; P = 0.36;
η² = 0.03; 1 - β = 0.14).

TABLE 1. Trained Singer and Nonsinger Mean
VOT and SD in Milliseconds Across Voiced and
Voiceless Bilabial Stops

Vocal Training              Phoneme    Mean VOT (SD)
Trained singers (n = 15)    /p/         30.9 (12.3)
                            /b/        -10.4 (20.7)
Nonsingers (n = 15)         /p/         32.4 (10.5)
                            /b/         -5.8 (17.6)
Phonatory task effects
In addition to examining trained singer and nonsinger differences, the VOT differences between the
speech and the singing tasks were examined using
the mixed ANOVAs. Examination of Figures 1
and 2 revealed that for the /p/ productions, both
groups of subjects used longer VOTs during speech
tasks. The average VOT for /p/ during singing was
25.2 ms (SD = 10.4 ms) for the trained singers and
31.5 ms (SD = 11.5 ms) for the nonsingers,
whereas the mean VOT for /p/ during speaking
was 36.7 ms (SD = 11.5 ms) for the trained singers
and 33.3 ms (SD = 9.6 ms) for the nonsingers. For
/b/, the average VOT during singing was -20.0 ms
(SD = 20.3 ms) for the trained singers and -15.2
ms (SD = 15.9 ms) for the nonsingers, and the average VOT during speech was -0.7 ms (SD = 16.5
ms) for the trained singers and 3.6 ms (SD = 14.2
ms) for the nonsingers. These differences in mean
VOT between speech and singing were significant
for /p/ (F(1, 28) = 8.86; P = 0.006; η² = 0.24;
1 - β = 0.82) and /b/ (F(1, 28) = 26.56;
P < 0.001; η² = 0.49; 1 - β = 0.99).
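As a consistency check, the task-by-group cell means above can be averaged and compared with the group means in Table 1, as sketched below. The signs of the /b/ values are an assumption here: they are taken to be negative for both groups during singing and for the trained singers during speech, as implied by the negative /b/ means in Table 1 and by the prevoicing described in the text. The variable names are illustrative.

```python
# Cell means in ms. Assumption: the /b/ signs are inferred from the negative
# group means in Table 1 and the reported prevoicing; they are not certain.
cell_means = {
    ("/p/", "trained"): {"speech": 36.7, "singing": 25.2},
    ("/p/", "nonsingers"): {"speech": 33.3, "singing": 31.5},
    ("/b/", "trained"): {"speech": -0.7, "singing": -20.0},
    ("/b/", "nonsingers"): {"speech": 3.6, "singing": -15.2},
}

# Group means across tasks as listed in Table 1.
table_1 = {
    ("/p/", "trained"): 30.9,
    ("/p/", "nonsingers"): 32.4,
    ("/b/", "trained"): -10.4,
    ("/b/", "nonsingers"): -5.8,
}

marginal_means = {}
task_effects = {}
for key, cell in cell_means.items():
    marginal_means[key] = (cell["speech"] + cell["singing"]) / 2
    task_effects[key] = cell["singing"] - cell["speech"]
    print(
        f"{key[0]} {key[1]:<10} marginal = {marginal_means[key]:7.2f} "
        f"(Table 1: {table_1[key]:6.1f}), "
        f"singing - speech = {task_effects[key]:6.1f} ms"
    )
```

Under these sign assumptions every averaged value agrees with Table 1 to within rounding, and the singing-minus-speech difference for /p/ is visibly larger for the trained singers than for the nonsingers.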
Examination of Figure 1 reveals that the trained
singers displayed shorter /p/ VOTs than the nonsingers during singing (25.2 ms vs 31.5 ms), but
they displayed longer /p/ VOTs during speaking
(36.7 ms vs 33.3 ms). The differences in VOT
across phonatory task and vocal training for /p/
were significant (F(1, 28) = 4.59; P = 0.040;
η² = 0.14; 1 - β = 0.54). However, this difference
in VOT would be imperceptible, as both of these
values are well within the normal VOT range for
/p/.
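The effect sizes reported for the significant effects can be related to their F ratios. The text does not state which effect-size formula was used; the sketch below assumes partial eta squared, computed from F and its degrees of freedom, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio (assumed formula)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios of the significant effects reported in the Results, each F(1, 28).
reported_f = {
    "task, /p/": 8.86,
    "task, /b/": 26.56,
    "task x training, /p/": 4.59,
}
for label, f_value in reported_f.items():
    print(label, round(partial_eta_squared(f_value, 1, 28), 2))
# task, /p/ 0.24
# task, /b/ 0.49
# task x training, /p/ 0.14
```

These values match the effect sizes reported alongside the corresponding F ratios, which is consistent with, though not proof of, the partial eta squared assumption.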
There was not an interaction between vocal task
and singing status in the VOTs for /b/ (Figure 2).
Both the trained singers and the nonsingers used
similar voice onset times during the speaking
(-0.7 ms vs 3.6 ms) and the singing (-20.0 ms vs
-15.2 ms). However, the trained singers tended to
voice throughout the interval preceding their word
initial /b/ productions, which resulted in negative
voice onset times. These differences in VOT across
phonatory task and vocal training for /b/ were not
significant (F(1, 28) = 0.007; P = 0.94; η² = 0.01;
1 - β = 0.05).

FIGURE 1. Comparison of the mean VOT for /p/ in milliseconds (ms) as a function of phonatory task and vocal training. [Plot: Voice Onset Time (msec) by Phonatory Task (Speaking, Singing) for the Nonsinger and Singer groups.]

FIGURE 2. Comparison of the mean VOT for /b/ in milliseconds (ms) as a function of phonatory task and vocal training. [Plot: Voice Onset Time (msec) by Phonatory Task (Speaking, Singing) for the Nonsinger and Singer groups.]
DISCUSSION
The purpose of this study was to examine the effects of vocal training and phonatory task on the
VOTs of bilabial stops. There were no significant
main effect differences between average VOTs for
the trained singers and nonsingers across /p/ or /b/;
however, there were significant differences between mean VOTs for spoken and sung productions. For /p/, sung productions displayed shorter
VOTs than the spoken productions, whereas for /b/,
the sung productions were produced with longer
VOTs than the spoken productions. In addition, a significant interaction during /p/ production indicated that the difference between speech and singing
was greater for trained singers than nonsingers.
Discussion of the specific results follows.
Effects of vocal training
No significant differences in VOT were observed
for the main effect of vocal training. The overall
VOTs of the trained singers and nonsingers were fairly
similar. These results are similar to those of Brown
et al,11 who found no significant difference between
trained singers and nonsingers for sentence length,
word length, and consonant-to-vowel duration ratios
from speech samples. Although the current results
partially agree with the findings of Brown et al,11
they do not agree with the findings from a recent
study by McCrea and Morris,28 who found significantly longer VOTs for trained singers as compared
with nonsingers. Methodological and analysis differences between the studies may be responsible for the
different results. For example, McCrea and Morris28
included voicing as a within-subject variable in a single 2 × 2 × 2 mixed ANOVA. The current experimental design used separate ANOVAs for /p/ and
/b/ and thus treated voicing as a separate factor. Eliminating voicing as a factor during analysis of the current data may have reduced its statistical influence on
the other factors and provided a more conservative analysis of the data. It is reasonable to conclude that this
more conservative analysis better reflects the
observed lack of an effect of vocal training on VOT.
One possible explanation for the similar mean
VOTs observed for the trained singers and nonsingers in the current study was the nonsingers' amount
of innate or natural singing talent. Watts et al41 recently reported that vocally untrained persons may
have natural singing talent, as demonstrated
through pitch matching accuracy. With regard to
the current study, it was possible that some nonsingers possessed natural vocal ability and used similar
articulatory and phonatory movements as those
used by trained singers. Although the inclusion criteria for the nonsingers used in this study were relatively strict and excluded anyone who had received
any vocal instruction or practice from high school to
present, it was possible that some nonsingers possessed some natural singing talent and produced
/p/ and /b/ in a manner similar to that of the trained
singers. Thus, the talented nonsingers' VOTs may have resembled the trained singers' VOTs. To rule out
this possibility, future research should examine the
articulatory timing in trained singers, talented untrained singers, and untalented untrained singers.
Even though main effect VOT differences between trained singers and nonsingers were not apparent, the search should continue for a reliable
acoustic correlate for previously described perceptual13 and physiologic1-6,11 differences between
trained singers and nonsingers. Focusing on higher spectral moments such as standard
deviation, skew, and kurtosis rather than the mean
may provide an acoustic link between the physiologic and perceptual distinctions between singers
and nonsingers. Finally, future research attempting
to acoustically separate trained singers from nonsingers should include some form of perceptual
evaluation to correlate possible psychophysical
interactions with the acoustic measures.
Effects of phonatory task
It was hypothesized that the participants would
display longer VOTs during singing than speaking.
The current results partially agree with the hypothesis. Although the current results indicated that
speaking and singing tasks were associated with significantly different VOTs, it was the VOTs during
speaking that were longer than the VOTs measured
from singing for /p/. For /b/, the VOTs were longer
during singing than speaking. McCrea and Morris28
reported significantly longer mean VOTs across
voiced and voiceless stops produced during singing
than those produced during speaking by male
trained singers and nonsingers. Methodological differences between the two studies may have caused
the different results for /p/. The participants in the
current study were told to imagine that they were
reading and singing the phrases in an auditorium
filled with people, despite producing the phrases in
a sound-treated booth, whereas the participants in
the previous study received no visual imagery instructions. The instructions in the current study represented an effort to make the sung phonatory tasks
approximate a singing performance. The current results supported the findings of Rothman et al38 that
indicated the use of mental imagery results in temporal-acoustic measurement differences.
It is also possible that the participants simply
placed more emphasis on the stops during speaking
than during singing. Voiceless phonemes in
a stressed word-initial position, as was the case in
the current study, are associated with longer durations.31-33 Furthermore, the prolongation of a sound
is a main cue that identifies it as being stressed.
During singing, the participants may have been
anxious to sing the vocalic portion of the words
and thus produced the voiceless stops with less duration. Following this logic, a decrease in syllable
stress with a voiceless stop would be associated
with a shortened noise burst, which could explain
the shorter VOT during singing than speaking.
This adjustment would allow the participants to
produce the syllable in the duration set by the metronome with a longer vowel.
Although this explanation may be appropriate for
the productions containing /p/, the same is not true
for /b/. Lisker and Abramson21 noted that voiced
stops can display negative and positive VOTs.
Thus, relatively long VOTs for voiced stops could
be in positive or negative directions. In the current
study, the longer VOTs for /b/ were generally in the
negative direction, which indicates that the participants produced /b/ with prevoicing. During the
singing task, the negative VOTs for /b/ increased
significantly, indicating that the participants continued phonating after producing the initial and medial
/a/ in "A bee at a beehive." This tendency was observed during VOT measurement. Figure 3 displays
this tendency for the participants to continue phonation until the release of the stop burst during
singing. Although prevoicing was also observed
during the speaking tasks, it was clear that the participants produced /b/ with prevoicing more often
during singing. This supported the current hypothesis and previous report28 that during singing a person will prolong a vocalic portion or quickly
release the stop burst of a voiced consonant to
maintain the melody and pitch of a sung phrase.

FIGURE 3. Oscillogram and spectrogram of "A bee at a beehive" sung by male trained singer #10.
The differences in VOT between the sung and the
spoken productions may also reflect a difference in
articulatory accuracy. Vennard17 and Titze20 reported that during singing, accuracy of articulation
often suffers. The relatively long VOTs for /p/ during speaking probably reflected an articulatorily
accurate or stressed production. Likewise, the relatively short negative VOTs for /b/ during speaking
may also reflect increased sound stress or articulatory accuracy. However, in the case of a voiced
stop, increased emphasis/accuracy would result
in briefly negative or positive VOTs. The long negative VOTs for /b/ and the short positive VOTs for
/p/ during singing could reflect decreased articulatory accuracy and/or decreased sound emphasis
because of the overriding desire to maintain stable
melody, tone, and intensity during singing.
A potentially biasing factor may have been the
manner in which the speaking and singing stimuli
were modeled by the investigator. It was possible
that the model provided by the researcher
placed greater emphasis on the production of /p/ while modeling the speech task and
less emphasis on /b/ while modeling the singing
task. The researcher may have unconsciously spoken the phrases with extra stress on the word-initial
voiced and voiceless bilabial stops or may not have used
enough word-initial stress during the sung productions. Future studies should use research assistants
blind to the purpose of the study and be designed
to control the manner in which the stimuli are presented to participants.
A significant interaction occurred between phonatory task and vocal training. In this interaction,
the trained singers showed a larger VOT difference
between speaking and singing in comparison with
the speaking and singing VOTs for the nonsingers.
These results were not in agreement with two previously described studies designed to examine
speech duration in trained singers and nonsingers.11,13 Brown et al11 reported no significant differences in spoken sentence, phrase, or word duration
between trained singers and nonsingers. However,
Brown et al11 did not examine VOT. Rothman et
al13 examined closure duration, which according
to its description was equivalent to VOT, for /t/
and reported significant differences between
perceptually identified singers and perceptually
unidentified singers. However, Rothman et al13
reported that the closure durations of /t/ were significantly shorter for the perceptually identified
singers than the unidentified singers.
Whereas longer VOTs for trained singers than
nonsingers do not agree with research examining
speech duration in the two groups, the results
were in general agreement with the results of previous studies designed to examine articulatory and/or
phonatory function in trained singers and nonsingers during speech and singing tasks. Brown et al4
and McGlone5,6 reported several articulatory differences between trained singers and nonsingers during singing, including greater jaw and tongue
displacement and more stable lingual pressure for
trained singers, but no differences between the
groups during speech. Previous results have also indicated phonatory differences between trained singers and nonsingers during singing, including lower
vertical laryngeal position for trained singers
during high-frequency production2 and decreased
vocal fold tension during loud, high-frequency
phonation for trained singers.1,3
The current results indicated that the trained
singers and nonsingers differ during sung productions of a phrase containing /p/ or /b/ in word initial
position. This interaction further indicates that the
significant main effect of phonatory task on VOT
is more apparent in trained singers as compared
with nonsingers. As can be observed in Figure 1,
the general trend for /p/ VOT to be longer during
speaking than singing was greater in the trained
singers' VOTs as compared with the nonsingers'
VOTs. The observed interaction between phonatory
task and vocal training might be explained by examining some specific articulatory adjustments
learned during vocal training. For example, trained
singers learn to shape the vocal tract in a specific
configuration to produce a perceptually distinctive
tone. This distinctive tone is perceptually characterized as resonant or full sounding and has been associated with an acoustical phenomenon known as the
singing formant.8-10 Ultimately, the vocal tract manipulations result in an increase of space in the posterior oropharyngeal cavity and/or an overall
lengthening of the vocal tract. Although these vocal
tract adjustments provide the singer with an increase of acoustic energy, which allows the singer
to be heard over an orchestra, they may hinder articulatory accuracy.10 Vocal pedagogues and voice
scientists have acknowledged that trained singers
often sacrifice clear articulation to produce a perceptually desirable sound at a uniform intensity.16,17,20 The relatively short positive VOTs for
/p/ and long negative VOTs for /b/ during singing
in the current study may be a reflection of an articulatory consequence of the trained singers quickly
producing the initial stop to have time to open
and lengthen the vocal tract to produce a perceptually resonant vowel either before /b/ production or
immediately after /p/ production. Finally, further
research is needed to test the relation between vocal
tract configuration and the articulatory accuracy
proposed above.
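The sign convention behind the "positive VOTs for /p/" and "negative VOTs for /b/" discussed above can be sketched in a few lines. This is a minimal illustration only, assuming the conventional definition of VOT as the time of voicing onset minus the time of the stop burst release. The function names and the time stamps are hypothetical and are not taken from the study's measurements.

```python
def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Return voice onset time in milliseconds.

    Assumed convention: VOT = voicing onset - burst release.
    Voicing that starts after the release gives a positive value;
    voicing that starts before the release gives a negative value.
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


def describe_vot(vot: float) -> str:
    """Label a VOT value by its sign (hypothetical helper)."""
    if vot < 0:
        return "negative VOT (voicing leads the release)"
    return "zero or positive VOT (voicing at or after the release)"


# Hypothetical time stamps in seconds, for illustration only.
p_token = vot_ms(burst_release_s=0.500, voicing_onset_s=0.560)
b_token = vot_ms(burst_release_s=0.500, voicing_onset_s=0.420)

print(round(p_token), describe_vot(p_token))
print(round(b_token), describe_vot(b_token))
```

Under this convention, a "shorter" /p/ VOT means voicing begins sooner after the release, while a "longer" negative /b/ VOT means voicing begins earlier before the release.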
CONCLUSIONS
These acoustic results indicated that VOT may
be an effective measure for examining vocal tract
adjustment differences between speech and singing.
Furthermore, the results provided further support
for the notion that all participants used different articulatory and/or phonatory movements during
speech as compared with singing. This finding indicates that, regardless of training, people make significant timing adjustments at the phoneme
segment level when they sing, but trained singers
seem to make more noticeable timing adjustments
than nonsingers.
In conclusion, these results represent a foundation
for future researchers interested in finding a correlation between physiologic vocal tract adjustments
during speech and singing and temporal-acoustic
measures. Future research using a combination of
physiologic, aerodynamic, acoustic, and perceptual
measures should be conducted to more closely examine the effects of vocal tract adjustment on the
temporal-acoustic signal, and the difference between the vocal tract adjustments of trained singers
and nonsingers during speech and singing. Future
research examining VOT across voice-types, such
as tenors, baritones, and basses, may also provide
some insight into the singing mechanism.
REFERENCES
1. Gauffin J, Sundberg J. Spectral correlates of glottal voice source waveform characteristics. J Speech Hear Res. 1989;32:556-565.
2. Shipp T, Izdebski K. Vocal frequency and vertical larynx positioning by singers and nonsingers. J Acoust Soc Am. 1975;58:1104-1106.
3. Sundberg J, Rothenberg M. Some phonatory characteristics of singers and nonsingers. Sp Trans Lab-Quart Progress Stat Report. 1986;4:65-77.
4. Brown WS, Rothman H, Williams W. Physiological differences between singers and non-singers. In: Lawrence V, ed. Transcripts of the Seventh Symposium on Care of the Professional Voice. New York: Voice Foundation; 1975:11-18.
5. McGlone R. Lingual pressure variation during singing by trained and untrained individuals. Presented at the Fifth Symposium on Care of the Professional Voice, New York, June 1976.
6. McGlone R. Supraglottal air pressure variation from trained singers while speaking and singing. In: Lawrence V, ed. Transcripts of the Sixth Symposium on Care of the Professional Voice. New York: The Voice Foundation; 1977:48-49.
7. Sundberg J. Formant structure and articulation of spoken and sung vowels. Folia Phoniat. 1970;22:28-48.
8. Sundberg J. The source spectrum in professional singing. Folia Phoniat. 1973;25:71-90.
9. Sundberg J. Articulatory interpretation of the singing formant. J Acoust Soc Am. 1974;55:838-843.
10. Sundberg J. The acoustics of the singing voice. Scientific Am. 1977;3:82-91.
11. Brown WS, Rothman HB, Sapienza CM. Perceptual and acoustic study of professionally trained versus untrained voices. J Voice. 2000;14:301-309.
12. Lindblom BE, Sundberg J. Acoustical consequences of lip, tongue, jaw, and larynx movement. J Acoust Soc Am. 1971;50:1166-1179.
13. Rothman HB, Brown WS, Sapienza CM, Morris RJ. Acoustic analyses of trained singers perceptually identified from speaking samples. J Voice. 2001;15:25-35.
14. Schutte HK, Miller R. Differences in spectral analysis of a trained and an untrained singer. NATS Bull. 1983;Nov/Dec:22-26.
15. Bartholomew WT. A physical definition of good voice quality in male voice. J Acoust Soc Am. 1934;6:25-33.
16. Miller R. English, French, German, and Italian Techniques of Singing: A Study in National Tonal Preferences and How They Relate to Functional Efficiency. Metuchen, NJ: Scarecrow Press; 1977.
17. Vennard W. Singing: The Mechanism and the Technique. 4th ed. New York: Carl Fischer; 1967.
18. Kitzing P. LTAS criteria pertinent to the measurement of voice quality. J Phonet. 1986;14:477-482.
19. Wedin S, Leanderson R, Wedin L. Evaluation of voice training. Folia Phoniat. 1978;30:103-112.

20. Titze IR. Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall; 1994.
21. Lisker L, Abramson A. A cross-language study of voicing in initial stops: acoustical measurements. Word. 1964;20:384-422.
22. Lisker L, Abramson A. Some effects of context on voice onset time in English stops. Lang Speech. 1967;10:1-28.
23. Baran JA, Laufer MZ, Daniloff R. Phonological contrastivity in conversation: a comparative study of voice onset time. J Phonet. 1977;5:339-350.
24. Klatt DH. Voice onset time, frication, and aspiration in word-initial consonants clusters. J Speech Hear Res. 1975;18:686-706.
25. Port RF, Rotunno R. Relation between voice-onset time and vowel duration. J Acoust Soc Am. 1979;66:654-662.
26. Weismer G. Sensitivity of VOT measures to certain segmental features in speech production. J Phonet. 1979;7:197-204.
27. Zlatin MA. Voicing contrast: perceptual and productive voice onset time characteristics of adults. J Acoust Soc Am. 1974;56:981-994.
28. McCrea CR, Morris RJ. Comparisons of voice onset time for trained male singers and male nonsingers during speech and singing. J Voice. In press.
29. Kessinger R, Blumstein S. Effects of speaking rate on voice-onset time in Thai, French, and English. J Phonet. 1997;25:143-168.
30. Kessinger R, Blumstein S. Effects of speaking rate on voice-onset time and vowel production: some implications for perception studies. J Phonet. 1998;26:117-128.
31. Crystal T, House A. A note on the variability of timing control. J Speech Hear Res. 1988;31:497-502.
32. Crystal T, House A. Articulation rate and duration of syllables and stress groups in connected speech. J Acoust Soc Am. 1990;49:1842-1848.
33. Miller JL, Grosjean F, Lomanto C. Articulation rate and its variability in spontaneous speech: a reanalysis and some implications. Phonetica. 1984;41:215-225.
34. Ramig L. Effects of physiological aging on selected acoustic characteristics of voice. J Commun Dis. 1983;16:217-226.
35. Snidecor JC. A comparative study of pitch and duration characteristics of impromptu speaking and oral reading. Speech Monographs. 1943;10:50-56.
36. Snidecor JC. The pitch and duration characteristics of superior female speakers during oral reading. J Speech Hear Dis. 1951;16:44-51.
37. Walker V. Durational characteristics of young adults during speaking and reading tasks. Folia Phoniat. 1988;40:12-20.
38. Rothman HB, Brown WS, LaFond JR. Spectral changes due to performance environment in singers, nonsingers, and actors. J Voice. 2002;16:323-332.
39. Brown WS, Morris RJ, Weiss R. Comparative methods for measurement of VOT. J Phonet. 1993;21:329-336.
40. Smith BL, Hillenbrand J, Ingrisano D. A comparison of temporal measures of speech using spectrograms and digital oscillograms. J Speech Hear Res. 1986;29:270-274.
41. Watts CR, Murphy J, Barnes-Burroughs K. Pitch matching accuracy of trained singers, untrained subjects with talented singing voices, and untrained subjects with nontalented singing voices in conditions of varying feedback. J Voice. 2003;17:185-194.

Accepted for publication May 18, 2005.

Journal of Voice, Vol. 21, No. 1, 2007
