13/06/2008
Hungary
Corresponding author: sdenham@plymouth.ac.uk; tel. +44 1752 232610; fax +44 1752 233349
Abstract
The auditory two-tone streaming paradigm has been used extensively to study the
processing mechanisms that underlie the decomposition of the composite auditory input
into coherent sound sequences, and hence the perception of auditory objects. Here we
present new results from a study of bi-stability in auditory streaming. Using relatively
long (4-minute) sequences, we show that there are two fundamentally different phases in
this process. Listeners hold their first percept of the sound sequence for a relatively long
period (first phase), after which perception stochastically switches between two or more
alternative sound organisations, each held on average for a much shorter duration
(second phase). The two perceptual phases also differ in that stimulus parameters
influence perceptual behaviour to a far greater degree in the first than in the second
phase, and during the second but not the first phase, there are significant periods when
more than one organisation can be perceived simultaneously. Furthermore, our analysis
reveals deep parallels between the dynamics of perceptual organisation in auditory
streaming and binocular rivalry. We propose an account of auditory streaming in terms
of rivalry between competing temporal associations. Based on the results of our
experiments, we suggest that in the first perceptual phase (formation of associations),
alternative interpretations of the auditory input are formed. In the second phase
(coexistence of interpretations), perception stochastically switches between the
alternatives, thus maintaining perceptual flexibility.
Keywords
auditory streaming, bi-stability, perceptual switching, auditory scene analysis
Introduction
In order to make sense of real-world environments it is necessary to identify,
extract and organise relevant information from the wealth of incoming sensory data.
The potential amount of information far exceeds the processing capacity of any living
system. However, biological organisms are not idle perceivers; rather they seek out
information about the world and the objects in it [1]. The challenge is to create
appropriate object representations on the fly and to continually modify them in order to
Figure 1. Cartoon of the auditory streaming paradigm (panels: Sounds, with the SOA indicated;
Perception). A sequence of low (A) and high (B) tones presented repeatedly in ABA- groups can
be perceived as a single coherent stream with a galloping rhythm (upper right), or as two
segregated streams (lower right), each with an isochronous rhythm.
Figure 2. The four different time intervals (shown on frequency-time axes) that can be
distinguished in the two-tone sequences used in our streaming experiments, as considered by
Bregman and colleagues: a) SOA_within, b) ISI_within, c) ISI_across, and d) SOA_across.
Diagram derived from [3].
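Under the layout shown in Figure 1 (each ABA- group occupies four SOA slots: A, B, A, then a silent slot), the four intervals of Figure 2 follow directly from the SOA and the tone duration. The sketch below computes them; the helper name `aba_intervals` is hypothetical and not part of the original study. ISI_within for the A tones is the Δt used throughout this paper.

```python
from dataclasses import dataclass


@dataclass
class Intervals:
    soa_within: float  # onset-to-onset, successive same-frequency tones (ms)
    isi_within: float  # offset-to-onset, successive same-frequency tones (ms); Δt for A tones
    isi_across: float  # offset-to-onset, adjacent A and B tones (ms); negative = overlap
    soa_across: float  # onset-to-onset, adjacent A and B tones (ms)


def aba_intervals(soa_ms: float, tone_ms: float, stream: str = "A") -> Intervals:
    """Hypothetical helper: time intervals in an ABA- sequence.

    Each ABA- group spans four SOA slots (A, B, A, silence), so the A tones
    repeat every 2 * SOA and the B tones every 4 * SOA.
    """
    if stream == "A":
        period = 2 * soa_ms
    elif stream == "B":
        period = 4 * soa_ms
    else:
        raise ValueError("stream must be 'A' or 'B'")
    return Intervals(
        soa_within=period,
        isi_within=period - tone_ms,
        isi_across=soa_ms - tone_ms,
        soa_across=soa_ms,
    )
```

For instance, `aba_intervals(100, 40).isi_within` is 160 ms and `aba_intervals(100, 175).isi_within` is 25 ms, the two Δt values given for Winkler et al. [10]; with the 75-ms tones used in the present experiments, SOAs of 100, 150, 200 and 250 ms give Δt = 125, 225, 325 and 425 ms.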
Based on the results obtained in the classical studies of auditory streaming (for a
review, see [4]), it was assumed that, following a short initial build-up period,
perception of tone sequences such as the one shown in Figure 1 becomes stable when the
parameters fall into either the segregated or the integrated area of the Δf-Δt space¹
(see Figure 3). However, recent evidence [5, 6] suggests that the idea that the auditory
system fixes on a single unchanging dominant percept is to some extent an artefact of
the experimental procedures and analysis methods used in classical studies of auditory
streaming. These obscure an important detail about streaming, namely, that perception
fluctuates randomly between the two possible organisations, a phenomenon known as
perceptual bi-stability.² Bi-stability in visual perception (for a review, see [7]) has been
studied extensively because it was thought to offer the possibility of identifying the
neural correlates of visual awareness, and ultimately of consciousness, by allowing
perceptual changes to be dissociated from changes in the stimulus [8]. The finding of
perceptual bi-stability in auditory streaming is important because it raises questions
about the commonality of mechanisms underlying perceptual organisation in different
sensory modalities. It further suggests that auditory streaming, far from being a primitive
and automatic process, may be better understood in terms of generic strategies of
perceptual organisation, a notion that has important implications for models of auditory
streaming.
¹ In his experiments, van Noorden did not distinguish between Δt and SOA [2. van Noorden, L.P.A.S.,
Temporal coherence in the perception of tone sequences, in Institute for Perception Research. 1975:
Eindhoven.]; however, Bregman and colleagues [3. Bregman, A.S., et al., Effects of time intervals and
tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-636.] recently
found, in a study designed to investigate the influence of the various possible time intervals on stream
segregation, that it is in fact the within-stream offset-to-onset interval (i.e. ISI_within or Δt) that is
most influential in determining the likelihood of streaming (see Figure 2).
² Actually, even with this simple stimulus configuration, several different sound organisations can be
experienced, i.e. multi-stability. For the sake of simplicity, we assume that the various possible
perceptual organisations can be sorted into two main categories: integrated and segregated (for a formal
definition, see the Design section).
[Figure 3 regions: Segregated, Ambiguous, Integrated; two temporal coherence boundaries and two fission boundaries]
Figure 3. The dependence of primitive auditory streaming on frequency difference (Δf) and
presentation rate (characterized by the SOA) found in human psychophysical experiments using
alternating pure tones [2, 9]. Stimuli in the region of the parameter space above the 'temporal
coherence boundary' are generally perceived as two segregated streams, and those with
parameters in the region below the 'fission boundary' as a single coherent stream. Those falling
in the ambiguous region can be perceived either way, and perception can be influenced by
top-down processes [2]. van Noorden actually reported two sets of boundaries, which are
illustrated here for comparison and clarification; the more commonly referenced blue set is
more appropriate for describing perceptual behaviour in response to short sequences, while the
green set is relevant to longer sequences such as the ones we report here. The red dot
indicates the stimulus parameters used by Pressnitzer and Hupé [5, 6], the orange dot those of
Winkler et al. [10], and the yellow dots those reported in Denham and Winkler [11]; see the
descriptions of these experiments in the text.
In the investigation of Winkler et al. [10], a single point in the parameter space
(indicated by the orange dot in Figure 3) was tested (A = 1245 Hz, B = 931 Hz, Δf = 5
semitones, stimulus onset asynchrony (SOA) = 100 ms, tone duration = 40 ms in one
condition and 175 ms in another, sequences close to 3 minutes in duration).
Participants were instructed to keep a response key depressed while they heard
the galloping rhythm and to release it when they did not. With the short tone
duration (Δt = 160 ms), perception of the tone sequence was rather ambiguous
(galloping was heard on average for 61.8% of the time; SD 22.5); with the
long tone duration (Δt = 25 ms), the perceptual organisation was predominantly
segregated (10.3% galloping; SD 20.1). Each participant experienced perceptual
switching in both stimulus conditions.
Pressnitzer and Hupé [5, 6] tested a similar point of the parameter space
(indicated by the red dot in Figure 3; A = 587 Hz, B = 440 Hz, Δf = 5 semitones,
SOA = 120 ms, tone duration = 120 ms, Δt = 120 ms, sequences of 4 minutes).
These parameters are close to the ambiguous condition of Winkler et al. [10].
Although Pressnitzer and Hupé [5, 6] used a slightly different experimental
procedure, in which participants reported their perceptions using three buttons (one
for grouping, i.e. the galloping rhythm, one for splitting, i.e. streaming, and a third
for cases when neither galloping nor streaming was heard), their participants also
reported considerable perceptual switching.
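Δf is expressed in equal-tempered semitones, so two tones Δf semitones apart have a frequency ratio of 2^(Δf/12). The short sketch below (function names hypothetical) converts between the two representations; it can be used to confirm that both tone pairs quoted above lie close to 5 ST apart.

```python
import math


def semitones_between(f1_hz: float, f2_hz: float) -> float:
    """Frequency difference in equal-tempered semitones: 12 * log2(f_high / f_low)."""
    hi, lo = max(f1_hz, f2_hz), min(f1_hz, f2_hz)
    return 12 * math.log2(hi / lo)


def shift_by_semitones(f_hz: float, delta_st: float) -> float:
    """Frequency delta_st semitones above f_hz (negative values shift downwards)."""
    return f_hz * 2 ** (delta_st / 12)
```

`semitones_between(1245, 931)` evaluates to about 5.03 and `semitones_between(587, 440)` to about 4.99, so both pairs are 5 ST apart to within rounding of the quoted frequencies.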
In the experiment reported in [11], we wanted to test whether bi-stability
was restricted to the ambiguous region or whether it would be found over a wider range
of the parameter space. We used the same one-button experimental procedure as
Winkler et al. [10] and tested the parameter combinations indicated by the yellow dots
in Figure 3. Our results [11] showed that perceptual bi-stability was present for all
conditions tested. Furthermore, although there was great variability in the mean
switching rate between and within participants, they all showed some degree of
perceptual bi-stability.
comparisons between the two phenomena, although we did not necessarily expect to
find a very close correspondence.
Here we report a detailed analysis of perceptual bi-stability observed in auditory
streaming as well as comparisons between auditory streaming and binocular rivalry.
These analyses and comparisons were motivated by our recently suggested
interpretation of the auditory streaming phenomenon [11] (for a detailed description, see
the Distribution of perceptual switching section) and aimed at finding new ways to
describe auditory streaming quantitatively.
Perceptual Experiments
Having found that perceptual bi-stability in auditory streaming exists over a
wide range of the feature space [11], and is not restricted to the ambiguous region [2],
we set out to characterise the distribution and dynamics of perceptual
switching, since these aspects had not yet been determined for auditory streaming. The
results which we present here were obtained in two further perceptual experiments of
auditory streaming.
Participants
Thirty young healthy volunteers (16 male, 18-26 years of age, average 21.8
years) participated in experiment 1 and 15 (7 male, 21-25 years of age, average 22.2
years) in experiment 2. Participants received modest financial compensation for their
participation. The study was conducted in the sound-attenuated experimental chamber
of the Institute for Psychology, Hungarian Academy of Sciences. It was approved by the
Ethical Committee (institutional review board) of the Institute for Psychology. After the
aims and procedures of the study were explained to them, participants signed an
informed consent form before starting the experiment. Participants were pre-selected on
the basis of the results of clinical audiometry with the criteria that the hearing threshold
between 250 and 6000 Hz should not be higher than 25 dB, and the difference between
the two ears not higher than 15 dB in the same frequency range.
Stimulus paradigm
We decided to focus the first experiment on the medium-to-large Δf region of
the parameter space with medium to moderately long SOAs, to see whether features of
perceptual bi-stability would differ between the segregated (large Δf, medium SOA)
and the ambiguous (medium Δf, and large Δf with long SOA) regions of the parameter
space; the second experiment was then focussed on the region of small to medium Δf
and short to medium SOA. The experimental conditions used are illustrated in Figure 4
below. Participants were presented with 4-minute long trains of the ABA- structure,
where A and B were pure tones of 75 ms duration, including 5-ms linear onset and
5-ms linear offset ramps. In separate trains, Δf was 4, 10, 16, or 22 semitones (ST) in
experiment 1, and 1, 3, 5, or 7 ST in experiment 2; SOA was 100, 150, 200, or 250 ms
(thus Δt was 125, 225, 325, or 425 ms for the more frequent tones) in experiment 1 and
75, 100, 125, or 150 ms (thus Δt was 75, 125, 175, or 225 ms) in experiment 2.
Altogether, 4 × 4 = 16 different types of trains were tested in each experiment. The
frequency of the lower-pitched, more frequent tones (A in Figure 1) was kept constant
at 400 Hz across the different stimulus conditions.
Sounds were generated on an IBM PC computer (MEL 2.0 stimulus presentation
software, Psychology Software Tools Inc.), amplified using a custom-made sound
mixer and amplifier, and delivered through Sennheiser HD 430 headphones at a
comfortable 70-dB (SPL) intensity level. The order of the stimulus trains with different
parameters (each train 4 minutes long) was randomized separately for each participant.
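The stimulus description above can be illustrated with a small waveform generator. This is not the MEL 2.0 implementation used in the study but a sketch under stated assumptions: the 44.1-kHz sample rate, unit peak amplitude, and the function names `tone` and `aba_group` are all hypothetical. B is placed Δf semitones above the fixed 400-Hz A tone, and each 75-ms tone has 5-ms linear onset and offset ramps.

```python
import math

SAMPLE_RATE = 44_100  # Hz; an assumption, not stated in the text


def tone(freq_hz: float, dur_ms: float = 75.0, ramp_ms: float = 5.0,
         sr: int = SAMPLE_RATE) -> list[float]:
    """Pure tone with linear onset and offset ramps (unit peak amplitude)."""
    n = round(sr * dur_ms / 1000)
    n_ramp = round(sr * ramp_ms / 1000)
    out = []
    for i in range(n):
        # Gain rises linearly over the first n_ramp samples and falls over the last n_ramp.
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        out.append(gain * math.sin(2 * math.pi * freq_hz * i / sr))
    return out


def aba_group(delta_f_st: float, soa_ms: float, a_hz: float = 400.0,
              tone_ms: float = 75.0, sr: int = SAMPLE_RATE) -> list[float]:
    """One ABA- group: tones A, B, A start at successive SOA slots; the fourth slot is silent."""
    b_hz = a_hz * 2 ** (delta_f_st / 12)  # B lies delta_f_st semitones above A
    slot = round(sr * soa_ms / 1000)      # samples per SOA slot
    buf = [0.0] * (4 * slot)
    for k, freq in enumerate((a_hz, b_hz, a_hz)):
        samples = tone(freq, tone_ms, sr=sr)
        if k * slot + len(samples) > len(buf):
            raise ValueError("tone too long for this SOA")
        for i, s in enumerate(samples):
            buf[k * slot + i] += s  # summing would allow overlap if tones outlasted the SOA
    return buf
```

A 4-minute train would simply concatenate such groups. With these defaults, B lies at about 504 Hz for Δf = 4 ST and about 1425 Hz for Δf = 22 ST.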
Procedure
In the classical studies of auditory streaming, participants were typically asked
to report their perception after the end of each short sound sequence. Thus these
experiments were not designed to test the temporal dynamics of streaming-related
perceptual processes. Furthermore, participants were often asked to attempt to hear the
sound sequences according to one or another pattern. This method was used to find
unambiguous effects of stimulus parameters. The recent studies described in the
Introduction employed on-line measures, asking participants to report their perceptions
as they occurred throughout the presentation of relatively long sound sequences.
However, in all but one [6] of the streaming experiments reported in the literature, it has
been implicitly assumed that only two alternative sound organisations are possible; i.e.
either integrated or segregated, each linked to a specific perceived sound pattern. With
sequences similar to those presented in the current experiments (see Figure 1), the
integrated organisation was expected to result in the perception of the galloping
pattern, whereas the segregated sound organisation was expected to result in the
simultaneous perception of a high and a low tone sequence, both with uniform (but
different) presentation rate. These assumptions were reflected in the response choices
and instructions given to participants. In fact, most previous experiments only asked
participants to report when they experienced a certain pattern of sounds (e.g. the
galloping pattern). It was then assumed that participants who did not report hearing the
designated pattern experienced the opposite sound organisation (in the example, this
would be the segregated organisation). However, in a pilot study in which we asked
participants to describe their different perceptions in detail, we found that they a) heard
rhythmic patterns different from either one of the expected ones, and b) sometimes
heard simultaneously a pattern that involved both high and low tones and a pattern
involving only high or only low tones.
In order to eliminate possible confusion caused by the perception of rhythms
other than the galloping rhythm, the notion of an integrated percept was generalized
and defined for participants as hearing a repeating pattern that contained both low
and high tones. In turn, the notion of a segregated percept was similarly generalized and
defined for participants as hearing some repeating pattern(s) formed either exclusively
of high or exclusively of low tones, with the possibility that multiple repeating
segregated patterns (i.e., A-A-A and B---B---B) may be perceived concurrently.
Participants were to depress one response key so long as they experienced an integrated
percept and the other key when they experienced a segregated percept. The role of the
two keys was randomly assigned across participants. When participants heard no
repeating tone pattern, they were instructed to release both keys. Participants were asked
to mark their perception throughout the duration of the stimulus sequence and not to
attempt hearing the sound according to one or another perceptual organisation. The
experimenter made sure that participants understood the types of percepts they were
required to report, using both auditory and visual illustrations. Furthermore, in
experiment 1, we divided our participants into two groups. One group received
instructions implicitly suggesting exclusivity between the segregated and integrated
percepts (as described above); i.e., the instructions were: "you may either hear a
repeating integrated, or some repeating segregated tone patterns, or no repeating tone
pattern". This set of instructions was similar to that employed by Pressnitzer and
Hupé [5]. The other group was explicitly told that they might sometimes hear both
types of patterns at the same time; i.e., the instructions were: "you may hear a
repeating integrated, or some repeating segregated tone patterns, possibly even both
at the same time, or no repeating pattern at all". If they heard both types of patterns
at the same time, they were instructed to keep both buttons depressed. However, they
were also cautioned to be sure to release a button when they stopped hearing the
corresponding pattern. In addition to the instructions, when analyzing the responses,
we discarded all responses that we assumed to represent transitions between two
percepts, i.e., all phases shorter than 300 ms. The assumption was that in such cases,
participants may simply have been slightly inaccurate in synchronising their button
presses and releases. Because the results obtained with the two sets of instructions in
experiment 1 proved to be very similar in all regards except for a higher incidence of
reporting two simultaneous percepts in the latter group, for the sake of clarity, here we
only report the results obtained with the instructions explicitly mentioning the
possibility of simultaneously hearing repeating integrated and segregated tone patterns.
The group of participants who received this set of instructions included 15 volunteers
(6 male, 18-25 years of age, average 20.9 years). In experiment 2, we only used this set
of instructions, and all
other procedures were also identical to those of experiment 1.
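The response coding described above (two keys, four possible reported states, phases shorter than 300 ms discarded as transitions) can be sketched as follows. The event format, the state labels, and the choice to merge neighbouring phases that share a state once the short ones are dropped are assumptions made for illustration; this is not the analysis code used in the study.

```python
from typing import NamedTuple

MIN_PHASE_S = 0.3  # phases shorter than 300 ms are treated as button-press transitions


class Phase(NamedTuple):
    state: str    # "integrated", "segregated", "both", or "none" (illustrative labels)
    start: float  # seconds from train onset
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def state_from_keys(int_down: bool, seg_down: bool) -> str:
    """Map the two response keys onto a reported percept."""
    if int_down and seg_down:
        return "both"
    if int_down:
        return "integrated"
    if seg_down:
        return "segregated"
    return "none"


def phases_from_events(events: list[tuple[float, bool, bool]], train_end: float) -> list[Phase]:
    """events: time-ordered (time_s, integrated_down, segregated_down) key-state changes."""
    ends = [t for t, _, _ in events[1:]] + [train_end]
    raw = [Phase(state_from_keys(i, s), t, end) for (t, i, s), end in zip(events, ends)]
    kept = [p for p in raw if p.duration >= MIN_PHASE_S]
    # Assumption: after a transition is removed, neighbours reporting the same percept
    # are joined into one phase, which then also spans the removed gap.
    merged: list[Phase] = []
    for p in kept:
        if merged and merged[-1].state == p.state:
            merged[-1] = Phase(p.state, merged[-1].start, p.end)
        else:
            merged.append(p)
    return merged


def count_switches(phases: list[Phase]) -> int:
    """Number of changes of reported percept between consecutive retained phases."""
    return sum(1 for a, b in zip(phases, phases[1:]) if a.state != b.state)
```

For example, the key events (0.0, True, False), (10.0, True, True), (10.2, False, True), (25.0, True, False), (40.0, False, False) in a 45-s excerpt yield integrated, segregated, integrated and "none" phases: the 200-ms "both" episode is discarded, and `count_switches` returns 3. Whether changes to and from "none" should count as switches is a further assumption of this sketch.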
Participants sat in a comfortable reclining chair in the experimental chamber
throughout the experimental session, holding a response button in each hand. Short 1-3-minute
breaks were inserted between consecutive stimulus trains, with longer breaks,
when the participant could move about, scheduled just before the start of the
experimental conditions (after explaining and illustrating the possible perceptions) and
after the 8th stimulus train. Further longer breaks were inserted into the session when
necessary. The experiment took ca. two hours altogether. The state of the two response
Figure 4. Experimental conditions for experiments 1 (magenta) and 2 (cyan) reported here.
Conditions are numbered separately for experiments 1 and 2 in the following way. Number 1
denotes the shortest SOA and smallest Δf (100 ms / 4 ST and 75 ms / 1 ST for experiments 1
and 2, respectively). Numbers increase faster through the four different Δfs (e.g., 2 marks
100 ms / 10 ST and 75 ms / 3 ST for experiments 1 and 2, respectively) and more slowly through
the four different SOAs (e.g., 5 marks 150 ms / 4 ST and 100 ms / 1 ST for experiments 1 and 2,
respectively).
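The numbering scheme of Figure 4 amounts to row-major indexing with SOA as the slow index and Δf as the fast index. A minimal sketch (the dictionary layout and function names are hypothetical):

```python
EXP_PARAMS = {
    1: {"soa_ms": (100, 150, 200, 250), "delta_f_st": (4, 10, 16, 22)},
    2: {"soa_ms": (75, 100, 125, 150), "delta_f_st": (1, 3, 5, 7)},
}


def condition_number(experiment: int, soa_ms: int, delta_f_st: int) -> int:
    """Condition number (1-16): Δf varies fastest, SOA slowest."""
    p = EXP_PARAMS[experiment]
    i_soa = p["soa_ms"].index(soa_ms)
    i_df = p["delta_f_st"].index(delta_f_st)
    return 4 * i_soa + i_df + 1


def condition_params(experiment: int, number: int) -> tuple[int, int]:
    """Inverse mapping: condition number -> (SOA in ms, Δf in ST)."""
    p = EXP_PARAMS[experiment]
    i_soa, i_df = divmod(number - 1, 4)
    return p["soa_ms"][i_soa], p["delta_f_st"][i_df]
```

For example, `condition_number(1, 150, 4)` and `condition_number(2, 100, 1)` both give 5, as in the caption.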
frequency differences and slow presentation rates. None of the conditions we tested was
stable across all participants, and no participant experienced a stable perceptual
organisation in all conditions. Furthermore, switching occurs in all phases of the
experimental session. The switching results for each participant, condition, and position
of the stimulus train within the experimental session are illustrated in Figure 5. On
average, there were 15.75 switches per condition in experiment 1 and 36.59 in
experiment 2. This corresponds, on average, to one switch every 15.24 and 6.56 seconds,
respectively, showing that perceptual switching occurs quite often when listening to the
tone sequences used to study auditory streaming. We found no significant effect of the
position of the stimulus sequence within the experimental session in either experiment
(F[15,210] = 1.44 and F[15,210] = 0.81 for experiments 1 and 2, respectively; p > .1 for
both; one-way repeated-measures ANOVA of the number of perceptual switches with the
factor Train-number (1-16)). These results suggest that the observed perceptual switching
does not result from learning or fatigue within the experimental session. The number of
switches per train appears to be higher in experiment 2 than in experiment 1. This may
be an effect of the parameters (Δf and Δt; the effects of these parameters are explored
below) and/or a difference between the participant groups (the amount of switching
varies considerably across participants; see the middle column of Figure 5). The following
sections examine perceptual switching in more detail.
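The two summary statistics in this paragraph can be reproduced with a few lines of pure Python. The sketch below (function names hypothetical) computes the mean interval between switches for a 240-s train and a textbook one-way repeated-measures ANOVA without any sphericity correction; with 15 participants and 16 train positions its degrees of freedom are (16 - 1, 15 × 14) = (15, 210), as in the F values reported above. It illustrates the standard technique and is not the authors' analysis software.

```python
def seconds_per_switch(mean_switches: float, train_s: float = 240.0) -> float:
    """Mean time between perceptual switches in a train of length train_s."""
    return train_s / mean_switches


def rm_anova_oneway(data: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated-measures ANOVA (no sphericity correction).

    data[s][c] is the score of subject s at level c of the within-subject factor.
    Returns (F, df_effect, df_error).
    """
    n = len(data)     # subjects
    k = len(data[0])  # factor levels
    grand = sum(sum(row) for row in data) / (n * k)
    level_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_effect = n * sum((m - grand) ** 2 for m in level_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_effect - ss_subj  # subject-by-level residual
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_effect / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error
```

`seconds_per_switch(15.75)` is about 15.24 s and `seconds_per_switch(36.59)` about 6.56 s, matching the figures given above.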
Figure 5. Average total number of perceptual switches (red lines) and individual participant data
(black dots) plotted against a) condition, b) participant, and c) position of the 4-minute long
stimulus train within the experimental session. Results from experiment 1 are plotted in the top
row; those from experiment 2 in the bottom row. Condition numbers are defined in Figure 4
separately for experiments 1 and 2. Note that due to the randomized order of the stimulus
conditions, trains with any set of parameters could occur in any position within the experimental
session.
Figure 6. Cartoon indicating the influence of Δf and Δt on the most prominent sequential
associations which can be made in the galloping auditory streaming paradigm. Δt is placed in
parentheses because, with short to medium Δts (SOAs), the rule-competition account of auditory
streaming suggests that the effect of changes in SOA (which determines Δt in the current
experiments) on the formation and representation of the local rule is relatively small. This is
because 1) the sounds to be connected are adjacent and 2) with relatively short SOAs
separating the sounds, the neural after-effects of the first sound are still present when the second
sound arrives.
The local versus global rule interpretation suggests that most switching will be
found for stimulus parameters which maximise competition, i.e. when both local and
global rules are strong. This occurs for small Δf and small Δt, because with small Δf the
local rule becomes strong, and with small Δt the global rule becomes strong. Hence, if
this interpretation is correct, we should find most switching in experiment 1 at Δf =
4 ST, SOA = 100 ms. Conversely, least competition should occur when the two rules are
not well balanced and one or the other wins the competition very easily. In experiment
1, we expect to find least switching for the 4 ST/250 ms and 22 ST/100 ms conditions.
The conditions where the two rules are approximately equally matched but are both
relatively weak should result in an intermediate amount of switching.
In order to distinguish these alternative interpretations (i.e., ambiguity vs.
rule competition), we examined the total switching in each condition. The results below
support the hypothesis of competing sequential associations: the mean switching rate
peaks along a ridge where local and global rules may be considered to be roughly
balanced, and in experiment 1, it is highest for the condition with the smallest Δf and
smallest Δt (see Figure 7, left panel). From this analysis, it is clear that the region of
maximum switching does not coincide with the ambiguous region of van Noorden [2].
We note that the ridge of maximum switching appears to run roughly parallel to the
temporal coherence boundary.
experiment 2 (right panel). The colour scale indicates the mean number of switches across
participants accumulated throughout the tone trains, separately for each condition. Note that the
coloured surface is interpolated between the discrete experimental data points indicated by the
green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4 for
the relation between the parameters used in experiments 1 and 2.
The results of experiment 1 (Figure 7, left panel) suggest that the distribution of
perceptual switching in auditory streaming is broadly consistent with competition
between alternative sequential associations, with stronger competition and more
switching where these associations are strongest. However, although the relationship
between perceptual switching and stimulus parameters generally supports the rule
competition hypothesis, there is a clear qualification evident in the results of
experiment 2 (Figure 7, right panel), i.e. switching rates do not increase indefinitely
with decreasing Δf and Δt. There is a non-monotonic relationship between the mean
number of switches and Δf and Δt, with the maximum switching found in the region of
Δf = 4 ST and SOA = 125 ms (Δt = 175 ms). Although the increase in switching with
increasing rule strength is intuitively easy to understand, the non-monotonic relationship
between rule strength and switching requires further consideration.
The fall-off in switching rate in the region of very small Δf and Δt suggests that
other factors, which become stronger in this region of the parameter space, also
affect switching. Owing to the uniform 75-ms tone duration used in our experiments,
with SOA < 100 ms there is no (or almost no) silent gap between successive A and B
tones. Thus, for small Δfs, triplets of successive tones (ABA) may form a unitary event;
for segregation to occur, the system then has to extract the components from the
composite before other sequential associations can be established. This notion is
supported by the literature on temporal integration, showing that auditory input within
150-200 ms is integrated into a single unit and processed in many ways differently from
successive sounds exceeding this period (e.g., masking, loudness summation, detection
of omissions and successive deviations [16-18]). Thus, in the case of short Δts, building
the global rule suffers and, as a consequence, competition between the two rules is less
balanced. Therefore, the amount of switching decreases compared with the more
balanced cases.
In contrast, with large Δfs and short SOAs (SOA < 100 ms; Δt < 125 ms),
associations between successive identical tones may be formed directly and, perhaps,
even before associations between the tones with different frequencies. If this were the
case, one should expect that, contrary to the common assumption that integration is
always the first perceptual state [4], segregation should be reported first for short-SOA
and large-Δf conditions. Our results (see next section) confirm this expectation. One
possible explanation for the immediate dominance of segregation at small Δt and large
Δf is that large Δfs impose a relatively large spatial distance between stimulus-driven
neural activity in the tonotopically organized part of the afferent auditory system, thus
weakening and delaying interactions between the activities associated with the two
different tones. At the same time, short Δts allow the after-effect of the more frequent
tones to survive until the arrival of the next identical tone. This may enable easy direct
linking of successive identical tones. In accordance with this account, dominance of
segregation over temporal integration at large Δfs has been previously demonstrated
[19-21], even within the temporal integration period.
organisational decision [4]. That is, segregation can only emerge after a gradual 'build-up' process [22]. In this section we show that 1) it is not always the case that
participants report integration first and 2) there is no stable final sound organisation,
rather switching between alternative organisations continues throughout the stimulation.
Figure 8 (top panels) shows, separately for experiments 1 and 2, the mean first-percept
(termed first-phase) durations averaged across all participants. The group-averaged value
was calculated by treating integrated phase durations as positive and segregated phase
durations as negative. Therefore, the colour in the diagram shows the overall group
tendency towards one or the other first reported percept. It is clear from the figure that
for small Δfs, integration tends to be the first percept reported, whereas for trains with
parameters falling into the larger-Δf and short-SOA region, most participants initially
linked same-frequency tones (A-A or B---B), and thus perceived a segregated percept
first (for a possible explanation, see the previous section).
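The signed averaging used for Figure 8 can be written as follows. This is a sketch: the (state, duration) input format is assumed, and phases reported as both or neither organisation, to which the text assigns no sign, are rejected rather than guessed at.

```python
from statistics import mean

SIGN = {"integrated": 1.0, "segregated": -1.0}


def signed_duration(state: str, duration_s: float) -> float:
    """Integrated phases count as positive durations, segregated phases as negative."""
    if state not in SIGN:
        raise ValueError(f"no sign defined for state {state!r}")
    return SIGN[state] * duration_s


def group_signed_mean(first_phases: list[tuple[str, float]]) -> float:
    """Group mean over participants of the signed first-phase duration for one condition.

    A positive result indicates an overall tendency to report integration first,
    a negative one a tendency to report segregation first.
    """
    return mean(signed_duration(state, dur) for state, dur in first_phases)
```

Because opposite first percepts cancel, a value near zero can mean either short first phases or a roughly even split between participants; the absolute durations are analysed separately below.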
Figure 8. Group-mean signed durations (in seconds) of the first (top) and the mean of all
subsequent phases (bottom) averaged across all participants for experiment 1 (left) and
experiment 2 (right). Segregated phases were assigned negative and integrated phases positive
values. Note that the colour scale is the same for all images as indicated by the bar on the right,
and that the coloured surface is interpolated between the discrete experimental data points
indicated by the green dots. For clarity, the x and y axes of the two panels are differently scaled.
See figure 4 for the relation between the parameters used in experiments 1 and 2.
The ANOVA of the signed phase durations for experiment 2 showed a very
similar set of results. All three main effects were significant (Phase: F[1,12]=18.64,
p<.001, η²=0.62; SOA: F[3,42]=12.04, p<.001, ε=0.60, η²=0.46; Δf: F[3,42]=34.01,
p<.0001, ε=0.50, η²=0.71). The first phase showed more integration, whereas
subsequent phases were balanced (again, absolute phase duration will be analyzed
below). Segregation dominated with the shortest SOA (75 ms), and the signed durations
increased monotonically with SOA, whereas integration dominated with the smallest Δf
(1 ST), monotonically decreasing with increasing Δf. Both SOA and Δf significantly
interacted with Phase (Phase × SOA: F[3,42]=5.26, p<.01, ε=0.47, η²=0.27; Phase × Δf:
F[3,42]=13.99, p<.0001, ε=0.66, η²=0.50). The interaction between Phase and SOA was
again caused by SOA only affecting signed phase durations in the first phase, in contrast
to the balance between integration and segregation found in subsequent phases (Tukey
HSD post-hoc test with df = 42, p<.05 at least for comparisons between signed phase
durations in the first-phase 100-150-ms SOA cells and those in all other cells; see also
Figure 8, bottom right). The interaction between Phase and Δf was again caused by Δf
only affecting signed phase durations in the first phase, in contrast to the balance between
integration and segregation found in subsequent phases (Tukey HSD post-hoc test with
df = 42, p<.05 at least for comparisons between first-phase 1- and 3-ST Δf and all other
cells). No other interactions yielded significant results.
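The η² values reported for the SOA and Δf effects appear to be partial η², which can be recovered from each F ratio and its degrees of freedom as η²p = F·df_effect / (F·df_effect + df_error); for example, F[3,42] = 12.04 gives 0.46, matching the SOA effect above. Assuming ε is a sphericity correction (such as Greenhouse-Geisser) that scales both degrees of freedom by the same factor, it cancels out of this ratio. A minimal sketch (function name hypothetical):

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2).

    Multiplying both degrees of freedom by a common factor (e.g. a sphericity
    correction epsilon) leaves the result unchanged.
    """
    num = f * df_effect
    return num / (num + df_error)
```

Applied to the interaction terms above, F[3,42] = 5.26 and F[3,42] = 13.99 give 0.27 and 0.50, respectively.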
In summary, whereas SOA and Δf had the expected effects on the initial (first-phase)
percept (short SOAs promoting segregation and small Δfs integration), neither
appeared to affect the ratio between integration and segregation in subsequent phases,
which were quite balanced (i.e., integrated and segregated organisations were perceived
overall for approximately equal durations within the stimulus trains). Thus, although an initial
Figure 9. Group-mean durations (in seconds) of the first (top) and the mean of all subsequent
phases (bottom) averaged across all participants for experiment 1 (left) and experiment 2 (right).
Note that the colour scale is the same for all images as indicated by the bar on the right, and that
the coloured surface is interpolated between the discrete experimental data points indicated by
the green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4
for the relation between the parameters used in experiments 1 and 2.
The images show 1) a clear overall decrease in durations from the first to
subsequent phases, and 2) visible effects of Δt and Δf on phase durations in the first
but not in subsequent phases.
In experiment 1, an ANOVA of the absolute phase durations (segregated
and integrated phases pooled together) with the factors Phase (first vs subsequent) ×
SOA (4 levels, see Design and Figure 4) × Δf (4 levels, see Design and Figure 4) showed
main effects of Phase (F[1,14]=46.11, p<.0001, η²=0.77) and Δf (F[3,42]=3.14, p<.05,
ε=0.83, η²=0.18). The Phase main effect was caused by longer first- than
subsequent-phase durations. Phase durations were longer for 4-ST than for 10-ST Δf.
The significant interaction between Phase and Δf (F[3,42]=4.25, p<.05, ε=0.86,
η²=0.23) was caused by 1) longer phase durations induced by the smallest Δf (4 ST) in
the first phase than by any other combination of Phase and Δf, except the largest (22 ST)
Δf in the first phase, and 2) longer phase durations induced by the largest (22 ST) Δf in the first
phase than any subsequent-phase duration (Tukey HSD post-hoc test with df=42, p<.05
at least, for comparisons between first-phase 4-ST Δf and all other cells, except for
first-phase 22-ST Δf, and between first-phase 22-ST Δf durations and phase durations in
the subsequent phases). There was also a significant interaction between the SOA and Δf
factors (F[9,126]=5.45, p<.001, ε=0.59, η²=0.28), which stemmed from opposite
tendencies of the SOA effect at low and high Δfs: at low Δfs, phase durations increased
with increasing SOAs, whereas at high Δfs, they decreased with increasing SOAs
(Tukey HSD post-hoc test with df=126, p<.05 at least, for comparisons between 10-ST
Δf at 150-ms SOA and 16- or 22-ST Δf at 100-ms SOA, as well as between 4-ST and 16- or 22-ST Δf at
250-ms SOA). Finally, the significant triple interaction (F[9,126]=3.39, p<.01, ε=0.57,
η²=0.20) revealed that the SOA × Δf interaction described above characterized
phase durations only in the first phase, whereas subsequent phases showed a
largely uniform distribution of phase durations. No other main effect or interaction
reached significance.
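The comparison between first and subsequent phase durations can be illustrated with a short sketch that derives both quantities from one participant's report record. This is a minimal illustration under stated assumptions, not the analysis code used here: the (onset, label) record format, the labels "I", "S" and "B", and cutting the last phase off at the end of a 240-s train are all assumptions made for the example, and the input values are hypothetical.

```python
from statistics import mean

TRAIN_END_S = 240.0  # assumed length of one 4-minute stimulus train, in seconds


def phase_durations(reports, train_end=TRAIN_END_S):
    """Turn time-sorted (onset_s, label) reports into (label, duration_s) phases.

    Each phase lasts until the next report; the last phase is cut off at train_end.
    """
    ends = [onset for onset, _ in reports[1:]] + [train_end]
    return [(label, end - onset) for (onset, label), end in zip(reports, ends)]


def first_vs_subsequent(reports, train_end=TRAIN_END_S):
    """Return (first-phase duration, mean duration of all later phases), labels pooled."""
    durations = [d for _, d in phase_durations(reports, train_end)]
    later = durations[1:]
    return durations[0], (mean(later) if later else float("nan"))


# Hypothetical record: first report at 5 s, later switches every 6 s, 53-s excerpt.
example = [(5.0, "I"), (35.0, "S"), (41.0, "I"), (47.0, "S")]
print(first_vs_subsequent(example, train_end=53.0))  # (30.0, 6.0)
```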
In experiment 2, an ANOVA of the same structure as above yielded significant
main effects of Phase (F[1,14]=21.34, p<.001, η²=0.60) and Δf (F[3,42]=18.55,
p<.0001, ε=0.71, η²=0.57). As in experiment 1, the effect of Phase was explained
by the first phase being significantly longer than the subsequent phases. Phase durations
decreased monotonically with increasing Δfs. The interaction between the Phase and
SOA factors (F[3,42]=5.58, p<.01, ε=0.81, η²=0.28) was the product of phase durations
increasing with increasing SOAs in the first phase only (Tukey HSD with df=42,
p<.05 at least, between any pair of first- and subsequent-phase cells at 100-150-ms SOA).
The interaction between the Phase and Δf factors (F[3,42]=9.88, p<.001, ε=0.86,
η²=0.41) stemmed from the first-phase duration at the smallest Δf (1 ST) being
significantly longer than all other phase durations, including all other first-phase
durations (Tukey HSD with df=42, p<.001 in all cases). This result revealed that long
first-phase durations mainly occur at low Δfs (qualifying the Phase main effect).
Finally, the interaction between the SOA and Δf factors (F[9,126]=5.07, p<.01, ε=0.47,
η²=0.27) is explained by the finding that, except at the shortest SOA (75 ms), the lowest
Δf (1 ST) induced longer phase durations than any of the larger Δfs (Tukey HSD with
df=126, p<.001 between 1-ST Δf with 100-, 125-, or 150-ms SOA and any other
combination of Δf and SOA). No other main effect or interaction reached significance.
In summary, the pattern of phase durations showed that first phases are usually
longer than subsequent ones. It is also clear from the signed phase duration results that
integration is not always the first percept. Furthermore, stimulus parameters appear to
affect the duration of the first phase only (see the almost homogeneous distribution of
phase durations in subsequent phases, Figure 9, bottom). The analysis of the signed
phase durations showed that small Δfs promote integration as the first percept, whereas
short Δts promote segregation as the first percept. This result, together with the pattern
of interactions between Phase and Δf and/or SOA for absolute phase durations, suggests
that first-phase durations are shortest when competition between the two rules is
highest (short SOA, small Δf). Because phase durations in subsequent phases are more or
less uniformly distributed, shorter first phases lead to more switching overall: a short
first phase leaves more time for switching in subsequent phases. This
explains the pattern observed in the Distribution of perceptual switching section and
qualifies the effects and explanations described in that section as referring to the first
phase: when both rules are strong, they can be discovered quickly, and so switching between
them starts early.
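The arithmetic behind this argument can be sketched as follows. The sketch assumes, as a simplification, that subsequent phases have a constant mean duration that does not depend on the length of the first phase; the function name and the input values are hypothetical.

```python
def expected_switches(first_phase_s, mean_later_phase_s, train_s=240.0):
    """Approximate number of perceptual switches after the first phase.

    Assumes later phases have a constant mean duration that does not depend on
    how long the first phase lasted (a simplification).
    """
    remaining_s = max(train_s - first_phase_s, 0.0)
    return remaining_s / mean_later_phase_s


# Hypothetical values: same 10-s mean later-phase duration, different first phases.
print(expected_switches(40.0, 10.0))   # 20.0
print(expected_switches(100.0, 10.0))  # 14.0
```

With the same mean subsequent-phase duration, the train with the shorter first phase accumulates more switches, which is the relation invoked above.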
stimulus train, depending on the combination of parameters. Three different trends can
be discerned in the figures below. With parameter combinations regarded as promoting
segregation (short SOA, large Δf), a fast overshoot in the probability of
segregation is followed by a slow decline in the ratio between segregation and integration
(see, e.g., Figure 10, SOA=100 ms, Δf=16 or 22 ST, and Figure 11, SOA=75 ms, Δf=5 or 7 ST).
With parameter combinations regarded as promoting integration, the probability of
segregation appears to increase slowly throughout the whole duration of the stimulus
trains (see, e.g., Figure 10, SOA=150 or 200 ms, Δf=4 ST, and Figure 11, Δf=1 ST and any
SOA, or SOA=150 ms and any Δf). Finally, with most of the parameter combinations
that fall into the ambiguous region, the initial increase in segregation is
followed by a fairly stable period, in which the probability of segregation falls mostly
within a narrow range between 0.4 and 0.6 (see, e.g., Figure 10, SOA=200 ms, Δf=10 ST,
and Figure 11, SOA=125 ms, Δf=5 or 7 ST).
We tested the time course of perceptual organisation within the 4-minute
stimulus trains by comparing three time ranges selected from the early (20-50 s,
because for some combinations of parameters and participants the initial response had
not yet been given at shorter latencies; see the blue-grey shaded range in Figure
10), middle (120-150 s) and late (200-230 s) parts of the stimulus trains. To this end, we
conducted ANOVAs of the probability of segregation with the structure Time-range (early,
middle, late) × SOA (4 levels; see Figure 4) × Δf (4 levels; see Figure 4). In experiment
1, increasing SOAs induced a monotonically decreasing probability of perceiving the
segregated organisation (F[3,42]=30.24, p<.0001, ε=0.89, η²=0.68), whereas increasing
Δfs induced a monotonically increasing probability of perceiving the segregated
organisation (F[3,42]=45.87, p<.0001, ε=0.78, η²=0.77). These results are fully
compatible with the known effects of SOA and Δf on the perception of these typical
streaming test sequences. The significant interaction between Time-range and SOA
(F[6,84]=3.65, p<.001, ε=0.74, η²=0.21) reflects a gradual decrease in the steepness of
the SOA effect (decreasing segregation with increasing SOAs) at later time ranges. No
other main effect or interaction reached significance. In experiment 2, as in
experiment 1, increasing SOAs induced a monotonically decreasing probability of
perceiving the segregated organisation (F[3,42]=19.24, p<.0001, ε=0.70, η²=0.58),
whereas increasing Δfs induced a monotonically increasing probability of perceiving
the segregated organisation (F[3,42]=74.14, p<.0001, ε=0.95, η²=0.84). The Time-range
× SOA (F[6,84]=3.26, p<.05, ε=0.59, η²=0.19) and Time-range × Δf (F[6,84]=3.54,
p<.05, ε=0.68, η²=0.20) interactions reflect a gradual decrease in the steepness of the
related main effects (decreasing segregation with increasing SOAs and increasing
segregation with increasing Δfs) at later time ranges. No other main effect or
interaction reached significance.
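The time-range analysis can be sketched in code. The functions below are a hypothetical illustration, assuming one participant's reports are stored as time-sorted (onset, label) tuples: p_segregated estimates the probability of segregation within one time range by sampling the most recent report every 0.1 s, and soa_slope summarises the steepness of the SOA effect as a least-squares slope. Skipping instants before the first report and counting 'both' (B) responses as not segregated are assumptions, not the procedure stated in the text.

```python
import numpy as np

# Time ranges (seconds) as given in the text.
TIME_RANGES_S = {"early": (20.0, 50.0), "middle": (120.0, 150.0), "late": (200.0, 230.0)}


def p_segregated(reports, time_range, dt=0.1):
    """Fraction of sampled instants in time_range at which the latest report was 'S'.

    reports: time-sorted (onset_s, label) tuples. Instants before the first report are
    skipped; 'B' (both) responses count as not segregated (an assumption).
    """
    onsets = np.array([t for t, _ in reports])
    labels = np.array([lab for _, lab in reports])
    t = np.arange(time_range[0], time_range[1], dt)
    idx = np.searchsorted(onsets, t, side="right") - 1  # latest report at or before t
    idx = idx[idx >= 0]
    if idx.size == 0:
        return float("nan")
    return float(np.mean(labels[idx] == "S"))


def soa_slope(soas_ms, probs):
    """Least-squares slope of P(segregated) against SOA (change per ms)."""
    slope, _intercept = np.polyfit(soas_ms, probs, 1)
    return float(slope)
```

In this sketch, averaging p_segregated over participants for each SOA × Δf cell and time range, and comparing soa_slope across the early, middle and late ranges, would correspond to the flattening of the SOA effect described above.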
These results suggest that the well-known effects of SOA and Δf on the
perceptual organisation of the stimulus trains are prominent at the beginning of the
stimulus sequences and diminish with time. This conclusion further supports our
observations regarding the different properties of the first and subsequent phases.
The fact that switching continues throughout the whole stimulus
train is clearly shown in Figures 12 and 13, in which group-averaged phase durations are
plotted for each condition as a function of time within the stimulus trains. The short
phase durations at the beginning of the stimulus trains (<20 s) reflect the fact that
participants with a fast initial report usually switch their perception after a short period
of time (the blue-grey shaded column at the beginning of each plot shows the period
during which not all participants had yet given their response, separately for each SOA).
Overall, phase durations are quite stable, with a decrease towards the end of the stimulus
trains. We assessed the time course of mean phase durations by comparing the
three time ranges used in our test of the time course of segregation/integration (see
above; early: 20-50 s; middle: 120-150 s; and late: 200-230 s within the stimulus
trains). Only significant effects related to the time ranges are reported, because we have
already tested the effects of SOA and Δf on phase durations (see the previous section).
In experiment 1, phase durations were significantly reduced in the late as compared to
the early and middle time ranges (F[2,28]=5.85, p<.05, ε=0.80, η²=0.29). The
Time-range and Δf factors showed a significant interaction (F[6,84]=2.68, p<.05,
ε=0.69, η²=0.16), which was mainly caused by the low Δfs (4 and 10 ST) in the late range
producing shorter phases than most other combinations of the Δf and Time-range
factors (Tukey HSD post-hoc test with df=84, p<.05 for comparisons between the
late-range 4- and 10-ST phase durations and all but the 10-ST phase durations in the other
two time ranges). In experiment 2, phase durations were again significantly reduced in
the late as compared to the early and middle time ranges (F[2,28]=10.23, p<.01, ε=0.76,
η²=0.42). The Time-range and Δf factors also showed a significant interaction
(F[6,84]=4.66, p<.01, ε=0.51, η²=0.25), which was caused by the lowest Δf (1 ST)
producing longer phases in the early and middle ranges than any other combination of
the Time-range and Δf factors (Tukey HSD post-hoc test with df=84, p<.001 for
comparisons between the 1-ST early- and middle-range phase durations and any other
combination of Time-range and Δf, including the 1-ST late-range phase duration).
It is not clear whether the decrease in phase durations at the end of the stimulus
trains indicates that four minutes is an important time scale in stream segregation, or
that participants acquired a sense of the length of the stimulus blocks and these changes
reflect their expectation of the termination of the train. It is also possible that this result
is simply an artefact of the end of the stimulus block (which falls 10 seconds after the end
of the late time range used) cutting short some of the longest phases falling into this
time range and thus distorting the full distribution of phase durations. Using longer
stimulus blocks in follow-up experiments may shed light on this issue. The interactions
between Time-range and Δf reflect this effect combined with the previously observed
effect showing that low Δfs induce long (integrated) first phases.
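The truncation artefact proposed above can be illustrated with a small simulation. It is a sketch under loudly stated assumptions: phase durations are drawn independently from a gamma distribution with an arbitrary 8-s mean, which is not a fit to the present data. Phases that start in the late range (200-230 s) are cut off when the train ends at 240 s, so their observed mean falls below the mean of the same phases left uncut.

```python
import random


def late_window_means(n_trains=2000, mean_phase_s=8.0, shape=3.0,
                      window=(200.0, 230.0), train_end=240.0, seed=1):
    """Mean duration of phases starting inside window, without and with truncation.

    Phase durations are drawn i.i.d. from a gamma distribution; the distribution and
    its parameters are illustrative assumptions, not estimates from the data.
    """
    rng = random.Random(seed)
    scale = mean_phase_s / shape  # gamma mean = shape * scale
    full, truncated = [], []
    for _ in range(n_trains):
        t = 0.0
        while t < window[1]:
            d = rng.gammavariate(shape, scale)
            if t >= window[0]:
                full.append(d)
                truncated.append(min(d, train_end - t))  # the block ends at train_end
            t += d
    return sum(full) / len(full), sum(truncated) / len(truncated)


full_mean, truncated_mean = late_window_means()
print(f"untruncated {full_mean:.2f} s, truncated {truncated_mean:.2f} s")
```

Because every phase starting in the late range has at least 10 s before the train ends, only the longer phases are shortened, which is the mechanism suggested in the text.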
Figure 12. Experiment 1: Time course of group-average phase duration, shown separately for the four
different SOAs, with the different Δfs overplotted (marked with differently coloured lines). Note
that the blue-grey bars indicate the initial period during which not all participants had yet made
their initial choice; the data during this time are therefore averaged over only those participants
who had reacted by the given time. The violet bars mark the periods used for the statistical
analysis.
Figure 13. Experiment 2: Time course of group-average phase duration, shown separately for the four
different SOAs, with the different Δfs overplotted. See the legend of Figure 12 for details.
In summary, we found that for the typical tone sequences used to study auditory
streaming, 1) integration is not necessarily the first percept and 2) no stable final percept
emerges. These statements appear to contradict the classical findings. The
contradictions are, however, explained by the different assumptions and methods
employed by the present study (together with the few similar previous studies [5, 6, 10,
11]) and by classical explorations of auditory streaming. The assumption of a stable final
percept led most experimenters in the past to use short (typically <20 s) trains and ask
participants about their final percept. Looking at the cross-section of Figures 10 and 11
at ca. 15 s, we find that our results closely match those of, e.g., van Noorden [2].
Furthermore, the initial 20 s of the curves shown in these figures indeed give the
impression of a fast but gradual build-up of streaming. The difference between the
current and the classical view is that, whereas this build-up has usually been
interpreted as more and more participants reaching the final percept, our data show
that the group-average probability of perceiving the sounds in terms of one or the other
percept is a product of averaging between perceptual states which switch back and forth
all the time, albeit with shorter- and longer-term changes in the statistics of the switching
behaviour. In fact, our results argue for a different distinction in the temporal dynamics
of auditory streaming. Instead of build-up and final percept, it appears that
distinguishing between the first percept (first phase) and subsequent states, as is
often done in the analysis of visual bi-stability, may provide a more fruitful description
of the temporal behaviour of perceptual processes. The results described in this section
suggest that these two periods (and perhaps more, if the changes towards the end of the
4-minute trains are not related to the length of the stimulus blocks) may show characteristically
different properties and thus may require different theoretical and
computational models.
From the above, it follows that an improved description of auditory
streaming must cover the dynamic properties of this perceptual phenomenon. For
further insight, we next studied the similarities and differences between auditory
streaming and visual bi-stability. The following section tests whether auditory
streaming also shows some important properties of binocular rivalry. The findings will
be incorporated within our rule-competition theory, which already provides
explanations for many of the classical findings (e.g., integration is more often the first
percept because building local rules is faster) and for some of the novel ones (switching
maxima, segregation as the first percept, etc.).
Figure 14. Cartoon illustrating changes in dominance durations with changing image contrast in
one eye: the validity of Levelt's second proposition [23]; taken from [24]. On the left, a series of
four plots shows how the dominance durations of the variable-contrast (solid line) and fixed-contrast
(dashed line) eyes change as a function of contrast in the variable-contrast eye. The y axis
indicates the normalised dominance duration, and the x axis the log of the contrast in the
variable-contrast eye. As shown, relationships such as that in the upper left plot are consistent
with Levelt's proposition [23], whereas those in the lower left plot are inconsistent; there
appears to be a continuum from one extreme to the other (illustrated in the middle two plots).
The plots are colour-coded to map onto the box at the lower right, which indicates the absolute
contrast levels under which these different relationships were observed. As can be seen, Levelt's
second proposition holds when the contrast level in the fixed eye is high, but not when the
contrast level in the fixed eye is low.
similar to that of the strength of competing rules in auditory streaming. That is, higher
contrast in binocular rivalry would be analogous to a stronger rule for a
particular organisation in auditory streaming. We have already suggested that Δf and Δt
affect local and global rules differently: Δf affects only the strength of the local rule,
whereas Δt affects only the global rule. SOA also has a small effect
on the local rule (the across-stream Δt decreases with decreasing SOAs, which
somewhat strengthens the local rule), but we will ignore this factor in the following
description. As a consequence, by changing Δf while keeping the SOA constant, we
manipulate the strength of the local rule while keeping the strength of the global rule
fixed; similarly, changing SOA while keeping Δf constant manipulates the strength
of the global rule while keeping the strength of the local rule (more or less) fixed. As a
first step in the comparison between binocular rivalry and auditory streaming, Figure 15
was constructed to show the group-averaged durations of the first two perceptual phases
in experiment 1.³ We chose to analyse the mean duration of the first two perceptual
phases because our previous analyses (see above) showed that Δf and SOA have little
effect on phase durations in later time ranges of the 4-minute trains. Mean integrated
phase durations are plotted with respect to the stimulus parameters as an interpolated
solid surface, and mean segregated phase durations as a meshed surface. By
taking slices through these surfaces at different values of Δf and SOA, we can compare
the effects of rule strength in auditory streaming with the effects of contrast levels in
vision.
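One way to quantify a slice through these surfaces is to compare the log-log slopes of the two percepts' durations against the strength of the manipulated rule. Under Levelt's second proposition, increasing the strength of one stimulus mainly shortens dominance of the other, fixed-strength alternative. The sketch below assumes that rule strength can be given a numeric, monotone proxy (this study defines no such scale) and uses an arbitrary factor of 2 as the decision criterion; both are hypothetical choices.

```python
import numpy as np


def levelt_slopes(strength, dur_variable, dur_fixed):
    """Log-log slopes of mean dominance duration against the variable stimulus's strength.

    strength: hypothetical numeric proxy for the manipulated rule's strength (must be > 0).
    """
    x = np.log(np.asarray(strength, dtype=float))
    slope_var = np.polyfit(x, np.log(dur_variable), 1)[0]
    slope_fix = np.polyfit(x, np.log(dur_fixed), 1)[0]
    return float(slope_var), float(slope_fix)


def consistent_with_levelt(strength, dur_variable, dur_fixed, ratio=2.0):
    """Levelt's second proposition: raising one stimulus's strength mainly shortens the
    dominance of the *other* (fixed-strength) alternative. `ratio` is an arbitrary cut-off."""
    slope_var, slope_fix = levelt_slopes(strength, dur_variable, dur_fixed)
    return slope_fix < 0 and abs(slope_fix) >= ratio * abs(slope_var)
```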
³ Because experiment 2 covered a much smaller range of Δf and SOA than experiment 1,
only the results of experiment 1 were used in this analysis.
Figure 15. Interpolated surfaces showing the group-averaged durations of the first two
perceptual phases obtained in experiment 1 as a function of Δf and SOA: integrated (solid
surface) and segregated (mesh surface) phases. Colours code phase durations (redundant with
the vertical axis).
but has little effect on the segregated phase durations (the
variable-rule-strength percept; compare the top left panel of Figure 16 with panel A
of Figure 14). In contrast, at 22-ST Δf (weak local rule), increasing the global-rule
strength by decreasing the SOA substantially increases the mean segregated phase
Figure 16. Local- and global-rule-strength effects, colour-coded as in Figure 14, analysed for Δf
(local rule; left column) and Δt (global rule; right column). Phase durations for the
fixed-strength rule are shown with dashed lines, those for the variable-strength rule with
continuous lines. Note that rule strength progressively decreases from the top row to the
bottom row, and within each plot, rule strength decreases from left to right, i.e., in the
opposite direction to that in Figure 14.
Thus another analogy can be shown between visual bi-stability and auditory
streaming. Specifically, assuming an analogy between the strength of local- vs.
global-rule representations in auditory streaming and contrast levels of images presented
to the two eyes in the binocular rivalry situation, parameters affecting the "strength" of
these representations had similar effects on the dominance phase durations. Together
with the similarities mentioned in the introduction, this analogy suggests that the
principles of computational models of binocular rivalry may be applicable in modelling
auditory streaming. However, there is a caveat to this claim. Here we considered only
the first two perceptual phases, since we showed earlier that stimulus parameters have
significant effects on phase durations only during the first perceptual phase. In contrast,
the findings in binocular rivalry [24] may relate to later phases of perceptual switching,
as Brascamp et al. excluded the first minute of participant responses from their analysis.
Thus the analogy between auditory streaming and binocular rivalry may not extend to
the distinction between initial and subsequent phases in auditory streaming. This,
however, does not reduce the value of the analogy in helping to identify generic
modelling principles applicable to both modalities.
parameter space roughly corresponding to the region where local and global rules are
balanced. An ANOVA of the proportion of both responses in the subsequent phases
with the structure SOA (4 levels; see Figure 4) × Δf (4 levels; see Figure 4) showed
only a significant interaction between the two factors (F[9,126]=3.06, p<.05, ε=0.59,
η²=0.18), which was caused by opposite effects of Δf at different SOAs: at the shortest
SOA, increasing Δf resulted in a decreasing proportion of both responses, whereas at
the longest SOA, increasing Δf increased the proportion of both responses (Tukey
HSD post-hoc test with df=126, p<.05 between the proportion of both responses at the
shortest SOA (100 ms) with the largest Δf (22 ST) and that at the two shortest SOAs (100
and 150 ms) with the smallest Δf (4 ST)). Thus segregated and integrated sound
organisations are more often perceived simultaneously when the rules are of
approximately equal strength, that is, when both are either strong or weak at the same time.
Note that the distribution of both-percept responses does not exactly follow the
distribution of perceptual switching (which is high only when both rules are strong; see
Figure 7), nor does it coincide with the intersection of the phase duration surfaces,
where integration and segregation are balanced (see Figure 15).
Figure 17. Distribution of mean steady-state both-percept responses as a proportion of
post-first-phase stimulus duration in experiment 1. Note that, to avoid problems arising from
accidental simultaneous button presses, we included in this analysis only phases with durations
exceeding 300 ms.
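The proportion plotted in Figure 17 can be sketched as below. This is an illustrative reading of the caption, not the authors' code: the (label, duration) phase format is assumed, and the sketch applies the 300-ms criterion only to 'both' phases while keeping the whole post-first-phase time in the denominator, which is one possible interpretation.

```python
def both_proportion(phases, min_both_s=0.3):
    """Share of post-first-phase time spent in 'both' (B) responses.

    phases: (label, duration_s) tuples in report order; the first phase is excluded.
    'B' phases not exceeding min_both_s are dropped as likely accidental simultaneous
    presses (assumed reading of the 300-ms criterion).
    """
    later = phases[1:]
    total_s = sum(d for _, d in later)
    both_s = sum(d for label, d in later if label == "B" and d > min_both_s)
    return both_s / total_s if total_s > 0 else float("nan")


# Hypothetical phase record: the 0.2-s 'B' phase is ignored.
record = [("I", 30.0), ("S", 6.0), ("B", 2.0), ("I", 10.0), ("B", 0.2), ("S", 1.8)]
print(round(both_proportion(record), 3))  # 0.1
```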
Both types of patterns being perceived at the same time contradicts our intuitive
assumption that the two competing perceptual organisations of the alternating two-tone
sequence are mutually exclusive, as well as the findings of Pressnitzer and Hupé [5, 6]
and much of the visual literature on bi-stability. However, it has been shown in vision
that, contrary to the usual assumption of exclusivity [12], periods of 'transition' during
which neither eye is clearly dominant can be of rather long duration, comparable with
'eye' dominance durations [24]. This is consistent with the durations of the both-percept
responses in our experiment 1. Furthermore, in another striking analogy
between visual and auditory bi-stability, the distribution of mean both-percept
durations with respect to rule strength resembles the distribution of 'transition' durations
in the binocular rivalry paradigm [24], as illustrated in Figure 18.
Figure 18. Transition (both-percept) phases in binocular rivalry and auditory streaming. Left:
relationship between transition durations and image contrast (governing the "strength" of the
alternative eye activations) in binocular rivalry, where Dep. refers to the dominant percept
prior to the transition and Dest. to the percept following the transition; taken from [24].
Right: relationship between both-percept durations and rule strength in experiment 1. Dep.
refers to the rule strength corresponding to the perceptual organisation prior to the 'both'
response, and Dest. to the rule strength corresponding to the perceptual organisation
following the 'both' response. For the perceptual state of 'integration', the rule strength was
considered to correspond to Δf, and for 'segregation', to SOA.
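The Dep./Dest. grouping used in the right panel of Figure 18 can be sketched as follows. Phases are assumed to be (label, duration) tuples, and strength_of is a hypothetical per-condition mapping from 'I' and 'S' to a rule-strength level (for example, derived from Δf for integration and from SOA for segregation, as in the caption); only 'both' phases with an integrated or segregated phase on each side are counted.

```python
from collections import defaultdict
from statistics import mean


def both_transitions(phases, strength_of):
    """Yield (dep_strength, dest_strength, both_duration_s) for every I/S -> B -> I/S triplet.

    strength_of maps 'I' and 'S' to this condition's rule-strength level (hypothetical
    labels, e.g. derived from Δf for 'I' and from SOA for 'S').
    """
    for (dep, _), (label, dur), (dest, _) in zip(phases, phases[1:], phases[2:]):
        if label == "B" and dep in strength_of and dest in strength_of:
            yield strength_of[dep], strength_of[dest], dur


def mean_by_pair(triplets):
    """Average both-percept durations for each (Dep., Dest.) strength pair."""
    groups = defaultdict(list)
    for dep, dest, dur in triplets:
        groups[(dep, dest)].append(dur)
    return {pair: mean(durs) for pair, durs in groups.items()}
```

Triplets from all conditions can be pooled (for instance with itertools.chain) before calling mean_by_pair.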
It is not clear at this stage what gives rise to the perception of both patterns
simultaneously, given that we intuitively think of them as mutually exclusive. One possibility
is that there is very rapid switching between the two alternatives, but conscious
perception is more sluggish and unable to follow this rapid switching. Hence there is a
sort of stroboscopic effect in which both perceptual organisations are perceived as being
present although there is actually switching between them. An alternative explanation
arises from the notion of competition at several different levels in the perceptual
hierarchy. Generally, recurrent top-down connections tend to ensure
consistency across the whole system, but there may be instances in which the top-down
signals cannot overcome the local competitive interactions; in this case, incompatible
winners could emerge at different levels, and this inconsistency may take some time to
resolve before a consistent organisation emerges throughout the hierarchy. In our
computational modelling studies, we have observed both of these effects, and we are
planning a follow-up experiment to distinguish between these two
possibilities.
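The first ('stroboscopic') explanation can be illustrated with a toy model in which perception reads out a rapidly alternating competition through a sluggish leaky integrator. This is a hypothetical sketch, not one of the computational models mentioned above: the 0.5-s time constant and the square-wave input are assumptions. With fast alternation, the readout hovers near 0.5, a mixture of both organisations; with slow alternation, it swings close to 0 and 1.

```python
import numpy as np


def sluggish_readout_range(switch_period_s, tau_s=0.5, dt=0.001, duration_s=20.0):
    """Min and max of a leaky-integrator readout of an alternation between two organisations.

    The input is 1 while organisation X dominates and 0 while Y dominates, switching every
    switch_period_s / 2. tau_s (the readout time constant) is an illustrative assumption.
    """
    t = np.arange(0.0, duration_s, dt)
    x = ((t // (switch_period_s / 2)) % 2 == 0).astype(float)
    y = np.empty_like(x)
    level, alpha = 0.0, dt / tau_s
    for i, xi in enumerate(x):
        level += alpha * (xi - level)  # first-order low-pass (Euler step)
        y[i] = level
    steady = y[t > 5 * tau_s]  # discard the start-up transient
    return float(steady.min()), float(steady.max())


print(sluggish_readout_range(0.05))  # fast switching: readout stays near 0.5
print(sluggish_readout_range(4.0))   # slow switching: readout swings close to 0 and 1
```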
General Discussion
Interesting theoretical insights into the processes underlying streaming can be
derived from the experiments reported above. In this section, we present a conceptual
framework for auditory streaming which accounts for previous results as well as our
new findings. The novel observations obtained in the current experiments can be
summarised as follows:
1) Switching between integrated and segregated percepts continues throughout
the stimulus sequences with any combination of Δf and Δt (SOA) and in all participants.
Switching is not a product of learning or fatigue. Rather, there appears to be no final
stable percept in auditory streaming.
2) Participants report simultaneous perception of integrated and segregated tone
patterns (both responses); that is, segregated and integrated percepts are not exclusive.
3) With medium-to-large Δfs and very short Δts (SOAs), the first reported
percept is segregation; that is, integration is not always the first percept. Hence,
segregation is not a result of evidence gathering.
4) When Δf is low/medium and Δt (SOA) is short, the first perceptual phase is
typically much shorter than first phases with other combinations of the two parameters.
The overall amount of switching is also highest in this region. The effect is not fully
monotonic: first percepts become somewhat longer and the overall amount of switching
decreases with very short SOAs and at very low Δfs; the minimum of the first-phase
duration distribution and the maximum amount of switching appear at 100-ms SOA
within the 4-10-ST Δf range in the current experiments.
5) With time, the ratio between integrated and segregated percepts tends to
become balanced irrespective of the combination of Δf and Δt (SOA).
6) Δf and Δt (SOA) affect the duration of the first perceptual phase in response
to the tone sequences similarly to the way in which visual contrast affects dominance
durations in binocular rivalry. It appears as if varying Δf affects the integrated and
varying Δt (SOA) the segregated sound organisation, similarly to the way in
which 'contrast' is assumed to control the competitiveness of an image in the given eye.
This result supports the notion that Δf controls the competitiveness (strength) of the
integrated, whereas Δt (SOA) controls that of the segregated sound organisation.
7) The distribution of the duration of both-percept responses as a function of
the strength of the preceding and the following organisation, assuming the
relationship described above between Δf and the integrated and between Δt (SOA) and
the segregated organisation, is similar to the distribution of the duration of transitional
phases in binocular rivalry.
The observed qualitative differences between the first and subsequent perceptual
phases have strong implications for theoretical accounts of streaming, which must
explain both the perceptual bias of the first phase and the continuing switching
behaviour. It is important to note that our distinction here between first and subsequent
perceptual phases is not the same as the usual distinction between build-up and final
percept [4], because the first phase is generally longer than the expected build-up
duration, and we found no evidence for a stable final percept.
Since participants attended to the sounds without attempting to hear them
according to one or the other organisation, the current results cannot tell us whether
switching occurs automatically or is the product of attention. Previous
electrophysiological studies [10, 25] obtained correlates of both bottom-up and
attentional processes when participants were exposed to tone sequences similar to those
used in the current experiments. There is some indication that the initial phase of
streaming may be more sensitive to attentional effects [25-27], whereas maintaining
sound organisations does not require focused attention [28]. However, there is evidence
that the segregated organisation can develop without attention [29], although this may
depend on the actual stimulus configuration. On the other hand, these
electrophysiological studies did not measure the perception of the critical sounds
on-line, thus possibly averaging together trials with different perceptual organisations.
Therefore, no strong conclusion can be drawn regarding the relationship between
attention and the observed differences between the first and subsequent perceptual
phases.
In the following account, we shall distinguish between the first and subsequent
perceptual phases. We shall attempt to describe them in terms of alternative rule
representations which vie for dominance, and test the viability of this explanation in the
face of our novel findings. As will become clear, a useful way to think about the two
perceptual stages may be as two distinct processes: the formation of sequential associations
and the coexistence of alternative interpretations.
by Δf and the extent to which the neural activity elicited by the two tones overlaps,
which in turn will determine how long it takes the system to discover the
segregated organisation. This is well illustrated in Figure 16 (right panels), in which Δf
has a clear effect on the duration of segregation for fixed SOAs. Thus the two effects of
Δf go hand in hand: smaller Δfs improve the formation of local-rule links and delay the
formation of global-rule links; larger Δfs weaken the formation of local-rule links and
allow faster formation of global-rule links.
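The dependence of activity overlap on Δf can be sketched with two Gaussian activity profiles on a tonotopic axis measured in semitones. The Gaussian shape and the 3-semitone width are illustrative assumptions, not properties measured in this study; the sketch only shows overlap falling smoothly as Δf grows, so that a clear separation of activity needs a Δf well beyond a merely detectable difference.

```python
from math import erfc, sqrt


def activity_overlap(delta_f_st, sigma_st=3.0):
    """Overlap coefficient of two equal-width Gaussian activity profiles on a
    tonotopic axis (semitones) whose centres lie delta_f_st apart.

    1.0 means identical profiles; values near 0 mean clearly separate clusters.
    For two normals with common SD sigma, overlap = 2*Phi(-d / (2*sigma))
    = erfc(d / (2*sqrt(2)*sigma)). sigma_st is an illustrative assumption.
    """
    return erfc(abs(delta_f_st) / (2.0 * sqrt(2.0) * sigma_st))


for df in (1, 4, 10, 22):
    print(df, round(activity_overlap(df), 3))
```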
The above description is consistent with the finding that the fission boundary is
much larger than the smallest detectable frequency difference [30], since segregation
requires not only a detectable difference but also a clear separation of activity.
Similarly, the amplitude-modulation 'fission' boundary was shown to be much larger
than the smallest detectable amplitude-modulation difference [31]. Although initially
puzzling, the finding of a very stable fission boundary, measured in terms of the
minimum Δf necessary for segregation [30], can also be reconciled with perceptual
switching: while the formation of separate clusters of activity may depend on some
minimum featural difference, our results show that subsequent stochastic switching is
largely independent of stimulus features. The claims of various groups (e.g., [32-34]) that
adaptation in primary auditory cortex is a neural correlate of streaming are also consistent
with these ideas, since adaptation is likely to be an important aspect of the process of
clustering. Adaptation is therefore likely to be linked to the duration of the first phase,
and hence to the rate at which the probability of reporting streaming develops [32-34].
Integration may also be reported initially as a result of a qualitatively different
mechanism when Δf is small and SOA is short. When two sounds are presented within a
temporal window of less than about 200 ms, they tend to be processed as a
single event [19], at least within a limited range of frequency differences [20]. In our
experiments this occurs primarily for SOAs ≤ 100 ms and Δf < 4 ST. In these cases, the
integrated pattern first perceived is likely to be an ABA chunk. The auditory system
then needs to pull this chunk apart before it can discover the alternative within-stream
sequential associations. Nevertheless, perceptual switching is found here too.
Segregation as the first percept at short SOAs and larger Δfs can also be
understood in terms of the tendency to chunk acoustic stimuli into single events if they
occur within approximately 200 ms of each other [19]. When Δf exceeds the proposed
spectral window of integration [20], consecutive identical tones may be directly linked
(i.e., with the B tone excluded, the integration window contains A-A). A direct
consequence of this explanation is that first-phase segregated percepts must be based on
the more frequent stimulus (A-A; they cannot be B---B), a prediction which we plan to
test experimentally. Neurophysiological studies suggest a reason for this phenomenon:
responses in cortex are greatly reduced when events follow
each other at a rate faster than about 10 Hz [35]. Therefore, at fast presentation rates, it
is possible that, owing to insufficient recovery time, some populations respond only to one
or other of the tones from the outset. Hence the higher-order sequential associations can
be discovered immediately, resulting in segregation becoming the first percept. In our
data, the minimum Δf for reporting first-phase segregation is approximately 5 semitones
at 125-ms SOA, which is not inconsistent with the spectrotemporal window of
integration reported in [20]. Thus, like Δf, SOA also has two different effects on
the formation of rule representations. Firstly, it governs the formation of associations
between consecutive tones, because for such associations the neural after-effects of the
first tone must still be present at the time the next tone arrives. This primarily affects the
formation of within-stream associations, since Δt_within > Δt_across (within-stream
inter-onset intervals are longer than across-stream ones). Secondly, with short
Perceptual bi-stability in auditory streaming
SOAs, the recovery of the neuronal elements in higher (cortical) levels of the auditory
system may become an issue, and this primarily affects the local rule. Again, the two
effects go hand in hand: shortening the SOA strengthens the associations necessary for
representing the global rule more than those for the local rule (see Figure 16, left panel,
Δf = 22 ST), whereas, especially at small Δfs, shortening the SOA weakens the local-rule
associations more than the global-rule ones (see Figure 16, left panel, Δf = 4 ST).
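Both SOA effects rest on simple interval arithmetic. Assuming that the ABA_ cycle consists of four equal slots of one SOA each (A, B, A, silent gap), the across-stream interval equals the SOA, while the within-stream intervals are two SOAs (A-A) and four SOAs (B-B). The sketch adds two hypothetical exponential functions whose time constants are illustrative assumptions, not measured values: a decaying after-effect of a tone (the first effect) and the recovery of a population that responded to the previous tone (the second effect).

```python
import math

def aba_intervals(soa_ms: float) -> dict:
    """Onset-to-onset intervals in a repeating ABA_ sequence, assuming each of
    the four slots (A, B, A, silent gap) lasts one SOA."""
    return {
        "A-B (across streams)": soa_ms,     # consecutive tones, different frequency
        "A-A (within stream)": 2 * soa_ms,  # A recurs every second slot
        "B-B (within stream)": 4 * soa_ms,  # B recurs once per four-slot cycle
    }

def residual_trace(interval_ms: float, tau_ms: float = 150.0) -> float:
    """Hypothetical exponentially decaying after-effect of a tone, evaluated
    when the next tone it could be linked with arrives (first SOA effect)."""
    return math.exp(-interval_ms / tau_ms)

def recovered_fraction(interval_ms: float, tau_ms: float = 100.0) -> float:
    """Hypothetical exponential recovery of a population that responded to
    the previous tone (second SOA effect)."""
    return 1.0 - math.exp(-interval_ms / tau_ms)

for soa in (75, 125, 200):
    for name, iv in aba_intervals(soa).items():
        print(f"SOA {soa} ms  {name}: {iv} ms  "
              f"trace={residual_trace(iv):.2f}  recovered={recovered_fraction(iv):.2f}")
```

Under these assumptions, shortening the SOA raises the residual trace most where it is weakest, at the longer within-stream intervals. At the same time, it lowers the recovery of broadly tuned populations that are driven by every tone, and hence re-stimulated once per SOA.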
One argument against this description of the SOA effects is that streaming can
be induced by presenting a sequence of only one of the tones prior to the onset of
the ABA_ABA pattern [36]. Based on this finding, Bregman et al. [3] argue against
SOA being an important determinant of streaming, suggesting instead that the
within-stream ISI (Δt) is the major factor. However, within the above-described
framework, it is easy to understand how the induction sequence used by Rogers and
Bregman [36] led to increased reports of streaming at the onset of the ABA pattern. The
induction sequence promoted the establishment of sequential associations between tones
in one of the streams, thereby biasing the system so that single-frequency streams were
perceived first. In this case, what remains for the formation phase is the discovery of
the linkages between temporally adjacent events, i.e., the local rule.
Thus, the effect of the induction sequence is similar to that of very short (<100 ms) SOAs
combined with intermediate or large Δfs, in that it forces the global rule to be established
before the local one. However, our results predict that the links required for the local
rule would eventually also be established and switching would commence (this could not
be observed in [36] because the sequences were short).
In summary, the first perceptual phase is essentially concerned with the
formation of alternative perceptual organisations. The perceptual organisation
perceived during the first phase is determined by the stimulus parameters, and the
duration of this phase depends on the time taken to discover and represent alternative
sequential associations. Consistent with findings in vision that local competition is
necessary in order to trigger perceptual bi-stability [37], the idea emerging from our
auditory streaming studies is that the discovery of feature-sensitive rules or
associations, and competition between incompatible associations, are necessary for
triggering changes in global perceptual organisation. A consideration of the mean first-phase
durations suggests that most previous streaming experiments, which used
relatively short stimulus sequences, have largely characterised the initial phase of
perceptual organisation.
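Separating the two phases in listeners' reports is straightforward bookkeeping. The sketch below assumes a hypothetical data format: a time-sorted list of (onset in seconds, percept label) pairs, recorded whenever the report changes. It takes the duration of the first percept as the first phase. Because the final percept is truncated by the end of the sequence, it is treated as censored and excluded from the second-phase mean. The demo responses are invented for illustration and are not data from the study.

```python
from statistics import mean

def collapse(reports):
    """Keep only reports that change the percept; a repeated label is not a switch."""
    events = []
    for t, label in reports:
        if not events or events[-1][1] != label:
            events.append((t, label))
    return events

def phase_summary(reports, total_s):
    """Return (first_percept_s, mean_later_percept_s, n_switches) for one sequence.

    reports: time-sorted (onset_s, label) pairs (hypothetical format).
    The last percept is cut off by the end of the sequence, so it is treated
    as censored and left out of the mean of later percepts."""
    events = collapse(reports)
    if not events:
        return float("nan"), float("nan"), 0      # nothing was reported
    onsets = [t for t, _ in events]
    completed = [b - a for a, b in zip(onsets, onsets[1:])]
    if not completed:
        # No switch at all: the first percept lasted until the end (censored).
        return total_s - onsets[0], float("nan"), 0
    later = completed[1:]
    later_mean = mean(later) if later else float("nan")
    return completed[0], later_mean, len(events) - 1

# Invented responses to a 240-s sequence, for illustration only.
demo = [(1.5, "integrated"), (20.0, "segregated"), (24.0, "integrated"),
        (27.0, "integrated"), (29.0, "segregated"), (33.0, "both")]
print(phase_summary(demo, total_s=240.0))
```

Comparing the first value with the second across listeners and sequences is one way to quantify the difference between first-phase and second-phase durations. A short sequence may end before any switch occurs, in which case only a censored first-phase duration is available.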
situations, one almost never focuses for long periods of time on unchanging stimulation.
This is because no new information can be extracted, whereas adaptation to the
environment requires that we discover the changes that occur in it. Thus unchanging
stimulation soon becomes part of the background, filtered out by lower-level
mechanisms, until something changes. In this sense, the participants' task in these
experiments was rather unnatural: attend to the stimulation and be constantly aware of
your perception, although nothing changes. It is possible that similar bi- or multi-stable
phenomena occur in natural situations, but that we are generally not aware of them
because they become part of the unattended background. One possibility is that
switching back and forth between alternative organisations serves to
maintain representations of the alternative organisations while the sounds are not
attended.
The two suggested views do not contradict each other. Perceptual flexibility
in interpreting the currently unattended background may serve us well. The above view
is strongly supported by findings showing that (1) sound organisation is functional even
when attention is directed away from the sounds [21] and, critically, (2) it is reset when
attention is turned towards them [26, 27]. That is, when we direct our attention away
from a sequence of sounds (treating it as background), the available rule representations
are maintained. However, when we turn our attention again towards these sounds
(selecting them as the foreground), we re-evaluate the possible descriptions. One
consequence of this notion is that by directing attention away from and then back to the
sound sequence, we could force a new first phase within the middle of the sound train. We
plan to explore this in a follow-up experiment.
The coexistence of segregation and integration is supported by the finding of a
significant proportion of both-percept responses. This initially rather surprising
finding provides further support for the notion that the perceptual organisations which
have been found are simultaneously represented within the brain, even if we are not
always aware of them. Furthermore, the idea that these periods result from a slow
transition from one state to another is contradicted by our finding that, after the first
phase, the dynamics of perceptual switching are largely parameter-independent.
However, the finding that the longest-duration transitions and both-percept responses appear
to occur along a ridge where the strengths of the departure and destination organisations are
balanced indicates that equality between competing features (or rules) gives rise to
network conditions in which inconsistent winners are most likely to emerge. We suggest
the potentially testable hypothesis that both-percept reports provide evidence for the
existence of a processing hierarchy in the auditory system, within which inconsistent
winners can sometimes emerge at different levels of the hierarchy.
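The role of balance between competing organisations can be illustrated with a generic competition model of the kind commonly used for binocular rivalry (competition plus slow adaptation plus noise). The sketch below is a reduced, hypothetical version, not the model proposed here. The alternative with the larger net drive dominates, and the current winner receives a fixed bonus `beta` as a crude stand-in for mutual inhibition. All parameter values are illustrative assumptions. The sketch shows that equal input strengths give roughly equal dominance, whereas an imbalance shifts dominance towards the stronger alternative. It does not model the first phase or both-percept reports.

```python
import random

def simulate_competition(inputs=(1.0, 1.0), g=1.0, beta=0.5, tau_a=2.0,
                         noise=0.1, dt=0.01, t_max=240.0, seed=0):
    """Return a list of (alternative, duration_s) dominance periods.

    Hypothetical reduced model: alternative k has a fixed input strength
    inputs[k] and a slow adaptation variable a[k]. At each step the
    alternative with the larger net drive (input - g * a + noise) dominates,
    and the current winner gets a bonus beta standing in for mutual inhibition."""
    rng = random.Random(seed)
    a = [0.0, 0.0]
    winner = 0 if inputs[0] >= inputs[1] else 1
    t, onset, periods = 0.0, 0.0, []
    while t < t_max:
        # Euler step: the dominant alternative adapts toward 1, the other recovers toward 0.
        for k in (0, 1):
            target = 1.0 if k == winner else 0.0
            a[k] += dt * (target - a[k]) / tau_a
        drive = [inputs[k] - g * a[k] + noise * rng.gauss(0.0, 1.0) for k in (0, 1)]
        drive[winner] += beta
        t += dt
        new = 0 if drive[0] >= drive[1] else 1
        if new != winner:
            periods.append((winner, t - onset))  # a completed dominance period
            winner, onset = new, t
    return periods

def dominance_share(periods, k):
    """Fraction of completed dominance time held by alternative k."""
    total = sum(d for _, d in periods)
    return sum(d for w, d in periods if w == k) / total if total > 0 else float("nan")

balanced = simulate_competition()
biased = simulate_competition(inputs=(1.2, 1.0))
print(len(balanced), round(dominance_share(balanced, 0), 2))
print(len(biased), round(dominance_share(biased, 0), 2))
```

With these settings, the first dominance period is actually shorter than the later ones, because the suppressed alternative starts fully recovered. A model of this kind therefore captures, at most, the second-phase switching and not the long first phase.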
Conclusions
When the auditory system is exposed to an unchanging sequence of sounds,
that can be organised in more than one way, perceptual bi-stability is pervasive. There
is no combination of features that we have tested for which perception remains stable
for even a few minutes. Analysis of the experimental data revealed two phases of
perceptual organisation, which can be characterised by two distinct processes:
formation of sequential associations and coexistence between alternative interpretations.
These perceptual phases differ in their dynamics (perceptual durations differ on average
by an order of magnitude) and in their sensitivity to stimulus features (which significantly
influence perceptual choice and phase duration only in the first phase). The detailed
similarities between perceptual switching in vision and audition argue for generic,
modality-independent processes acting in the second phase employing common
Acknowledgements
This work is supported by the European Research Area Specific Targeted
Project EmCAP (IST-FP6-013123).
References
1.
2.
3. Bregman, A.S., et al., Effects of time intervals and tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.
4.
5.
6.
7. Blake, R. and N.K. Logothetis, Visual competition. Nat Rev Neurosci, 2002. 3(1): p. 13-21.
8.
9.
10.
11. Denham, S.L. and I. Winkler, The role of predictive models in the formation of auditory streams. J Physiol Paris, 2006. 100(1-3): p. 154-70.
12.
13. Tong, F., M. Meng, and R. Blake, Neural bases of binocular rivalry. Trends Cogn Sci, 2006. 10(11): p. 502-11.
14.
15.
16. Cowan, N., On short and long auditory stores. Psychol Bull, 1984. 96: p. 351-370.
17.
18.
19. Yabe, H., et al., Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration. Brain Res, 2001. 897(1-2): p. 222-7.
20.
21.
22. Anstis, S. and S. Saida, Adaptation to auditory streaming of frequency-modulated tones. J Exp Psychol Hum Percept Perform, 1985. 11(3): p. 257-71.
23.
24.
25.
26.
27. Cusack, R., et al., Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J Exp Psychol Hum Percept Perform, 2004. 30(4): p. 643-56.
28.
29. Sussman, E., et al., The role of attention in the formation of auditory streams. Percept Psychophys, 2006. In press.
30.
31.
32.
33.
34.
35.
36.
37.