
Manuscript for submission to Hearing Research

13/06/2008

Perceptual bi-stability in auditory streaming: How much do stimulus features matter?
Susan L. Denham1*, Kinga Gyimesi2,3, Gábor Stefanics2,4, and István Winkler2,5

1 Centre for Theoretical and Computational Neuroscience, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
2 Department of General Psychology, Institute for Psychology, Hungarian Academy of Sciences, 1394 Budapest, P.O. Box 398, Hungary
3 Department of Cognitive Science, Budapest University of Technology and Economics, 1111 Budapest, Sztoczek u. 2, Hungary
4 Department of Experimental Zoology and Neurobiology, University of Pécs, 7624 Pécs, Ifjúság st. 6, Hungary
5 Institute of Psychology, University of Szeged, 6722 Szeged, Petőfi S. sgt. 30-34, Hungary

* Corresponding author; sdenham@plymouth.ac.uk; tel +44 1752 232610; fax +44 1752 233349

Abstract
The auditory two-tone streaming paradigm has been used extensively to study
processing mechanisms that underlie the decomposition of the composite auditory input
into coherent sound sequences, and hence the perception of auditory objects. Here we
present new results from a study of bi-stability in auditory streaming. Using relatively
long (4 minute) sequences, we show that there are two fundamentally different phases in
this process. Listeners hold their first percept of the sound sequence for a relatively long
period (first phase), after which perception stochastically switches between two or more
alternative sound organisations, each held on average for a much shorter duration
(second phase). The two perceptual phases also differ in that stimulus parameters
influence perceptual behaviour to a far greater degree in the first than in the second
phase, and during the second but not the first phase, there are significant periods when
more than one organisation can be perceived simultaneously. Furthermore, our analysis
reveals deep parallels between the dynamics of perceptual organisation in auditory
streaming and binocular rivalry. We propose an account of auditory streaming in terms
of rivalry between competing temporal associations. Based on the results of our
experiments, we suggest that in the first perceptual phase (formation of associations),
alternative interpretations of the auditory input are formed. In the second phase
(coexistence of interpretations), perception stochastically switches between the
alternatives, thus maintaining perceptual flexibility.

Keywords
auditory streaming, bi-stability, perceptual switching, auditory scene analysis

Introduction
In order to make sense of real-world environments it is necessary to identify,
extract and organise relevant information from the wealth of incoming sensory data.
The potential amount of information far exceeds the processing capacity of any living
system. However, biological organisms are not idle perceivers; rather they seek out
information about the world and the objects in it [1]. The challenge is to create
appropriate object representations on the fly and to continually modify them in order to
maintain as accurate models as possible. Thus the process of perceptual organisation is
fundamental to effective perception.
An important problem for auditory perception is that sound sources may emit
discontinuous sequences of sounds, so some means for forming associations between
discrete events, through the detection of regularities and the maintenance of temporally
persistent representations is required. The formation of sequential associations has been
extensively studied by means of the paradigm of auditory streaming. In the typical
streaming experiment [2], a tone sequence of the structure 'ABA-ABA-ABA-' is
presented at a fast stimulus rate ('A' and 'B' denote tones differing from each other in
frequency; the '-' sign stands for a silent interval equal to the time interval between the
onsets of successive tones, i.e. the stimulus onset asynchrony (SOA); see Figures 1 and
2). When all sounds are grouped together into a single coherent stream, a galloping
rhythm is typically heard. By increasing the frequency separation (Δf) between the A
and B tones and/or by shortening the interval between subsequent same-frequency tones
(the within-stream inter-tone or offset-to-onset interval, Δt), perception of the sound
sequence changes to that of two homogeneous isochronous streams: a faster paced one
consisting of A tones and a slower paced one consisting of Bs [2].
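
The timing structure of the 'ABA-' sequence can be made concrete with a short sketch. It assumes only what the paragraph above states (each triplet fills four SOA-long slots, the fourth being silent); the function names are illustrative and do not come from the original study.

```python
def aba_schedule(soa_ms, n_triplets):
    """Return (onset_ms, tone) pairs for an 'ABA-' sequence.

    Each triplet occupies four SOA slots: A, B, A, then a silent slot.
    """
    events = []
    for k in range(n_triplets):
        start = 4 * k * soa_ms            # onset of the k-th triplet
        events.append((start, "A"))
        events.append((start + soa_ms, "B"))
        events.append((start + 2 * soa_ms, "A"))
        # the fourth slot (start + 3 * soa_ms) stays silent
    return events


def onset_intervals(events, tone):
    """Onset-to-onset intervals between successive tones of one type."""
    onsets = [t for t, name in events if name == tone]
    return [b - a for a, b in zip(onsets, onsets[1:])]


events = aba_schedule(soa_ms=100, n_triplets=3)
print(onset_intervals(events, "A"))  # each interval equals 2 * SOA
print(onset_intervals(events, "B"))  # each interval equals 4 * SOA
```

Taken separately, the A tones recur every 2 SOA and the B tones every 4 SOA, which is why the segregated percept consists of two isochronous streams with different rates.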

[Figure 1 image: tones A and B, with the SOA marked; panels 'Sounds' and 'Perception']

Figure 1. Cartoon of the auditory streaming paradigm. A sequence of low (A) and high (B)
tones presented repeatedly in 'ABA-' groups can be perceived as a single coherent stream with a
galloping rhythm (upper right), or as two segregated streams (lower right), each with an
isochronous rhythm.


In general, there is a trade-off between Δf and Δt in determining the dominant
perceptual organisation. Asking participants whether they perceived the galloping tone
pattern in sequences of the type shown in Figure 1, van Noorden [2] identified three
separate regions of the Δf-Δt (or Δf-SOA, see Figure 2)¹ space with different
characteristic perceptual organisations (see Figure 3). With very low Δf values, participants
could always hear the galloping rhythm, showing that they could organize all tones into
a single sound stream. With slightly larger Δf values and/or longer Δt values, participants were able
to hear either two separate sound streams or a single integrated stream and to alter their
organisational bias between the two percepts at will. Further increasing Δf and
decreasing Δt resulted in participants not being able to hear the galloping rhythm, which
suggests that perception of two streams became the dominant sound organisation.
[Figure 2 image: axes 'Frequency' and 'Time'; interval labels a and b]

Figure 2. The four different time intervals that can be distinguished in the two-tone sequences
used in our streaming experiments, and considered by Bregman and colleagues: a) SOAwithin, b)
ISIwithin, c) ISIacross, and d) SOAacross. Diagram derived from [3].

¹ In his experiments, van Noorden did not distinguish between Δt and SOA [2. van Noorden, L.P.A.S.,
Temporal coherence in the perception of tone sequences, in Institute for Perception Research. 1975:
Eindhoven.]; however, recently Bregman and colleagues [3. Bregman, A.S., et al., Effects of time
intervals and tone durations on auditory stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.],
in a study designed to investigate the influence of the various possible time intervals on stream
segregation, found that it is in fact the within-stream offset-to-onset interval (i.e. ISIwithin or Δt) which is
most influential in determining the likelihood of streaming (see Figure 2).

Based on the results obtained in the classical studies of auditory streaming (for a
review, see [4]), it was assumed that following an initial short build-up period
perception of tone sequences such as the one shown in Figure 1 becomes stable, when
parameters fall into either the segregated or the integrated area of the Δf-Δt space
(see Figure 3). However, recent evidence [5, 6] suggests that the idea that the auditory
system fixes on a single unchanging dominant percept is to some extent an artefact of
the experimental procedures and analysis methods used in classical studies of auditory
streaming. These obscure an important detail about streaming, namely, that it fluctuates
randomly between the two possible organisations; a phenomenon known as perceptual
bi-stability.² Bi-stability in visual perception (for a review, see [7]) has been studied
extensively since it was thought to offer the possibility of identifying the neural
correlates of visual awareness and ultimately consciousness, by allowing perceptual
changes to be dissociated from changes in the stimulus [8]. The finding of perceptual
bi-stability in auditory streaming is important in that it raises questions about the
commonality of mechanisms underlying perceptual organisation in different sensory
modalities and it further suggests that auditory streaming, far from being a primitive and
automatic process, may be better understood in terms of generic strategies of perceptual
organisation; a notion that has important implications for models of auditory streaming.

² Actually, even with this simple stimulus configuration, several different sound organisations can be
experienced; i.e. multi-stability. For the sake of simplicity we can assume that the various possible
perceptual organisations can be sorted into two main categories: integrated and segregated (for a formal
definition, see the Design section).

[Figure 3 image: regions labelled 'Segregated', 'Ambiguous' and 'Integrated', delimited by two sets of 'temporal coherence boundary' and 'fission boundary' curves]

Figure 3. The dependence of primitive auditory streaming on frequency difference (Δf) and
presentation rate (characterized by the SOA) found in human psychophysical experiments using
alternating pure tones [2, 9]. Stimuli in the region of the parameter space above the 'temporal
coherence boundary' are generally perceived as two segregated streams, and those with
parameters in the region below the 'fission boundary' as a single coherent stream. Those falling
in the ambiguous region can be perceived in either way, and perception can be influenced by
top-down processes [2]. van Noorden actually reported two sets of boundaries, which are
illustrated here for comparison and clarification; the more commonly referenced blue set are
more appropriate for describing perceptual behaviour in response to short sequences, while the
green set are relevant to longer sequences such as the ones we report here. The red dot
indicates the stimulus parameters used by Pressnitzer and Hupé [5, 6], the orange dot those of
Winkler et al. [10], and the yellow dots those reported in Denham and Winkler [11]; see
descriptions of these experiments in the text.

In the investigation of Winkler et al. [10], a single point in the parameter space
(indicated by the orange dot in Figure 3) was tested (A = 1245 Hz, B = 931 Hz, Δf = 5
semitones, stimulus onset asynchrony, SOA = 100 ms, tone duration = 40 ms in one and
175 ms in another condition, sequences were close to 3 minutes in duration).
Participants were instructed to continuously depress a response key when they heard
the galloping rhythm and to release the key when they did not. With the short tone
duration, Δt was 160 ms, and the result was a rather ambiguous perception of the tone
sequence (galloping was heard on average for 61.8% of the time; SD 22.5); with the
long tone duration, Δt was 25 ms, and the result was predominantly segregated perceptual
organisation (10.3% galloping; SD 20.1). Each participant experienced perceptual
switching in both stimulus conditions.
Pressnitzer and Hupé [5, 6] tested a similar point of the parameter space
(indicated by the red dot in Figure 3; A = 587 Hz, B = 440 Hz, Δf = 5 semitones,
stimulus onset asynchrony, SOA = 120 ms, tone duration = 120 ms, Δt = 120 ms,
sequences of 4 minutes). These parameters are close to the ambiguous condition of
Winkler et al. [10], and although Pressnitzer and Hupé [5, 6] used a slightly different
experimental procedure, in which participants reported their perceptions using three
buttons (one for grouping (galloping rhythm), one for splitting (streaming) and the third
for cases when neither galloping nor streaming was heard), their participants also
reported considerable perceptual switching.
In the experiment reported in [11], we were interested to test whether bi-stability
was restricted to the ambiguous region or whether it would be found over a wider range
of the parameter space. We used the same one-button experimental procedure as
Winkler et al. [10] and tested the parameter combinations indicated by the yellow dots
in Figure 3. Our results [11] showed that perceptual bi-stability was present for all
conditions tested. Furthermore, although there was great variability in the mean
switching rate between and within participants, they all showed some degree of
perceptual bi-stability.


In their experiments Pressnitzer and Hupé [5, 6] also explored similarities
between perceptual switching behaviour in response to the auditory galloping/streaming
sequence (Figure 1), and a visual stimulus configuration known to be perceptually
bi-stable. The visual stimulus consisted of overlapping drifting gratings, which could be
perceived either as sticking or sliding surfaces; participants typically experience
perceptual switching between these two alternatives. The results indicated that key
characteristics of visual bi-stability are also present in the auditory case. These
characteristics included: exclusivity, the existence of two plausible yet mutually
exclusive alternative interpretations of the sensory input; randomness, stochastic
switching between percepts such that successive dominance durations are uncorrelated;
and inevitability, the finite duration of perceptual dominance; i.e. even when the
intention is to hold onto one interpretation, a switch will always eventually occur [12].
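
One simple way to probe the randomness criterion is to correlate each dominance duration with the one that follows it. The sketch below is an assumption about how such a check could be set up, not the analysis used in the cited studies; the function name and the example durations are hypothetical.

```python
def lag1_correlation(durations):
    """Pearson correlation between each phase duration and the next one."""
    x, y = durations[:-1], durations[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three durations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical dominance durations in seconds, for illustration only.
durations = [4.1, 9.3, 2.7, 6.0, 3.2, 11.5, 5.4, 2.9, 7.8, 4.6]
print(round(lag1_correlation(durations), 3))
```

A value near zero over many phases would be in line with uncorrelated successive durations; a strictly alternating long/short pattern would instead give a value close to -1.
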
Visual bi-stability can also be induced by showing a different image to each eye;
the well-known phenomenon of binocular rivalry in which participants experience
switches in perceptual awareness between the two images even though both are
continually present (for a recent review, see [13]). On the face of it, finding similarities
in perceptual behaviour in response to auditory tone sequences and binocularly rivalrous
images seems rather unlikely: the sensory modalities are completely different, the tones
are intermittent, while the images can be constant; and the tones are heard by both ears,
while each eye is presented with a different image. However, binocular rivalry is also
known to display all three of the key characteristics of bi-stability listed above. Because
binocular rivalry has been extensively investigated over many years, and is perhaps the
best understood paradigm inducing bi-stable perception, we explored the possibility that
some insights into the processes underlying auditory streaming might be gained from
comparisons between the two phenomena; although we did not necessarily expect to
find a very close correspondence.
Here we report a detailed analysis of perceptual bi-stability observed in auditory
streaming as well as comparisons between auditory streaming and binocular rivalry.
These analyses and comparisons were motivated by our recently suggested
interpretation of the auditory streaming phenomenon [11] (for a detailed description, see
the Distribution of perceptual switching section) and aimed at finding new ways to
describe auditory streaming quantitatively.

Perceptual Experiments
Having found that perceptual bi-stability in auditory streaming exists over a
wide range of the feature space [11], and is not restricted to the ambiguous region [2],
we were interested to characterise the distribution and dynamics of perceptual
switching, since these aspects had not yet been determined for auditory streaming. The
results which we present here were obtained in two further perceptual experiments of
auditory streaming.

Participants
Thirty young healthy volunteers (16 male, 18-26 years of age, average 21.8
years) participated in experiment 1 and 15 (7 male, 21-25 years of age, average 22.2
years) in experiment 2. Participants received modest financial compensation for their
participation. The study was conducted in the sound-attenuated experimental chamber
of the Institute for Psychology, Hungarian Academy of Sciences. It was approved by the
Ethical Committee (institutional review board) of the Institute for Psychology. After the
aims and procedures of the study were explained to them, participants signed an
informed consent form before starting the experiment. Participants were pre-selected on
the basis of the results of clinical audiometry with the criteria that the hearing threshold
between 250 and 6000 Hz should not be higher than 25 dB, and the difference between
the two ears not higher than 15 dB in the same frequency range.

Stimulus paradigm
We decided to focus the first experiment on the medium-to-large Δf region of
the parameter space with medium to moderately long SOAs, to see whether features of
perceptual bi-stability would show differences between the segregated (large Δf,
medium SOA) and the ambiguous (medium Δf and large Δf with long SOA) region of
the parameter space; the second experiment was then focussed on the region of small to
medium Δf and short to medium SOA. The experimental conditions used are illustrated
in Figure 4 below. Participants were presented with 4-minute long trains of the 'ABA-'
structure, where the A and B were pure tones of 75 ms duration, including 5 ms linear
onset and 5 ms linear offset ramps. In separate trains, Δf was 4, 10, 16, or 22 semitones
(ST) in experiment 1, and 1, 3, 5, or 7 ST in experiment 2; SOA was 100, 150, 200, or
250 ms (thus Δt was 125, 225, 325, or 425 ms for the more frequent tones) and 75, 100,
125, or 150 ms (thus Δt was 75, 125, 175, or 225 ms) for experiments 1 and 2,
respectively. Altogether, 4 × 4 = 16 different types of trains were tested in each
experiment, separately. The frequency of the lower-pitched, more frequent tones ('A'
in Figure 1) was kept constant at 400 Hz across the different stimulus conditions.
Sounds were generated on an IBM PC computer (MEL 2.0 stimulus presentation
software; Psychology Software Tools Inc.), amplified using a custom-made sound
mixer and amplifier, and delivered through Sennheiser HD 430 headphones at a
comfortable 70-dB (SPL) intensity level. The order of the stimulus trains with different
parameters (each train 4 minutes long) was randomized separately for each participant.
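
The Δt values quoted above follow from the SOA and the tone duration, and the B-tone frequencies follow from Δf. The sketch below assumes that Δt is measured between successive 'A' tones (which start 2 SOA apart) and that a semitone is the equal-tempered ratio 2^(1/12); the function names are illustrative and are not taken from the original stimulus software.

```python
TONE_MS = 75    # tone duration stated in the text
A_HZ = 400.0    # frequency of the lower 'A' tones stated in the text


def delta_t_ms(soa_ms, tone_ms=TONE_MS):
    """Offset-to-onset gap between successive 'A' tones.

    Successive 'A' tones start 2 * SOA apart, so the silent gap is that
    interval minus one tone duration.
    """
    return 2 * soa_ms - tone_ms


def b_frequency_hz(delta_f_st, a_hz=A_HZ):
    """Frequency of the 'B' tone, delta_f_st semitones above 'A'.

    Assumes equal-tempered semitones (frequency ratio 2 ** (1 / 12)).
    """
    return a_hz * 2 ** (delta_f_st / 12)


exp1_soas = [100, 150, 200, 250]
exp2_soas = [75, 100, 125, 150]
print([delta_t_ms(s) for s in exp1_soas])  # [125, 225, 325, 425]
print([delta_t_ms(s) for s in exp2_soas])  # [75, 125, 175, 225]
for st in (4, 10, 16, 22):
    print(st, "ST ->", round(b_frequency_hz(st), 1), "Hz")
```

The same relation reproduces the Δt values given for the earlier studies (for example, 2 × 100 − 40 = 160 ms for the short-tone condition of Winkler et al. [10]).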

Procedure
In the classical studies of auditory streaming, participants were typically asked
to report their perception after the end of each short sound sequence. Thus these
experiments were not designed to test the temporal dynamics of streaming-related
perceptual processes. Furthermore, participants were often asked to attempt to hear the
sound sequences according to one or another pattern. This method was used to find
unambiguous effects of stimulus parameters. The recent studies described in the
Introduction employed on-line measures, asking participants to report their perceptions
as they occurred throughout the presentation of relatively long sound sequences.
However, in all but one [6] of the streaming experiments reported in the literature, it has
been implicitly assumed that only two alternative sound organisations are possible; i.e.
either integrated or segregated, each linked to a specific perceived sound pattern. With
sequences similar to those presented in the current experiments (see Figure 1), the
integrated organisation was expected to result in the perception of the galloping
pattern, whereas the segregated sound organisation was expected to result in the
simultaneous perception of a high and a low tone sequence, both with uniform (but
different) presentation rate. These assumptions were reflected in the response choices
and instructions given to participants. In fact, most previous experiments only asked
participants to report when they experienced a certain pattern of sounds (e.g. the
galloping pattern). It was then assumed that participants, who did not report hearing the
designated pattern experienced the opposite sound organisation (in the example, this
would be the segregated organisation). However, in a pilot study in which we asked
participants to describe their different perceptions in detail, we found that they a) heard
rhythmic patterns different from either one of the expected ones, and b) sometimes
heard simultaneously a pattern that involved both high and low tones and a pattern
involving only high or only low tones.
In order to eliminate possible confusion caused by the perception of rhythms
other than the galloping rhythm the notion of an integrated percept was generalized
and defined for participants as hearing a repeating pattern, which contained both low
and high tones. In turn, the notion of a segregated percept was similarly generalized and
defined for participants as hearing some repeating pattern(s) formed either exclusively
of high or exclusively of low tones, with the possibility that multiple repeating
segregated patterns (i.e., A---A---A and B-B-B) may be perceived concurrently.
Participants were to depress one response key so long as they experienced an integrated
percept and the other key when they experienced a segregated percept. The role of the
two keys was randomly assigned across participants. When participants heard no
repeating tone pattern, they were instructed to release both keys. Participants were asked
to mark their perception throughout the duration of the stimulus sequence and not to
attempt hearing the sound according to one or another perceptual organisation. The
experimenter made sure that participants understood the types of percepts they were
required to report, using both auditory and visual illustrations. Furthermore, in
experiment 1, we then divided our participants into two groups. One group received
instructions implicitly suggesting exclusivity between the segregated and integrated
percepts (as described above); i.e., the instructions were 'you may either hear a
repeating integrated, or some repeating segregated tone patterns, or no repeating tone
pattern'. This set of instructions was similar to that employed by Pressnitzer and
Hupé [5]. The other group was explicitly told that it was possible that they may
sometimes hear both types of patterns at the same time; i.e., the instructions were
'you may hear a repeating integrated, or some repeating segregated tone patterns,
possibly even both at the same time, or no repeating pattern at all'. In the case that
they heard both types of patterns at the same time, they were instructed to keep both
buttons depressed. However, they were also cautioned to be sure to release the button
when they stopped hearing the corresponding pattern. In addition to the instructions,
when analyzing the responses, we discarded all those responses, which we assumed to
represent transitions between two percepts; i.e., all phases with duration shorter than
300 ms. The assumption was that in such cases, participants may simply have been
slightly inaccurate in synchronising their button presses and releases. Because the
results obtained with the two sets of instructions in experiment 1 proved to be very
similar in all regards except for a higher incidence of reporting two simultaneous
percepts in the latter group, for the sake of clarity, here we only report the results
obtained with the instructions explicitly mentioning the possibility of simultaneously
hearing repeating integrated and segregated tone patterns. The group of participants,
who received this set of instructions, included 15 volunteers (6 male, 18-25 years of
age, average 20.9 years). In experiment 2, we only used this set of instructions and all
other procedures were also identical to those of experiment 1.
Participants sat in a comfortable reclining chair in the experimental chamber
throughout the experimental session, holding a response button in each hand. Short 1-3
minute breaks were inserted between consecutive stimulus trains with longer breaks,
when the participant could move about, scheduled just before the start of the
experimental conditions (after explaining and illustrating the possible perceptions) and
after the 8th stimulus train. Further longer breaks were inserted into the session when
necessary. The experiment took ca. two hours altogether. The state of the two response
keys was sampled at 10 Hz (100 ms sampling time) by a NeuroScan Synamps EEG
recording system. Response key states were then extracted from the signals and encoded
separately for each participant and condition for further analysis.
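
The step from sampled key states to perceptual phases, including the 300 ms criterion described earlier in this section, could be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify how discarded transitions were handled, so here short phases are dropped and neighbouring phases with the same label are merged. All names and the example key states are hypothetical.

```python
from itertools import groupby

SAMPLE_MS = 100       # 10 Hz sampling, as stated in the text
MIN_PHASE_MS = 300    # phases shorter than this are treated as transitions


def label_samples(integrated_key, segregated_key):
    """Map the two sampled key states onto one percept label per sample."""
    names = {(True, False): "integrated", (False, True): "segregated",
             (True, True): "both", (False, False): "neither"}
    return [names[(bool(i), bool(s))]
            for i, s in zip(integrated_key, segregated_key)]


def phases(labels):
    """Run-length encode labels into [label, duration_ms] phases."""
    return [[lab, len(list(run)) * SAMPLE_MS] for lab, run in groupby(labels)]


def drop_short_phases(phase_list, min_ms=MIN_PHASE_MS):
    """Discard short phases, then merge neighbours that share a label."""
    kept = []
    for lab, dur in phase_list:
        if dur < min_ms:
            continue                   # assumed to be a transition
        if kept and kept[-1][0] == lab:
            kept[-1][1] += dur         # same percept on both sides
        else:
            kept.append([lab, dur])
    return kept


# Hypothetical 2.0 s of key states (1 = key down); not real data.
integrated = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
segregated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
result = drop_short_phases(phases(label_samples(integrated, segregated)))
print(result)
print("switches:", len(result) - 1)
```

In this example the single-sample 'both' and 'neither' phases are treated as imprecise button timing, leaving three phases and two switches.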

Figure 4. Experimental conditions for experiments 1 (magenta) and 2 (cyan) reported here.
Conditions are numbered separately for experiments 1 and 2 in the following way. Number 1
denotes the shortest SOA and smallest Δf (100 ms / 4 ST and 75 ms / 1 ST for experiments 1
and 2, respectively). Numbers increase faster through the four different Δf values (e.g., '2' marks
100 ms / 10 ST and 75 ms / 3 ST for experiments 1 and 2, respectively) and slower for the four
different SOAs (e.g., '5' marks 150 ms / 4 ST and 100 ms / 1 ST for experiments 1 and 2,
respectively).
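
The numbering scheme in the caption of Figure 4 can be written as a small lookup. The parameter levels are those given in the Stimulus paradigm section; the dictionary layout and function name are illustrative.

```python
EXPERIMENTS = {
    1: {"soa_ms": [100, 150, 200, 250], "delta_f_st": [4, 10, 16, 22]},
    2: {"soa_ms": [75, 100, 125, 150], "delta_f_st": [1, 3, 5, 7]},
}


def condition_parameters(experiment, number):
    """Return (SOA in ms, delta-f in semitones) for condition number 1..16."""
    if not 1 <= number <= 16:
        raise ValueError("condition number must be between 1 and 16")
    levels = EXPERIMENTS[experiment]
    soa_index, df_index = divmod(number - 1, 4)   # delta-f varies fastest
    return levels["soa_ms"][soa_index], levels["delta_f_st"][df_index]


for n in (1, 2, 5, 16):
    print(n, condition_parameters(1, n), condition_parameters(2, n))
```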

Results and discussion


Firstly, we note that when participants are exposed to alternating tone sequences
of sufficient duration, then perceptual bi-stability is found over a very wide range of the
parameter space, including conditions where stable organisation would be expected; i.e.
conditions with very large frequency differences and fast presentation rates or small
frequency differences and slow presentation rates. We have found no condition of all
those that we have tested that was stable across all participants, and no participant who
experienced stable perceptual organisation for all conditions. Furthermore, switching
occurs in all phases of the experimental session. The switching results for each
participant, condition, and position of the stimulus train within the experimental session
are illustrated in Figure 5. On average, there were 15.75 switches per condition in
experiment 1, 36.59 in experiment 2. This corresponds, on average, to one switch in
every 15.24 and 6.56 seconds, respectively; showing that perceptual switching occurs
quite often when listening to the tone sequences used to study auditory streaming. We
found no significant effect of the position of the stimulus sequence within the
experimental session for either experiment (F[15,210] = 1.44 and F[15,210] = 0.81, for
experiments 1 and 2, respectively; p > .1, both; one-way dependent ANOVA of the
number of perceptual switches with the factor Train-number [1-16]). These results
suggest that the observed perceptual switching does not result from learning or fatigue
within the experimental session. The number of switches per train appears to be higher
in experiment 2 than in experiment 1. This may be an effect of the parameters (Δf and
Δt; the effects of these parameters will be explored below) and/or a difference between
the participant groups (the amount of switching varies considerably across participants;
see the middle column of Figure 5). The following sections examine perceptual
switching in more detail.
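
The summary figures in this paragraph can be checked with simple arithmetic. The sketch below assumes that each train lasted exactly 240 s and that 15 participants entered each ANOVA (the reported group of experiment 1 and all of experiment 2); the function names are illustrative.

```python
TRAIN_S = 4 * 60   # each stimulus train lasted 4 minutes


def mean_interval_s(mean_switches, train_s=TRAIN_S):
    """Average time between switches for a given mean switch count."""
    return train_s / mean_switches


def repeated_measures_df(n_levels, n_participants):
    """Degrees of freedom (effect, error) for a one-way dependent ANOVA."""
    return n_levels - 1, (n_levels - 1) * (n_participants - 1)


print(round(mean_interval_s(15.75), 2))   # experiment 1 -> 15.24
print(round(mean_interval_s(36.59), 2))   # experiment 2 -> 6.56
print(repeated_measures_df(16, 15))       # -> (15, 210)
```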

[Figure 5 image: two rows of panels, labelled 'Experiment 1' and 'Experiment 2']

Figure 5. Average total number of perceptual switches (red lines) and individual participant data
(black dots) plotted against a) condition, b) participant, and c) position of the 4-minute long
stimulus train within the experimental session. Results from experiment 1 are plotted in the top
row; those from experiment 2 in the bottom row. Condition numbers are defined in Figure 4
separately for experiment 1 and 2. Note that due to the randomized order of the stimulus
conditions, trains with any set of parameters could occur in any position within the experimental
session.

Distribution of perceptual switching


The first question we address is whether perceptual switching occurs uniformly
across the parameter space or whether the degree of perceptual switching is parameter
dependent. One prediction about the distribution of perceptual switching can be made
on the basis of van Noorden's perceptual boundaries [2]; the idea being that ambiguity
regarding the appropriate perceptual organisation leads to instability. This interpretation
suggests that perceptual switching will be maximal in the ambiguous region [2].


Another way to understand bi-stability is as a competition between alternative
interpretations of the auditory scene, mediated by competition between different
sequential associations or rules [14] (for a summary, see Figure 6), where:
• Local rules [4, 14] refer to links between temporally adjacent events; these rules
are fully supported by temporal proximity [15] (i.e., they are associations between
temporally adjacent sound events in the sequence, thus linking them is relatively
easy, at least at small and intermediate Δt values); the links are stronger for small Δf and
weaker for large Δf (i.e., similar sounds form better groups than dissimilar ones;
see grouping by similarity [15]).
• Global rules [4, 14] refer to links between non-adjacent events; in our simple
sequence for streaming, these sounds are identical in frequency (i.e. A-A or B--B..);
therefore, temporal separation alone governs the strength of this type of
grouping, which is stronger for small Δt and weaker for large Δt. (Note that in our
experiments, decreasing the SOA by a given amount decreases Δt by twice that
amount, because tone duration was kept constant.)
[Figure 6 image: 'Global rule' annotated with Δt; 'Local rule' annotated with Δf (Δt)]

Figure 6. Cartoon indicating the influence of Δf and Δt on the most prominent sequential
associations which can be made in the galloping auditory streaming paradigm. Δt is placed in
parentheses, because with short-medium Δt values (SOAs), the rule-competition account of auditory
streaming suggests that the effect of changes in SOA (which determines Δt in the current
experiments) on the formation and representation of the local rule is relatively small. This is
because 1) the sounds to be connected are adjacent and 2) with relatively short SOAs
separating the sounds, the neural after-effects of the first sound are still present when the second
sound arrives.

The local versus global rule interpretation suggests that most switching will be
found for stimulus parameters which maximise competition, i.e. when both local and
global rules are strong. This occurs for small Δf and small Δt, because with small Δf, the
local rule becomes strong, and with small Δt, the global rule becomes strong. Hence, if
this interpretation is correct, we should find most switching in experiment 1 at Δf =
4 ST, SOA = 100 ms. Conversely, least competition should occur when the two rules are
not well balanced, and one or the other wins the competition very easily. In experiment
1 we expect to find least switching for the 4 ST, 250 ms and 22 ST, 100 ms conditions.
The conditions where the two rules are approximately equally matched but are both
relatively weak would result in an intermediate amount of switching.
In order to distinguish these alternative interpretations (i.e., ambiguity vs.
rule-competition) we examined the total switching in each condition. The results below
support the hypothesis of competing sequential associations; the mean switching rate
peaks along a ridge where local and global rules may be considered to be roughly
balanced, and in experiment 1, it is highest for the condition with smallest Δf and
smallest Δt (see Figure 7, left panel). From this analysis, it is clear that the region of
maximum switching does not coincide with the ambiguous region of van Noorden [2].
We note that the ridge of maximum switching appears to run roughly parallel to the
temporal coherence boundary.

Figure 7. Group-mean distribution of perceptual switching in experiment 1 (left panel) and
experiment 2 (right panel). The colour scale indicates the mean number of switches across
participants accumulated throughout the tone trains, separately for each condition. Note that the
coloured surface is interpolated between the discrete experimental data points indicated by the
green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4 for
the relation between the parameters used in experiments 1 and 2.
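
As an illustration of the measure plotted in Figure 7, the following sketch counts the switches in a sequence of reported percepts and averages the count across participants. The data layout (one list of percept labels per participant, in time order) and the function names are assumptions made for this example; this is not the analysis code used in the study.

```python
# Hedged sketch: data layout and names are hypothetical.
from statistics import mean


def count_switches(reports):
    """Number of changes between consecutive percept labels."""
    return sum(1 for prev, cur in zip(reports, reports[1:]) if prev != cur)


def group_mean_switches(per_participant):
    """Mean switch count across participants for one condition."""
    return mean(count_switches(r) for r in per_participant)


if __name__ == "__main__":
    # "I" = integrated, "S" = segregated (made-up reports for one condition)
    condition = [["I", "S"], ["I", "S", "I", "S"]]
    print(group_mean_switches(condition))
```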

The results of experiment 1 (Figure 7, left panel) suggest that the distribution of
perceptual switching in auditory streaming is broadly consistent with competition
between alternative sequential associations, with stronger competition and more
switching where these associations are strongest. However, although the relationship
between perceptual switching and stimulus parameters generally supports the rule
competition hypothesis, there is a clear qualification evident in the results of
experiment 2 (Figure 7, right panel); i.e. switching rates do not increase indefinitely
with decreasing Δf and Δt. There is a non-monotonic relationship between the mean
number of switches and Δf and Δt, with the maximum switching found in the region of
Δf = 4 ST and SOA = 125 ms (Δt = 175 ms). Although the increase in switching with
increasing rule strength is intuitively easy to understand, the non-monotonic relationship
between rule strength and switching requires further consideration.
The fall-off in switching rate in the region of very small Δf and Δt suggests that
different factors, which become stronger in this region of the parameter space, also
affect switching. Due to the uniform 75-ms stimulus duration used in our experiments,
with SOA < 100 ms there is no (or almost no) silent gap between successive A and B
tones. Thus it is possible that for small Δfs, triplets of successive tones (ABA)
may form a unitary event and, therefore, for segregation to occur, the system has to first
extract the components from the composite before other sequential associations can be
established. This notion is supported by the literature on temporal integration, showing
that auditory input within 150-200 ms is integrated into a single unit and processed in
many ways differently from successive sounds exceeding this period (e.g., masking,
loudness summation, detection of omissions and successive deviations [16-18]). Thus in
the case of short Δts, building the global rule suffers and, as a consequence,
competition between the two rules is less balanced. Therefore, the amount of switching
decreases compared with the more balanced cases.
In contrast, with large Δfs and short SOAs (SOA < 100 ms; Δt < 125 ms),
associations between successive identical tones may be formed directly and, perhaps,
even before associations between the tones with different frequencies. If this were the
case, one should expect that, contrary to the common assumption that integration is
always the first perceptual state [4], segregation should be reported first for short SOA
and large Δf conditions. Our results (see next section) confirm this expectation. One
possible explanation for the immediate dominance of segregation at small Δt and large
Δf is that large Δfs impose relatively large spatial distance between stimulus-driven
neural activity in the tonotopically organized part of the afferent auditory system, thus
weakening and delaying interactions between the activities associated with the two
different tones. At the same time, short Δts allow the after-effect of the more frequent
tones to survive till the arrival of the next identical tone. This may enable easy direct
linking of successive identical tones. In accordance with this account, dominance of
segregation over temporal integration at large Δfs has been previously demonstrated
[19-21], even within the temporal integration period.

Differences between switching behaviour in the first and subsequent phases


It is widely assumed that auditory streaming has a strong initial bias towards
integration. This has led to the suggestion that auditory streaming can be interpreted as a
process whereby the auditory system 'accumulates evidence' before making a final
organisational decision [4]. That is, segregation can only emerge after a gradual
'build-up' process [22]. In this section we show that 1) it is not always the case that
participants report integration first and 2) there is no stable final sound organisation;
rather, switching between alternative organisations continues throughout the stimulation.
Figure 8 (top panels) shows, separately for experiments 1 and 2, the mean first
percept (termed first phase) durations averaged across all participants. The
group-averaged value was calculated by treating integrated phase durations as positive and
segregated phase durations as negative. Therefore, the colour in the diagram shows the
overall group tendency towards one or the other first reported percept. It is clear from
the figure that for small Δfs, integration tends to be the first percept reported, whereas
for trains with parameters falling into the larger-Δf and short-SOA region, most
participants initially linked same-frequency tones (A-A or B---B), and thus perceived a
segregated percept first (for a possible explanation, see the previous section).
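
The signed averaging described above can be sketched as follows. The percept labels, function names and example durations are hypothetical and chosen only for illustration. The sketch also shows why a signed mean says nothing about how long phases last: equally long integrated and segregated phases cancel.

```python
# Hedged sketch: labels, names and example values are hypothetical.

def signed_duration(percept: str, duration_s: float) -> float:
    """Integrated phases count as positive, segregated phases as negative."""
    if percept == "integrated":
        return duration_s
    if percept == "segregated":
        return -duration_s
    raise ValueError(f"unknown percept label: {percept!r}")


def group_mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Made-up first-phase reports from four participants in one condition.
first_phases = [
    ("integrated", 30.0),
    ("integrated", 20.0),
    ("segregated", 10.0),
    ("segregated", 40.0),
]

signed_mean = group_mean(signed_duration(p, d) for p, d in first_phases)
unsigned_mean = group_mean(d for _, d in first_phases)

if __name__ == "__main__":
    print("signed mean (s):", signed_mean)
    print("unsigned mean (s):", unsigned_mean)
```

In this made-up case the signed mean is zero although every phase lasted at least 10 s, which is the reason signed and unsigned durations need separate analyses.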

[Figure 8 panel titles: First Phase (top); Mean of All Subsequent Phases (bottom)]

Figure 8. Group-mean signed durations (in seconds) of the first (top) and the mean of all
subsequent phases (bottom) averaged across all participants for experiment 1 (left) and
experiment 2 (right). Segregated phases were assigned negative and integrated phases positive
values. Note that the colour scale is the same for all images as indicated by the bar on the right,
and that the coloured surface is interpolated between the discrete experimental data points
indicated by the green dots. For clarity, the x and y axes of the two panels are differently scaled.
See figure 4 for the relation between the parameters used in experiments 1 and 2.

A comparison between the distribution of the signed durations obtained in the
first and subsequent phases (figure 8 bottom panels) clearly shows many instances of
first-phase segregation and also that, as expected, segregation dominates over a wider
range of stimulus values after the first phase. For experiment 1, an ANOVA test of the
phase durations with the structure Phase (first vs. subsequent) × SOA (4 levels, see
Design and figure 4) × Δf (4 levels, see Design and figure 4) showed significant main
effects of all three factors (Phase: F[1,14]=15.04, p<.001, effect size η²=0.52; SOA:
F[3,42]=30.11, p<.0001, Greenhouse-Geisser ε=0.73, η²=0.68; Δf: F[3,42]=22.35,
p<.0001, ε=0.60, η²=0.61). The first phase had largely positive values, which shows the
initial relative dominance of integration, whereas overall, subsequent phases were
closely balanced (the average being close to zero). However, comparisons between
these phase duration values are misleading because we averaged over signed values;
hence we show the results of a comparison based on unsigned phase duration values
later in this section. As expected, the shortest SOA (100 ms) promoted segregation
(negative signed duration value), whereas integration (positive signed duration values)
was dominant with the longest SOAs (200 and 250 ms) with a monotonic increase with
SOA. The opposite progression was seen with Δf: clear integration with the smallest Δf
(4 ST) with a monotonic decrease with increasing Δf, and segregation becoming
dominant at the largest Δfs (16 and 22 ST). Both SOA and Δf significantly interacted
with Phase (Phase × SOA: F[3,42]=9.44, p<.0001, ε=0.57, η²=0.40; Phase × Δf:
F[3,42]=9.09, p<.0001, ε=0.90, η²=0.39). The interaction between Phase and SOA was
caused by the SOA effect (more integration with longer SOAs) being only present in
the first phase, whereas SOA had no significant effect on the balance between
integration and segregation (close to zero mean signed phase durations, see also Figure
8 bottom left) in subsequent phases (Tukey HSD post-hoc test with df=42, p<.05 at
least for comparisons between signed phase durations for first-phase, 200- and 250-ms
SOA and that for all other cells). The interaction between Phase and Δf was caused by
the Δf effect (more integration with smaller Δfs) being only present in the first phase,
whereas Δf had no significant effect on the close balance between integration and
segregation in subsequent phases (Tukey HSD post-hoc test with df=42, p<.05 at least
for comparisons between first-phase, 5- and 10-ST Δf and all other cells). No other
interactions yielded significant results.
The ANOVA of the signed phase durations for experiment 2 showed a very
similar set of results. All three main effects were significant (Phase: F[1,12]=18.64,
p<.001, η²=0.62; SOA: F[3,42]=12.04, p<.001, ε=0.60, η²=0.46; Δf: F[3,42]=34.01,
p<.0001, ε=0.50, η²=0.71). The first phase showed more integration, whereas
subsequent phases were balanced (again, absolute phase duration will be analyzed
below). Segregation dominated with the shortest SOA (75 ms), with signed phase
durations increasing monotonically with SOA (i.e., shifting towards integration),
whereas integration dominated with the smallest Δf (1 ST),
monotonically decreasing with increasing Δfs. Both SOA and Δf significantly
interacted with Phase (Phase × SOA: F[3,42]=5.26, p<.01, ε=0.47, η²=0.27; Phase × Δf:
F[3,42]=13.99, p<.0001, ε=0.66, η²=0.50). The interaction between Phase and SOA was
again caused by SOA only affecting signed phase durations in the first phase in contrast
to the balance between integration and segregation found in subsequent phases (Tukey
HSD post-hoc test with df=42, p<.05 at least for comparisons between signed phase
durations in first-phase, 100-150-ms SOA and that in all other cells; see also Figure 8
bottom right). The interaction between Phase and Δf was again caused by Δf only
affecting signed phase durations in the first phase in contrast to the balance between
integration and segregation found in subsequent phases (Tukey HSD post-hoc test with
df=42, p<.05 at least for comparisons between first-phase 1- and 3-ST Δf and all other
cells). No other interactions yielded significant results.
In summary, whereas SOA and Δf had the expected effects on the initial (first
phase) percept (short SOAs promoting segregation and small Δfs integration), neither
appeared to affect the ratio between integration and segregation in subsequent phases,
which were quite balanced (i.e., integrated and segregated organisations being perceived
overall for ca. equal durations within the stimulus trains). Thus, although an initial
integrated percept is more common over a wider range of parameters, segregation
catches up with it later on.
Figure 8 and the related statistical analysis of the signed phase durations clearly
show differences in the distribution of integration and segregation and the effects of
SOA and Δf between the first and subsequent perceptual phases. However, as was
noted, averaging signed phase durations does not allow one to draw conclusions
regarding the length of phase durations. These are illustrated in Figure 9 below,
separately for the first and subsequent phases.
[Figure 9 panel titles: First Phase (top); Mean of All Subsequent Phases (bottom)]

Figure 9. Group-mean durations (in seconds) of the first (top) and the mean of all subsequent
phases (bottom) averaged across all participants for experiment 1 (left) and experiment 2 (right).
Note that the colour scale is the same for all images as indicated by the bar on the right, and that
the coloured surface is interpolated between the discrete experimental data points indicated by
the green dots. For clarity, the x and y axes of the two panels are differently scaled. See figure 4
for the relation between the parameters used in experiments 1 and 2.

The images show 1) a clear overall decrease of durations from the first to
subsequent phases and 2) visible effects of Δt and Δf on phase durations in the first
but not in subsequent phases.
In experiment 1, an ANOVA test of the absolute phase durations (segregated
and integrated phases pooled together) with factors of Phase (first vs subsequent) ×
SOA (4 levels, see Design and figure 4) × Δf (4 levels, see Design and figure 4) showed
a main effect of Phase (F[1,14]=46.11, p<.0001, η²=0.77) and Δf (F[3,42]=3.14, p<.05,
ε=0.83, η²=0.18). The Phase main effect was caused by longer first- than
subsequent-phase durations. Phase durations were longer for 4-ST than for 10-ST Δf.
The significant interaction between Phase and Δf (F[3,42]=4.25, p<.05, ε=0.86,
η²=0.23) was caused by 1) longer phase durations induced by the smallest Δf (4 ST) in
the first phase than by any other combination of Phase and Δf, except the largest (22 ST)
Δf in the first phase and 2) longer phase durations by the largest (22 ST) Δf in the first
phase than any subsequent-phase duration (Tukey HSD post-hoc test with df=42, p<.05
at least, for comparisons between first-phase 4-ST Δf and all other cells, except for
first-phase 22-ST Δf and between first-phase 22-ST Δf durations and phase durations in
the second phase). There was also a significant interaction between the SOA and Δf
factors (F[9,126]=5.45, p<.001, ε=0.59, η²=0.28), which stemmed from the opposite
tendency of the SOA effect at low and high Δfs: at low Δfs, phase durations increased
with increasing SOAs, whereas at high Δfs, they decreased with increasing SOAs
(Tukey HSD post-hoc test with df=126, p<.05 at least, for comparisons between 10-ST
Δf, 150-ms SOA and 16 or 22 ST, 100 ms as well as 4 ST differing from 16 and 22 ST at
250 ms). Finally, the significant triple interaction (F[9,126]=3.39, p<.01, ε=0.57,
η²=0.20) revealed that the above described SOA × Δf interaction in determining phase
durations only characterized the first phase, whereas subsequent phases showed a
largely uniform distribution of phase durations. No other main effect or interaction
reached significance.
In experiment 2, an ANOVA of the same structure as above yielded significant
main effects for Phase (F[1,14]=21.34, p<.001, η²=0.60) and Δf (F[3,42]=18.55,
p<.0001, ε=0.71, η²=0.57). Similarly to experiment 1, the effect of Phase was explained
by the first phase being significantly longer than the subsequent phases. Phase durations
monotonically decreased with increasing Δfs. The interaction between the Phase and
SOA factors (F[3,42]=5.58, p<.01, ε=0.81, η²=0.28) was the product of increasing
phase durations with increasing SOAs in the first phase only (Tukey HSD with df=42,
p<.05 at least, between any pair of first- and second-phase cells at 100-150-ms SOA).
The interaction between the Phase and Δf factors (F[3,42]=9.88, p<.001, ε=0.86,
η²=0.41) stemmed from the first-phase duration at the smallest Δf (1 ST) being
significantly longer than all other phase durations, including all other first-phase
durations (Tukey HSD with df=42, p<.001 in all cases). This result revealed that longer
first-phase durations mainly occur at low Δfs (qualifying the Phase main effect).
Finally, the interaction between the SOA and Δf factors (F[9,126]=5.07, p<.01, ε=0.47,
η²=0.27) is resolved by the finding that, except for the shortest SOA (75 ms), the lowest
Δf (1 ST) induced longer phase durations than any of the larger Δfs (Tukey HSD with
df=126, p<.001 between 1-ST Δf with 100-, 125-, or 150-ms SOA and any other
combination of Δf and SOA). No other main effect or interaction reached significance.
In summary, the pattern of phase durations showed that first phases are usually
longer than subsequent ones. It is also clear from the signed phase duration results that
integration is not always the first percept. Furthermore, stimulus parameters appear to
affect the duration of the first phase only (see the almost homogeneous distribution of
phase durations in the second phase; Figure 9, bottom). The analysis of the signed
phase durations showed that small Δfs promote integration as the first percept, whereas
short Δts promote segregation as the first percept. This result, together with the pattern
of interaction between Phase and Δf and/or SOA for absolute phase durations, suggests
that first-phase durations are shortest when competition between the two rules is the
highest (short SOA, small Δf). More or less uniformly distributed phase durations in
subsequent phases lead to more overall switching with shorter first-phase durations,
because a short first phase leaves more time for switching in subsequent phases. This
explains the pattern observed in the Distribution of perceptual switching section and
qualifies the effects and explanations described in that section as referring to the first
phase: When both rules are strong they can be discovered fast and so switching between
them starts early.

Time course of perception during the stimulus trains


Our finding that segregation becomes more likely later during the stimulus trains
appears to match the classical findings of a build-up of segregation followed by a stable
percept. However, we have already noted that segregation can be the initial percept.
Furthermore, figures 10 and 11 show that the time course of increasing probability of
perceptual segregation is not identical to the one described as build-up in the classical
literature of auditory streaming. This is because the duration of the build-up phase has
been estimated to be in the order of ca. 10 s, whereas for most parts of the parameter
space, the duration of the first percept exceeds this period. Furthermore, the probability
of perceiving integrated or segregated organisations does not settle immediately after
the first phase. Rather, slow changes can be observed throughout the whole 4-minute
stimulus train, depending on the combination of parameters. Three different trends can
be discerned in the figures below. With parameter combinations regarded to promote
segregation (short SOA, large Δf), following a fast overshoot of the probability of
segregation, the ratio between segregation and integration declines slowly (see, e.g.,
figure 10, SOA=100 ms, Δf=16 or 22 ST and figure 11, SOA=75 ms, Δf=5 or 7 ST).
With parameter combinations regarded to promote integration, the probability of
segregation appears to increase slowly throughout the whole duration of the stimulus
trains (see, e.g., figure 10, SOA=150 or 200 ms, Δf=4 ST and figure 11, Δf=1 ST and any
SOA or SOA=150 ms and any Δf). Finally, with most of those parameter combinations
which would fall into the ambiguous region, the initial increase of segregation is
followed by a fairly stable period, in which the probability of segregation falls mostly
into a narrow range between 0.4 and 0.6 (see, e.g., figure 10, SOA=200 ms, Δf=10 ST
and figure 11, SOA=125 ms, Δf=5 or 7 ST).
We tested the time course of perceptual organisation within the 4-minute long
stimulus trains by comparing three time ranges, selected from the early (20-50 s; an
earlier range was not used because, for some combinations of parameters, some
participants had not yet given their initial response at shorter latencies; see the range
shaded blue-grey in Figure 10), middle (120-150 s) and late (200-230 s) parts of the
stimulus trains. We conducted ANOVAs of the probability of segregation with the
structure Time-range (early, middle, late) × SOA (4 levels; see Figure 4) × Δf (4 levels;
see Figure 4). In experiment
1, increasing SOAs induced a monotonically decreasing probability for perceiving the
segregated organisation (F[3,42]=30.24, p<.0001, ε=0.89, η²=0.68), whereas increasing
Δfs induced a monotonically increasing probability for perceiving the segregated
organisation (F[3,42]=45.87, p<.0001, ε=0.78, η²=0.77). These results are fully
compatible with the known effects of SOA and Δf on perception of these typical
streaming test sequences. The significant interaction between Time-range and SOA
(F[6,84]=3.65, p<.001, ε=0.74, η²=0.21) reflects a gradual decrease in the steepness of
the SOA effect (decreasing segregation with increasing SOAs) at later time ranges. No
other main effect or interaction reached significance. In experiment 2, similarly to
experiment 1, increasing SOAs induced a monotonically decreasing probability for
perceiving the segregated organisation (F[3,42]=19.24, p<.0001, ε=0.70, η²=0.58),
whereas increasing Δfs induced a monotonically increasing probability for perceiving
the segregated organisation (F[3,42]=74.14, p<.0001, ε=0.95, η²=0.84). The Time-range
× SOA (F[6,84]=3.26, p<.05, ε=0.59, η²=0.19) and Time-range × Δf (F[6,84]=3.54,
p<.05, ε=0.68, η²=0.20) interactions reflect a gradual decrease in the steepness of the
related main effects (decreasing segregation with increasing SOAs and increasing
segregation with increasing Δfs) at later time ranges. No other main effect or
interaction reached significance.
These results suggest that the well-known effects of SOA and Δf on the
perceptual organisation of the stimulus trains are prominent at the beginning of the
stimulus sequences and they diminish with time. This conclusion further supports our
observations regarding the different properties of the first and subsequent phases.

Figure 10. Experiment 1: Group-average time-course of the probability of the segregated
percept within the stimulus trains as a function of SOA (panels) and Δf (overplotted, marked
with colours). With segregation taken as the value 1 and integration as the value 0, the figure
shows the probability of reporting segregation at each time point. Note that the blue-grey bars
indicate the initial period during which not all participants had yet made their initial choice; the
data during this time are therefore averaged over only those participants who had reacted by the
given time. The violet bars mark the periods used for the statistical analysis.
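
The coding scheme described in this legend can be sketched as follows. The sampling of reports into per-participant lists, the use of `None` for "no response yet", and the function names are assumptions made for this example rather than details given in the manuscript.

```python
# Hedged sketch: data layout, None convention and names are hypothetical.

def p_segregation(states_at_t):
    """Proportion of segregated reports at one time point.

    states_at_t holds one entry per participant: 1 for segregated,
    0 for integrated, None if that participant has not yet responded.
    Participants without a response are left out of the average.
    """
    responded = [s for s in states_at_t if s is not None]
    if not responded:
        return None  # nobody has responded yet at this time point
    return sum(responded) / len(responded)


def segregation_curve(reports_by_participant):
    """Group-average probability of segregation at each sample."""
    n_samples = len(reports_by_participant[0])
    return [
        p_segregation([reports[t] for reports in reports_by_participant])
        for t in range(n_samples)
    ]


# Made-up reports: three participants, four equally spaced samples.
example = [
    [None, 0, 1, 1],
    [0, 0, 0, 1],
    [None, None, 1, 0],
]
curve = segregation_curve(example)

if __name__ == "__main__":
    print(curve)
```

Early samples in such a curve rest on fewer participants, which is why the initial period is marked separately in the figure.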

Figure 11. Experiment 2: Group-average time-course of the probability of the segregated
percept within the stimulus trains as a function of SOA (panels) and Δf (overplotted, marked
with colours). See description in the legend of figure 10.

The fact that switching continues throughout the duration of the whole stimulus
train is clearly shown in figures 12 and 13, in which group-averaged phase durations are
plotted for each condition as a function of time within the stimulus trains. The low
phase-duration period at the beginning of the stimulus trains (<20 s) reflects the fact that
participants with a fast initial report usually switch their perception after a short period
of time (the blue-grey-shaded column at the beginning of each plot shows the time
when not all participants have yet given their response, separately for each SOA).
Overall, phase durations are quite stable with a decrease towards the end of the stimulus
trains. We assessed the time course of mean phase durations by comparing across the
three time ranges used in our test of the time course of segregation/integration (see
above; early: 20-50 s; middle: 120-150 s; and late: 200-230 s time ranges in the stimulus
trains). Only significant effects related to the time ranges are reported, because we have
already tested the effects of SOA and Δf on phase durations (see the previous section).
In experiment 1, phase durations were significantly reduced in the late as compared to
the early and middle time ranges (F[2,28]=5.85, p<.05, ε=0.80, η²=0.29). The
Time-range and Δf factors showed a significant interaction (F[6,84]=2.68, p<.05,
ε=0.69, η²=0.16), which was mainly caused by the low Δfs (4, 10 ST) in the late range
producing shorter phases than most other combinations of the Δf and the Time-range
factors (Tukey HSD post-hoc test with df=84, p<.05 for comparisons between the
late-range 4 and 10 ST phase durations and all but the 10-ST phase durations in the other
two time ranges). In experiment 2, phase durations were again significantly reduced in
the late as compared to the early and middle time ranges (F[2,28]=10.23, p<.01, ε=0.76,
η²=0.42). The Time-range and Δf factors also showed a significant interaction
(F[6,84]=4.66, p<.01, ε=0.51, η²=0.25), which was caused by the lowest Δf (1 ST)
producing longer phases in the early and middle range than any other combination of
the Time-range and Δf factors (Tukey HSD post-hoc test with df=84, p<.001 for
comparisons between the 1-ST early- and middle-range phase durations and any other
combination of Time-range and Δf, including the 1-ST late-range phase duration).
It is not clear whether the decrease of phase durations at the end of the stimulus
trains indicates that four minutes is an important time scale in stream segregation, or
that participants acquired a sense of the length of the stimulus blocks and these changes
reflect their expectation of the termination of the train. It is also possible that this result
is simply an artefact of the end of the stimulus block (which is 10 seconds after the end
of the late time range used) cutting short some of the longest phases falling into this
time range and thus distorting the full distribution of phase durations. Using longer
stimulus blocks in follow-up experiments may shed light on this issue. The interactions
between Time-range and Δf reflect this effect combined with the previously observed
effect showing that low Δfs induce long (integrated) first phases.

Figure 12. Experiment 1: Time course of group-average phase duration separately for the four
different SOAs, overplotting the different Δfs (marked with differently coloured lines). Note
that the blue-grey bars indicate the initial period during which not all participants had yet made
their initial choice; the data during this time are therefore averaged over only those participants
who had reacted by the given time. The violet bars mark the periods used for the statistical
analysis.

Figure 13. Experiment 2: Time course of group-average phase duration separately for the four
different SOAs, overplotting the different Δfs. See description in the legend of Figure 12.

In summary, we found that for the typical tone sequences used to study auditory
streaming, 1) integration is not necessarily the first percept and 2) no stable final percept
emerges. These statements appear to contradict the classical findings. The
contradictions are, however, explained by the different assumptions and methods
employed by the present study (together with the few similar previous studies [5, 6, 10,
11]) and classical explorations of auditory streaming. The assumption of a stable final
percept led most experimenters in the past to use short (typically <20 s) trains and ask
participants about their final percept. Looking at the cross section of figures 10 and 11
at ca. 15 s, we find that our results closely match those of, e.g., van Noorden [2].
Furthermore, the initial 20 s of the curves shown in these figures indeed give the
impression of a fast but gradual build-up of streaming. The difference between the
current and the classical view is that, whereas this build-up has usually been
interpreted as more and more participants reaching the final percept, our data show
that the group-average probability of perceiving the sounds in terms of one or another
percept is a product of averaging between perceptual states, which switch back and forth
all the time, but with shorter- and longer-term changes in the statistics of the switching
behaviour. In fact, our results argue for a different distinction in the temporal dynamics
of auditory streaming. Instead of build-up and final percept, it appears that
distinguishing between the first perception (first phase) and subsequent states, as is
often the case in the analysis of visual bi-stability, may prove a more fruitful description
of the temporal behaviour of perceptual processes. Results described in this section
argue that these two periods (and perhaps more, if changes by the end of the 4-minute
trains are not related to the length of the stimulus blocks) may show characteristically
different properties and thus may be understood by different theoretical and
computational models.
From the above views it follows that an improved description of auditory
streaming must cover the dynamic properties of this perceptual phenomenon. For
further insight we next turned to study similarities and differences between auditory
streaming and visual bi-stability. The following section will test whether auditory
streaming also shows some important properties of binocular rivalry. The findings will
be incorporated within our rule-competition theory, which already provides
explanations for many of the classical findings (e.g., integration is more often the first
percept, because building local rules is faster) and some of the novel findings (switching
maxima; segregation as the first percept; etc.).

Effect of rule strength on perceptual organisation


Visual studies of bi-stability often employ the paradigm of binocular rivalry, in
which different images are presented to the left and right eye. Participants experience
perceptual switching between the two images, as one then the other dominates
perception for some period of time (termed the dominance duration). The well-known
second proposition of Levelt [23] states that if the contrast in the image presented to one
eye is increased while the contrast in the image to the other eye is held constant, then
the dominance durations of the fixed-contrast eye decrease while the dominance
durations of the variable-contrast eye remain approximately constant. This law was
thought to hold generally for binocular rivalry. However, it has recently been shown
that if a wider range of mean contrast levels is used (Levelt employed only relatively
high contrast levels), then adjusting the contrast in one eye can affect the dominance
durations of both eyes, as illustrated in Figure 14 [24].

Figure 14. Cartoon illustrating changes in dominance durations with changing image contrast in
one eye: the validity of Levelt's second proposition [23]; taken from [24]. On the left, a series of
four plots shows how the dominance durations of the variable (solid line) and fixed (dashed
line) contrast eyes change as a function of contrast in the variable-contrast eye. The y axis
indicates the normalised dominance duration, and the x axis the log of the contrast in the
variable-contrast eye. As shown, relationships such as that in the upper left plot are consistent
with Levelt's proposition [23], whereas those in the lower left plot are inconsistent; there
appears to be a continuum from one extreme to the other (illustrated in the middle two plots).
The plots are colour coded to map onto the box at the lower right, which indicates the absolute
contrast levels under which these different relationships were observed. As can be seen, Levelt's
second proposition holds when the contrast level in the fixed eye is high, but not when the
contrast level in the fixed eye is low.

If bi-stability in auditory streaming originates from competition between
competing rules, then it is possible that the role of contrast in binocular rivalry may be
similar to that of the strength of competing rules in auditory streaming. That is, higher
contrast in binocular rivalry would be analogous to having a stronger rule for a
particular organisation in auditory streaming. We have already suggested that Δf and Δt
affect local and global rules differently: Δf affects only the strength of the local rule,
whereas Δt affects only the global rule. SOA also has a small effect
on the local rule (the across-stream Δt decreases with decreasing SOAs, which
somewhat strengthens the local rule), but we will ignore this factor in the following
description. As a consequence, by changing Δf while keeping the SOA constant we
manipulate the strength of the local rule while keeping the strength of the global rule
fixed; and, similarly, changing SOA while keeping Δf constant manipulates the strength
of the global rule while keeping the strength of the local rule (more or less) fixed. As a
first step, for comparison between binocular rivalry and auditory streaming, Figure 15
was constructed to show the group-averaged durations of the first two perceptual phases
in experiment 1 (see Footnote 3). We chose to analyse the mean duration of the first two perceptual
phases, because our previous analyses (see above) showed that Δf and SOA have little
effect on phase durations in later time ranges of the 4-minute trains. Mean integrated
phase durations are plotted with respect to the stimulus parameters as an interpolated
solid surface, and mean segregated phase durations are plotted as a meshed surface. By
taking slices through these surfaces at different values of Δf and SOA, we can compare
the effects of rule strength in auditory streaming with the effects of contrast levels in
vision.
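
The averaging and slicing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the analysis code used for the figures: the record layout, the function names (mean_durations, slice_at_delta_f) and all numbers are hypothetical and were chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (delta_f in semitones, SOA in ms, percept, duration in s).
# Toy values for illustration only; they are NOT data from the experiments.
records = [
    (4, 100, "integrated", 12.0),
    (4, 100, "integrated", 8.0),
    (4, 100, "segregated", 6.0),
    (4, 250, "integrated", 40.0),
    (4, 250, "segregated", 5.0),
    (22, 100, "integrated", 4.0),
    (22, 100, "segregated", 30.0),
    (22, 250, "integrated", 8.0),
    (22, 250, "segregated", 10.0),
]


def mean_durations(rows):
    """Average the phase durations within every (delta_f, soa, percept) cell."""
    cells = defaultdict(list)
    for delta_f, soa, percept, duration in rows:
        cells[(delta_f, soa, percept)].append(duration)
    return {key: mean(values) for key, values in cells.items()}


def slice_at_delta_f(surface, delta_f, percept):
    """Fix delta_f and return [(soa, mean_duration)] sorted by SOA."""
    return sorted(
        (soa, duration)
        for (df, soa, p), duration in surface.items()
        if df == delta_f and p == percept
    )


surface = mean_durations(records)
print(slice_at_delta_f(surface, 4, "integrated"))
print(slice_at_delta_f(surface, 4, "segregated"))
```

Comparing the two printed slices for a fixed Δf corresponds to reading one panel of the comparison: one curve for the percept whose rule strength is held fixed and one for the percept whose rule strength varies with SOA.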

Footnote 3: Because experiment 2 covered a much smaller range of parameters in Δf and SOA than experiment 1,
only results of experiment 1 were used in this analysis.

Figure 15. Interpolated surfaces showing the group-averaged durations of the first two
perceptual phases obtained in experiment 1 as a function of Δf and SOA; integrated (solid
surface) and segregated (mesh surface) phases. Colours code phase durations (redundant with
the vertical axis).

Figure 16 shows that, if we interpret auditory streaming as a competition
between alternative sequential associations (local versus global rules), then we can find
similar relationships between the dominance durations of integration and segregation to
those reported for eye dominance in binocular rivalry [24]. Levelt's second proposition
is consistent with the changes in phase durations where the associations are strong. For
instance, at 4-ST Δf (strong local rule), increasing the global-rule strength by decreasing
the SOA reduces the mean integrated phase durations (the fixed-rule-strength
percept), but has little effect on the segregated phase durations (the
variable-rule-strength percept; compare the top left panel of Figure 16 with panel A
of Figure 14). In contrast, at 22-ST Δf (weak local rule), increasing the global-rule
strength by decreasing the SOA substantially increases the mean segregated phase
durations (variable-rule-strength percept) in accordance with the similar binocular
rivalry effect (compare bottom left panel of Figure 16 with panel D of Figure 14). At
Δfs between these two extremes, results compatible with the intermediate relationships
reported by Brascamp et al. [24] are found.
Similarly, for the 100-ms SOA conditions (strong global rule), increasing the
local-rule strength by decreasing Δf decreases the mean segregated phase durations (the
fixed-rule-strength percept), but does not much affect the mean integrated phase
durations (the variable-rule-strength percept; compare the top right panel of Figure 16
with panel A of Figure 14). In contrast, at 250-ms SOA (weak global rule), increasing
the local-rule strength by decreasing Δf substantially increases the mean integrated
phase durations (variable-rule-strength percept) in accordance with the similar binocular
rivalry effect (compare bottom right panel of Figure 16 with panel D of Figure 14).

Figure 16. Local- and global-rule-strength effects, colour-coded as in Figure 14: analysed for Δf
(local rule; left column) and Δt (global rule; right column). Phase durations for the
fixed-strength rule are shown with dashed lines, those for the variable-strength rule with
continuous lines. Note that the rule strength progressively decreases from the top row to the
bottom row, and within each plot, rule strength decreases from left to right; i.e. in the opposite
direction to that in Figure 14.

Thus another analogy can be shown between visual bi-stability and auditory
streaming. Specifically, assuming an analogy between the strength of local- vs.
global-rule representations in auditory streaming and contrast levels of images presented
to the two eyes in the binocular rivalry situation, parameters affecting the "strength" of
these representations had similar effects on the dominance phase durations. Together
with the similarities mentioned in the introduction, this analogy suggests that the
principles of computational models of binocular rivalry may be applicable in modelling
auditory streaming. However, there is a caveat to this claim. Here we considered only
the first two perceptual phases, since we showed earlier that stimulus parameters only
have significant effects on phase durations during the first perceptual phase. In contrast,
the findings in binocular rivalry [24] may relate to later phases of perceptual switching,
as Brascamp et al. excluded the first minute of participant responses from their analysis.
Thus the analogy between auditory streaming and binocular rivalry may not extend to
the distinction between initial and subsequent phases in auditory streaming. This,
however, does not reduce the value of the analogy in helping to identify generic
modelling principles applicable to both modalities.

Transition Phases: Distribution of both-percept responses


The distribution of both-percept responses (i.e., when participants report
perceiving repeating integrated and segregated percepts concurrently) is not uniform
across conditions in experiment 1. (Again, we only analyze the results of experiment 1;
see Footnote 3.) There is a clear tendency to report 'both' more often along a ridge in the
parameter space roughly corresponding to the region where local and global rules are
balanced. An ANOVA of the proportion of 'both' responses in the subsequent phases
with the structure of SOA (4 levels; see Figure 4) × Δf (4 levels; see Figure 4) showed
only a significant interaction between the two factors (F[9,126]=3.06, p<0.05, ε=0.59,
η²=0.18), which was caused by opposite effects of Δf at different SOAs: at the shortest
SOA, increasing Δf resulted in a decreasing proportion of 'both' responses, whereas at
the longest SOA, increasing Δf increased the proportion of 'both' responses (Tukey
HSD post-hoc test with df=126, p<.05 between the proportion of 'both' responses at
the shortest SOA (100 ms) and largest Δf (22 ST) and that with the two shortest SOAs (100
and 150 ms) and the smallest Δf (4 ST)). Thus segregated and integrated sound
organizations are more often perceived simultaneously when the rules are of
approximately equal strength: when both are either strong or weak at the same time.
Note that the distribution of both-percept responses does not follow exactly the
distribution of perceptual switching (which is high only when both rules are strong; see
Figure 7), nor does it coincide with the intersection of the phase duration surfaces,
where integration and segregation are balanced (see Figure 15).

Figure 17. Distribution of mean steady state both-percept responses as a proportion of post-first-phase
stimulus duration in experiment 1. Note that, to avoid problems arising from accidental
simultaneous button presses, we included in this analysis only phases with duration exceeding
300 ms.
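
The measure plotted in Figure 17 can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function name both_proportion, the (label, start, end) phase format and the example values are assumptions made for illustration, as is the choice to treat the first list entry as the first perceptual phase.

```python
def both_proportion(phases, total_duration_s, min_phase_s=0.3):
    """Proportion of post-first-phase time spent in 'both' phases.

    phases: list of (label, start_s, end_s) tuples ordered in time. The first
    entry is taken to be the first perceptual phase (an assumption).
    'both' phases of min_phase_s or shorter are discarded, mirroring the
    300-ms criterion used against accidental simultaneous button presses.
    """
    first_phase_end = phases[0][2]
    post_first = total_duration_s - first_phase_end
    if post_first <= 0:
        return 0.0
    both_time = sum(
        end - start
        for label, start, end in phases[1:]
        if label == "both" and (end - start) > min_phase_s
    )
    return both_time / post_first


# Hypothetical 240-s train; toy values, not experimental data.
example = [
    ("integrated", 0.0, 60.0),
    ("both", 60.0, 60.2),        # too brief: treated as an accidental press
    ("segregated", 60.2, 100.0),
    ("both", 100.0, 110.0),      # counted
    ("integrated", 110.0, 240.0),
]
print(both_proportion(example, total_duration_s=240.0))
```

In this toy example only the 10-s 'both' phase is counted, against the 180 s that follow the first phase.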

Both types of patterns being perceived at the same time contradicts our intuitive
assumption of the exclusivity of two competing perceptual organisations in the
alternating two-tone sequence, as well as the findings of Pressnitzer and Hupé [5, 6]
and much of the visual literature on bi-stability. However, it has been shown in vision
that, contrary to the usual assumptions of exclusivity [12], periods of 'transition' during
which neither eye is clearly dominant can be of rather long duration, comparable with
'eye' dominance durations [24]. This is consistent with the durations of the both-percept
responses in our experiment 1. Furthermore, in another striking analogy
between visual and auditory bi-stability, the distribution of mean both-percept
durations with respect to rule strength resembles the distribution of 'transition' durations
in the binocular rivalry paradigm [24], as illustrated in Figure 18.

Figure 18. Transition (both-percept) phases in binocular rivalry and auditory streaming. Left:
Relationship between transition durations and image contrast (governing the "strength" of the
alternative eye activations) in binocular rivalry; where Dep. refers to the dominant percept
prior to the transition, and Dest. to the percept following the transition; taken from [24].
Right: Relationship between both-percept durations and rule strength in experiment 1. Dep.
refers to the rule strength corresponding to the perceptual organisation prior to the 'both'
response, and Dest. to the rule strength corresponding to the perceptual organisation
following the 'both' response. For the perceptual state of 'integration', the rule strength was
considered to correspond to Δf; and for 'segregation' the rule strength was considered to
correspond to SOA.

It is not clear at this stage what gives rise to the perception of both patterns
simultaneously, and intuitively we think of them as mutually exclusive. One possibility
is that there is a very rapid switching between the two alternatives but that conscious
perception is more sluggish, and unable to follow this rapid switching. Hence there is a
sort of stroboscopic effect in which both perceptual organisations are perceived as being
present although there is actually switching between them. An alternative explanation
arises from the notion of competition at several different levels in the perceptual
hierarchy. Generally, recurrent top-down connections tend to ensure that there is
consistency across the whole system, but there may be instances in which the top-down
signals cannot overcome the local competitive interactions; in this case, incompatible
winners could emerge at different levels, and this inconsistency may take some time to
resolve before a consistent organisation emerges throughout the hierarchy. In our
computational modelling studies, we have observed both of these effects and we are
planning to formulate a follow-up experiment to distinguish between these two
possibilities.
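
The first of these possibilities can be made concrete with a toy simulation. The sketch below is not the computational model referred to above; it is a deliberately simple illustration in which the underlying state alternates deterministically and a sluggish readout (an exponential moving average with an assumed rate alpha and an assumed report threshold) reports 'both' whenever the smoothed activity of each organisation exceeds the threshold. All names and parameter values are hypothetical.

```python
def alternating(block_len, n_steps):
    """Underlying state: 'I' (integrated) and 'S' (segregated) in equal blocks."""
    return ["I" if (t // block_len) % 2 == 0 else "S" for t in range(n_steps)]


def sluggish_reports(states, alpha=0.05, threshold=0.3):
    """Report the percept seen through a slow (low-pass) readout.

    Each organisation has a smoothed activity trace; 'both' is reported when
    both traces exceed the threshold, otherwise the larger trace wins.
    """
    trace_i = trace_s = 0.5
    reports = []
    for state in states:
        trace_i += alpha * ((1.0 if state == "I" else 0.0) - trace_i)
        trace_s += alpha * ((1.0 if state == "S" else 0.0) - trace_s)
        if trace_i > threshold and trace_s > threshold:
            reports.append("both")
        else:
            reports.append("I" if trace_i > trace_s else "S")
    return reports


def both_fraction(reports):
    """Fraction of time steps on which 'both' is reported."""
    return reports.count("both") / len(reports)


fast = both_fraction(sluggish_reports(alternating(block_len=2, n_steps=1000)))
slow = both_fraction(sluggish_reports(alternating(block_len=200, n_steps=1000)))
print(fast, slow)
```

With rapid alternation the readout cannot follow the switches and reports 'both' throughout, whereas with slow alternation 'both' appears only briefly around each switch. The sketch says nothing about the second, hierarchical explanation.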

General Discussion
Interesting theoretical insights into the processes underlying streaming can be
derived from the experiments reported above. In this section, we present a conceptual
framework for auditory streaming which accounts for previous results as well as our
new findings. Here follows a summary of the novel observations obtained in the current
experiments:
1) Switching between integrated and segregated percepts continues throughout
the stimulus sequences with any combination of Δf and Δt (SOA) and in all participants.
Switching is not a product of learning or fatigue. Rather, there appears to be no final
stable percept in auditory streaming.
2) Participants report simultaneous perception of integrated and segregated tone
patterns ('both' responses); that is, segregated and integrated percepts are not exclusive.
3) With medium-to-large Δfs and very short Δts (SOAs) the first reported
percept is segregation; that is, integration is not always the first percept. Hence,
segregation is not a result of evidence gathering.
4) When Δf is low/medium and Δt (SOA) is short the first perceptual phase is
typically much shorter than first phases with other combinations of the two parameters.
The overall amount of switching is also highest in this region. The effect is not fully
monotonic: first percepts become somewhat longer and the overall amount of switching
decreases with very short SOAs and at very low Δfs; the minimum of the first-phase
duration distribution and the maximum amount of switching appear at 100-ms SOA
within the 4-10-ST Δf range in the current experiments.
5) With time, the ratio between integrated and segregated percepts tends to
become balanced irrespective of the combination of Δf and Δt (SOA).
6) Δf and Δt (SOA) affect the duration of the first perceptual phase in response
to the tone sequences similarly to the way in which visual contrast affects dominance
durations in binocular rivalry. It appears as if varying Δf affects the integrated and
varying Δt (SOA) affects the segregated sound organisation similarly to the way in
which 'contrast' is assumed to control the competitiveness of an image in the given eye.
This result supports the notion that Δf controls the competitiveness (strength) of the
integrated, whereas Δt (SOA) controls that of the segregated sound organisation.
7) The distribution of the duration of both-percept responses as a function of
the strength of the preceding and the following organisation, assuming the above
described relationship between Δf and the integrated and Δt (SOA) and the segregated
organisation, is similar to the distribution of the duration of transitional phases in
binocular rivalry.
The observed qualitative differences between the first and subsequent perceptual
phases have strong implications for theoretical accounts of streaming, which must
explain both the perceptual bias of the first phase, and the continuing switching
behaviour. It is important to note that our distinction here between first and subsequent
perceptual phases is not the same as the usual distinction between build-up and final
percept [4], because the first phase is generally longer than the expected build-up
duration, and we found no evidence for a stable final percept.
Since participants attended the sounds without attempting to hear them
according to one or the other organisation, the current results cannot tell us whether
switching occurs in an automatic fashion or is the product of attention. Previous
electrophysiological studies [10, 25] obtained correlates of both bottom-up and
attentional processes when participants were exposed to tone sequences similar to those
used in the current experiments. There is some indication that the initial phase of
streaming may be more sensitive to attentional effects [25-27], whereas maintaining
sound organisations does not require focused attention [28]. However, there is evidence
that the segregated organisation can develop [29] without attention, although this may
depend on the actual stimulus configuration. On the other hand, these
electrophysiological studies did not measure the perception of the critical sounds
on-line, thus possibly averaging together trials with different perceptual organisations.
Therefore, no strong conclusion can be drawn regarding the relationship between
attention and the observed differences between the first and subsequent perceptual
phases.
In the following account, we shall distinguish between the first and subsequent
perceptual phases. We shall attempt to describe them in terms of alternative rule
representations which vie for dominance, and test the viability of this explanation in the
face of our novel findings. As will become clear, a useful way to think about the two
perceptual stages may be as two distinct processes, formation of sequential associations
and coexistence between alternative interpretations.

First phase: rule formation


There are two aspects to consider: the initial choice of perceptual organisation,
and the duration of the first perceptual phase. We consider these below separately for
the case where the first phase is the integrated state and that where it is the segregated
state.
Integration is most commonly found as the first percept for the parameter ranges used, except at
very short SOAs and medium-to-large Δfs, presumably since sequential associations
between temporally adjacent events are more easily formed in the brain; i.e., there is a
bias towards forming A-to-B and B-to-A links first, and these local rules support
perceptual integration. A consideration of plasticity mechanisms (Hebbian learning)
suggests that the formation of associations is perhaps only possible between consecutive
events; i.e. synaptic plasticity does not support skipping over intervening events. This
principle has two consequences for the effects of Δf on the formation of sequential
associations. Firstly, in a tonotopically organised system, the amount of overlap
between the neural activity in response to the A and B tones is governed by the
frequency difference between the two: with small Δfs, the overlap in activity and thus
the potential for forming these associations is large, whereas with large Δfs it is small.
Thus the main effect of Δf is on the formation of local rules. Secondly, Δf may also
affect the speed with which global-rule representations can form. This is because in
order to form higher order sequential associations ('global rules') it is necessary for
populations which respond selectively to one or other of the tones to emerge (i.e. a
process of clustering), before the within-stream sequential associations can be learned.
Only after such populations are formed can the within-stream events become
consecutive, at least as viewed from within the population cluster. If we consider a fixed
SOA, then clearly the difficulty of forming separate clusters will be determined largely
by Δf and the extent to which the neural activity elicited by the two tones overlaps,
which in turn will determine the duration before the system can discover the
segregated organisation. This is well illustrated in Figure 16 (right panels), in which Δf
has a clear effect on the duration of segregation for fixed SOAs. Thus the two effects of
Δf go hand in hand: smaller Δfs improve the formation of local-rule links and delay the
formation of global-rule links; larger Δfs weaken the formation of local-rule links and
allow faster formation of global-rule links.
The above description is consistent with the finding that the fission boundary is
much larger than the smallest detectable frequency difference [30], since segregation
requires not only a detectable difference, but also a clear separation of activity.
Similarly, the amplitude-modulation 'fission' boundary was shown to be much larger
than the smallest detectable amplitude-modulation difference [31]. Although initially
puzzling, the finding of a very stable fission boundary, measured in terms of the
minimum Δf necessary for segregation [30], can also be reconciled with perceptual
switching; while the formation of separate clusters of activity may depend on some
minimum featural difference, our results show that subsequent stochastic switching is
largely independent of stimulus features. The claims of various groups (e.g. [32-34]) that
adaptation in primary auditory cortex is a neural correlate of streaming are also consistent
with these ideas, since adaptation is likely to be an important aspect of the process of
clustering. Therefore, it will be linked to the duration of the first phase, and hence the
rate at which the probability of reporting streaming develops [32-34].
Integration may also be reported initially as a result of a qualitatively different
mechanism when Δf is small and SOA is short. When two sounds are presented within a
temporal window of less than about 200 ms duration, they tend to be processed as a
single event [19], at least within a limited range of frequency differences [20]. In our
experiments this occurs primarily for SOAs ≤ 100 ms and Δf < 4 ST. In these cases the
integrated pattern first perceived is likely to be an ABA chunk. The auditory system
then needs to pull this chunk apart before it can discover the alternative within-stream
sequential associations. Nevertheless, perceptual switching is found here too.
Segregation as the first percept at short SOAs and larger Δfs can also be
understood in terms of the tendency to chunk acoustic stimuli into single events if they
occur within approximately 200 ms of each other [19]. When Δf exceeds the proposed
spectral window of integration [20], consecutive identical tones may be directly linked
(i.e., with the B tone excluded, the integration window contains A-A). A direct
consequence of this explanation is that first-phase segregated percepts must be based on
the more frequent stimulus (A-A; they cannot be B---B), a prediction which we plan to
test experimentally. Neurophysiological studies suggest a reason for this phenomenon,
since it has been shown that responses in cortex are greatly reduced when events follow
each other at a rate faster than about 10 Hz [35]. Therefore, at fast presentation rates, it
is possible that due to insufficient recovery time some populations only respond to one
or other of the tones from the outset. Hence the higher order sequential associations can
be discovered immediately, resulting in segregation becoming the first percept. In our
data, the minimum Δf for reporting first phase segregation is approximately 5 semitones
at 125-ms SOA, which is not inconsistent with the spectrotemporal window of
integration reported in [20]. Thus, similarly to Δf, SOA also has two different effects on
the formation of rule representations. Firstly, it governs the formation of associations
between consecutive tones, because for such associations the neural after-effects of the
first tone must still be present at the time the next tone arrives. This primarily affects the
formation of within-stream associations since Δt_within > Δt_across. Secondly, with short
SOAs the recovery of the neuronal elements in higher (cortical) levels of the auditory
system may become an issue, and this primarily affects the local rule. Again, the two
effects go hand in hand: shortening the SOA strengthens the associations necessary for
representing the global rule more than the local rule (see Figure 16, left panel, Δf =
22 ST), and, especially at small Δfs, shortening the SOA weakens the local-rule
associations more than the global-rule ones (see Figure 16, left panel, Δf = 4 ST).
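
The timing relations invoked in this argument (the within-stream interval exceeding the across-stream interval, and the A tones recurring more often than the B tones) can be checked with a short Python sketch. It assumes the ABA_ triplet layout with a silent slot of one SOA after each triplet; this layout, the function names and the example SOA are assumptions made for illustration rather than a description of the stimulus-generation code.

```python
def aba_onsets(soa_ms, n_triplets):
    """Onset times (ms) of tones in an ABA_ sequence; '_' is a silent slot."""
    onsets = []
    for k in range(n_triplets):
        t0 = 4 * soa_ms * k  # each ABA_ cycle occupies four SOA slots
        onsets.append(("A", t0))
        onsets.append(("B", t0 + soa_ms))
        onsets.append(("A", t0 + 2 * soa_ms))
    return onsets


def within_stream_gaps(onsets, tone):
    """Distinct onset-to-onset intervals between successive tones of one stream."""
    times = [t for name, t in onsets if name == tone]
    return sorted({later - earlier for earlier, later in zip(times, times[1:])})


def across_stream_gaps(onsets):
    """Distinct onset-to-onset intervals between adjacent tones of different streams."""
    return sorted({
        t2 - t1
        for (name1, t1), (name2, t2) in zip(onsets, onsets[1:])
        if name1 != name2
    })


onsets = aba_onsets(soa_ms=125, n_triplets=3)
print(within_stream_gaps(onsets, "A"))  # A-to-A interval
print(within_stream_gaps(onsets, "B"))  # B-to-B interval
print(across_stream_gaps(onsets))       # A-to-B and B-to-A interval
```

Under these assumptions the A-to-A interval is twice the SOA and the B-to-B interval four times the SOA, so at a 125-ms SOA only successive A tones fall close to a window of about 200 ms.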
One argument against this description of the SOA effects is that streaming can
be induced by presenting a sequence of only one of the tones prior to the onset of
the ABA_ABA pattern [36]. Based on this finding, Bregman et al. [3] argue against
SOA being an important determinant of streaming, suggesting instead that the
within-stream ISI (Δt) is the major factor. However, within the above-described
framework, it is easy to understand how the induction sequence used by Rogers and
Bregman [36] led to increased reports of streaming at the onset of the ABA pattern. The
induction sequence promoted the establishment of sequential associations between tones
in one of the streams, thereby biasing the system so that single-frequency streams
are perceived first. In this case, the formation phase would
involve the discovery of linkages between temporally adjacent events, the local rules.
Thus, the effect of the induction sequence is similar to that of very low (<100 ms) SOA
and intermediate or high Δf in that it forces the global rule to be established before the
local one. However, our results predict that eventually the links required for the local
rule would also be established and switching would commence (but this was not studied
due to the short duration of the sequences).
In summary, the first perceptual phase is essentially concerned with the
formation of alternative perceptual organisations. The perceptual organisation
perceived during the first phase is determined by the stimulus parameters, and the
duration of this phase depends on the time taken to discover and represent alternative
sequential associations. Consistent with findings in vision that local competition is
necessary in order to trigger perceptual bi-stability [37], the idea emerging from our
auditory streaming studies is that the discovery of feature-sensitive rules or
associations and competition between incompatible associations is necessary for
triggering changes in global perceptual organisation. A consideration of the mean first
phase durations suggests that most previous streaming experiments, which used
relatively short stimulus sequences, have largely characterised the initial phase of
perceptual organisation.

Subsequent phases: coexistence of alternative rule representations


Once the various regularities have been discovered then, if the stimulus
continues as before, generic mechanisms of perceptual switching between alternative
perceptual organisations come into play. This we term the coexistence of alternative
interpretations of the sensory environment. The stage is set up by the existence of two
(or more) viable rule representations. Our observations show that the initial
parameter-based bias wears off and a balance emerges between the two alternatives.
There are two ways to think about the reason for this tendency. 1) It is easy to see that
in terms of survival, perceptual flexibility offers clear benefits, particularly in the case
of changing contingencies, and that flexibility is best achieved by a system which is
marginally stable. In our computational modelling studies we found that it is extremely
easy to move away from regions of the parameter space where bi-stability exists. This
suggests that there are mechanisms in the brain for ensuring that the system remains
close to these critical points since this is what allows it to switch rapidly between
alternative organisations with the minimum of effort, i.e. neural resources. 2) In real-life
situations, one almost never focuses for long periods of time on unchanging stimulation.
This is because no new information can be extracted; whereas adaptation to the
environment requires that we discover changes that occur in it. Thus unchanging
stimulation soon becomes part of the background, filtered out by lower-level
mechanisms, until something changes. In this sense, the participants' task in these
experiments was rather unnatural: attend to the stimulation and be constantly aware of
your perception, although nothing changes. It is possible that similar bi- or multi-stable
phenomena can occur in natural situations. However, we are not generally aware of them,
because they become part of the unattended background. One possibility is that one
function of switching back and forth between alternative organisations could be to
maintain representations of the alternative organisations while the sounds are not
attended.
The two suggested views do not contradict each other. Perceptual flexibility
in interpreting the currently unattended background may serve us well. The above view
is strongly supported by findings showing that 1) sound organisation is functional even
when attention is directed away from the sounds [21] and, critically, 2) it is reset when
attention is turned towards them [26, 27]. That is, when we direct our attention away
from a sequence of sounds (treating it as background), the available rule representations
are maintained. However, when we turn our attention again towards these sounds
(selecting them as the foreground), we re-evaluate the possible descriptions. One
consequence of this notion is that by directing attention away then back to the sound
sequence, we could force a new first phase within the middle of the sound train. This we
shall explore in our next follow-up experiment.
The coexistence of segregation and integration is supported by the finding of a
significant proportion of both-percept responses. This initially rather surprising
finding provides further support for the notion that the perceptual organisations which
have been found are simultaneously represented within the brain, even if we are not
always aware of them. Furthermore, the idea that these periods result from a slow
transition from one state to another is contradicted by our finding that after the first
phase, the dynamics of perceptual switching is largely parameter-independent.
However, the finding that the longest duration transition and both responses appear
to occur along a ridge where the strength of the departure and destination organisations is
balanced indicates that equality between competing features (or rules) gives rise to
network conditions where inconsistent winners are most likely to emerge. We suggest
the potentially testable hypothesis that both-percept reports provide evidence for the
existence of a processing hierarchy in the auditory system, within which inconsistent
winners can sometimes emerge at different levels of the hierarchy.

Conclusions
When the auditory system is exposed to an unchanging sequence of sounds
that can be organised in more than one way, perceptual bi-stability is pervasive. There
is no combination of features that we have tested for which perception remains stable
for even a few minutes. Analysis of the experimental data revealed two phases of
perceptual organisation, which can be characterised by two distinct processes:
formation of sequential associations and coexistence between alternative interpretations.
These perceptual phases differ in their dynamics (perceptual durations differ on average
by an order of magnitude) and in their sensitivity to stimulus features (which significantly
influence perceptual choice and phase duration only in the first phase). The detailed
similarities between perceptual switching in vision and audition argue for generic
modality-independent processes acting in the second phase employing common
algorithmic principles to simultaneously maintain alternative descriptions of the sensory
input. We suggest that the perceptual flexibility necessary for effective operation in
real-world environments depends on having a system that can balance on the verge of
instability. This allows attentional modulation to rapidly select the perceptual
organisation best suited to the current behavioural goals.

Acknowledgements
This work is supported by the European Research Area Specific Targeted
Project EmCAP (IST-FP6-013123).

References
1. Neisser, U., Cognitive Psychology. 1967, New York: Appleton-Century-Crofts.

2. van Noorden, L.P.A.S., Temporal coherence in the perception of tone
sequences, in Institute for Perception Research. 1975: Eindhoven.

3. Bregman, A.S., et al., Effects of time intervals and tone durations on auditory
stream segregation. Percept Psychophys, 2000. 62(3): p. 626-36.

4. Bregman, A.S., Auditory Scene Analysis. 1990, Cambridge, MA: MIT Press.

5. Pressnitzer, D. and J.M. Hupé. Is auditory streaming a bistable percept? in
Forum Acusticum. 2005. Budapest.

6. Pressnitzer, D. and J.M. Hupé, Temporal dynamics of auditory and visual
bistability reveal common principles of perceptual organization. Curr Biol,
2006. 16(13): p. 1351-7.

7. Blake, R. and N.K. Logothetis, Visual competition. Nat Rev Neurosci, 2002.
3(1): p. 13-21.

8. Rees, G., G. Kreiman, and C. Koch, Neural correlates of consciousness in
humans. Nat Rev Neurosci, 2002. 3(4): p. 261-70.

9. Beauvois, M.W. and R. Meddis, Computer simulation of auditory stream
segregation in alternating-tone sequences. J Acoust Soc Am, 1996. 99(4 Pt 1):
p. 2270-80.

10. Winkler, I., R. Takegata, and E. Sussman, Event-related brain potentials reveal
multiple stages in the perceptual organization of sound. Brain Res Cogn Brain
Res, 2005. 25(1): p. 291-9.

11. Denham, S.L. and I. Winkler, The role of predictive models in the formation of
auditory streams. J Physiol Paris, 2006. 100(1-3): p. 154-70.

12. Leopold, D.A. and N.K. Logothetis, Multistable phenomena: changing views in
perception. Trends Cogn Sci, 1999. 3(7): p. 254-264.

13. Tong, F., M. Meng, and R. Blake, Neural bases of binocular rivalry. Trends
Cogn Sci, 2006. 10(11): p. 502-11.

14. Horváth, J., et al., Simultaneously active pre-attentive representations of local
and global rules for sound sequences in the human brain. Brain Res Cogn Brain
Res, 2001. 12(1): p. 131-44.

15. Köhler, W., Gestalt Psychology. 1947, New York: Liveright.

16. Cowan, N., On short and long auditory stores. Psychol Bull, 1984. 96: p. 351-370.

17. Czigler, I. and I. Winkler, Preattentive auditory change detection relies on
unitary sensory memory representations. Neuroreport, 1996. 7: p. 2413-2417.

18. Yabe, H., et al., Temporal window of integration revealed by MMN to sound
omission. Neuroreport, 1997. 8: p. 1971-1974.

19. Yabe, H., et al., Organizing sound sequences in the human brain: the interplay
of auditory streaming and temporal integration. Brain Res, 2001. 897(1-2): p.
222-7.

20. Shinozaki, N., et al., Spectrotemporal window of integration of auditory
information in the human brain. Brain Res Cogn Brain Res, 2003. 17(3): p. 563-71.

21. Sussman, E., Integration and segregation in auditory scene analysis. J Acoust
Soc Am, 2005. 117(3): p. 1285-1298.

22. Anstis, S. and S. Saida, Adaptation to auditory streaming of frequency-modulated
tones. J Exp Psychol Hum Percept Perform, 1985. 11(3): p. 257-271.

23. Levelt, W.J., Note on the distribution of dominance times in binocular rivalry.
Br J Psychol, 1967. 58(1): p. 143-5.

24. Brascamp, J.W., et al., The time course of binocular rivalry reveals a
fundamental role of noise. J Vis, 2006. 6(11): p. 1244-56.

25. Snyder, J.S., C. Alain, and T.W. Picton, Effects of attention on neuroelectric
correlates of auditory stream segregation. J Cogn Neurosci, 2006. 18(1): p. 1-13.

26. Carlyon, R.P., et al., Effects of attention and unilateral neglect on auditory
stream segregation. J Exp Psychol Hum Percept Perform, 2001. 27(1): p. 115-27.

27. Cusack, R., et al., Effects of location, frequency region, and time course of
selective attention on auditory scene analysis. J Exp Psychol Hum Percept
Perform, 2004. 30(4): p. 643-56.

28. Sussman, E.S., et al., Attentional modulation of electrophysiological activity in
auditory cortex for unattended sounds within multistream auditory
environments. Cogn Affect Behav Neurosci, 2005. 5(1): p. 93-110.

29. Sussman, E., et al., The role of attention in the formation of auditory streams.
Percept Psychophys, 2006. in press.

30. Rose, M.M. and B.C. Moore, Perceptual grouping of tone sequences by
normally hearing and hearing-impaired listeners. J Acoust Soc Am, 1997.
102(3): p. 1768-78.

31. Grimault, N., et al., Learning in discrimination of frequency or modulation rate:
generalization to fundamental frequency discrimination. Hear Res, 2003.
184(1-2): p. 41-50.

32. Fishman, Y.I., J.C. Arezzo, and M. Steinschneider, Auditory stream segregation
in monkey auditory cortex: effects of frequency separation, presentation rate,
and tone duration. J Acoust Soc Am, 2004. 116(3): p. 1656-70.

33. Micheyl, C., et al., Perceptual organization of tone sequences in the auditory
cortex of awake macaques. Neuron, 2005. 48(1): p. 139-48.

34. Gutschalk, A., et al., Neuromagnetic correlates of streaming in human auditory
cortex. J Neurosci, 2005. 25(22): p. 5382-8.

35. Creutzfeldt, O., F.C. Hellweg, and C. Schreiner, Thalamocortical
transformation of responses to complex auditory stimuli. Exp Brain Res, 1980.
39(1): p. 87-104.

36. Rogers, W.L. and A.S. Bregman, An experimental evaluation of three theories of
auditory stream segregation. Percept Psychophys, 1993. 53(2): p. 179-89.

37. Anourova, I., et al., Selective interference reveals dissociation between auditory
memory for location and pitch. Neuroreport, 1999. 10(17): p. 3543-7.

Suggested Reviewers:
Joel Snyder joel.snyder@unlv.edu
Daniel Pressnitzer Daniel.Pressnitzer@ens.fr
Raymond van Ee r.vanee@phys.uu.nl