NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Curr Opin Neurobiol. Author manuscript; available in PMC Jun 1, 2011.
Published in final edited form as:
PMCID: PMC2901988

Behind the Scenes of Auditory Perception


“Auditory scenes” often contain contributions from multiple acoustic sources. These are usually heard as separate auditory “streams”, which can be selectively followed over time. How and where these auditory streams are formed in the auditory system is one of the most fascinating questions facing auditory scientists today. Findings published within the last two years indicate that both cortical and sub-cortical processes contribute to the formation of auditory streams, and they raise important questions concerning the roles of primary and secondary areas of auditory cortex in this phenomenon. In addition, these findings underline the importance of taking into account the relative timing of neural responses, and the influence of selective attention, in the search for neural correlates of the perception of auditory streams.


We are usually surrounded by multiple sound sources. The sound waves produced by these sources mingle before reaching our ears, creating complex vibration patterns on our eardrums. An essential function of the auditory system is to analyze these patterns to recover the sound sources that generated them (Figure 1a). This is known as the “auditory scene analysis” problem [1] or, more colloquially, as the “cocktail party” problem [2]. Understanding how the auditory system solves this problem is one of the most fascinating tasks facing auditory scientists today. During the last decade, studies devoted to exploring how, and where, auditory scenes are analyzed in the brain have multiplied, using techniques ranging from single-unit recordings to electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI). Although the results of some of these studies were reviewed in earlier publications [3–12], several new findings have emerged during the last two years. Some of these findings qualify the conclusions of earlier studies and raise important questions for future research.

Figure 1
The psychophysical, biological, and computational facets of the “auditory scene analysis” problem. (a) From bottom to top: Acoustic waves coming from different sound sources mingle in the propagating medium. The physical characteristics ...

The aim of this article is twofold: first, to provide a brief overview of the current state of knowledge concerning the neural basis of auditory scene analysis, for readers who are not familiar with this research; second, to summarize the three (in our view) most significant questions that have emerged from findings published on this topic within the last two years. Being brief, this review is also, necessarily, selective. It focuses on an important aspect of auditory scene analysis, which the auditory-perception and auditory-neuroscience literatures commonly refer to as “auditory streaming” (Box 1).

BOX 1. What is “auditory streaming”?

An essential aspect of the analysis of auditory scenes relates to the perceptual organization of sounds into “streams”—commonly referred to as “auditory streaming”. Broadly speaking, a “stream” is a sound, or group of sounds, which is perceived by the listener as a coherent entity. It can be selectively attended to, amid other sounds. The word “stream” emphasizes the fact that sounds are often embedded in temporal sequences. Speech and music provide examples of this. Streams usually correspond to sound sources in the listener’s environment. In some cases, however, sounds from multiple sources are heard as a single stream. Whether different sounds are heard as a single stream, or as separate streams, depends in part on stimulus characteristics, in part on the listener’s intentions, i.e., whether the listener is actively trying to hear out certain sounds. In general, two sounds (A and B) that have approximately the same loudness, pitch, timbre, and location are heard as a single stream when played in a sequence (e.g., ABAB…). In contrast, sounds that differ markedly in pitch, timbre, or location usually form separate streams.

Although “auditory streaming” is sometimes used to mean specifically: “the perceptual organization of sequential sounds”, sounds usually consist of many spectral components, which often span a wide frequency range (from several Hz to several kHz). Thus, when forming auditory streams, the auditory system must group sounds across time, and in addition, group simultaneous frequency components that arose from a common source—while keeping these components separate from those produced by other sources. Synchrony provides a useful cue for solving this problem. Spectral components that start at approximately the same time (within a few tens of ms) are usually grouped together in perception, while frequency components that start at different times tend to be heard as separate.
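The onset-synchrony cue described above can be illustrated with a minimal sketch. It is not a model from the literature reviewed here: the function name `group_by_onset`, the 30-ms tolerance (one reading of "a few tens of ms"), and the example components are all assumptions chosen for illustration. Components are sorted by onset time, and a component joins the current group if it starts within the tolerance of that group's first onset.

```python
def group_by_onset(components, tolerance_ms=30.0):
    """Group spectral components by onset synchrony (illustrative only).

    components: list of (frequency_hz, onset_ms) tuples.
    tolerance_ms: hypothetical grouping window, not a measured value.
    Returns a list of groups, each a list of components.
    """
    groups = []
    for comp in sorted(components, key=lambda c: c[1]):
        # Compare this onset with the first onset of the most recent group.
        if groups and comp[1] - groups[-1][0][1] <= tolerance_ms:
            groups[-1].append(comp)
        else:
            groups.append([comp])
    return groups


# Hypothetical scene: three components starting near 0 ms, two near 250 ms.
scene = [(200, 0), (400, 5), (600, 10), (1000, 250), (2000, 260)]
groups = group_by_onset(scene)
```

In this toy scene the first three components fall into one group and the last two into another, regardless of their frequency separation. Real grouping also depends on harmonicity, location, and other cues that this sketch ignores.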

Do auditory streams emerge below, in, or beyond the auditory cortex?

The auditory system is a multi-storey building (Figure 1b). As sensory information ascends from the cochlea to the primary auditory cortex, it passes through several nuclei. Neurons at each level exhibit complex response properties, which reflect sophisticated signal-processing operations. An important task for auditory researchers is to clarify the role of these different processing stages in auditory scene analysis. Most of the studies performed during the last decade on the neural basis of auditory streaming have focused on the auditory cortex. In particular, several studies have been devoted to identifying single- or multi-unit correlates of auditory streaming in the primary auditory cortex (A1) of mammals [13–16], or in the corresponding field in the avian forebrain, field L [17–19]. The results of these studies have revealed striking relationships between neural responses to sequences of alternating tones in A1 (or field L) and psychophysical measures of auditory streaming obtained using similar stimuli in humans. These findings have led to the view that auditory streams are formed in the primary auditory cortex, or perhaps below it. The latter possibility is supported by a study demonstrating that neural response properties in the cochlear nucleus (the very first central auditory nucleus) can account for several features of the perceptual organization of sequences of alternating tones [20]. This finding raises important questions; in particular: What is the actual contribution of the primary auditory cortex to the formation of auditory streams? Are there conditions under which the perceptual organization of sound sequences is reflected in neural responses in, but not below, primary auditory cortex? Do the neural response patterns that have been putatively identified as neural correlates of auditory streaming in these studies actually covary with the animal’s perception of the stimulus (as inferred using behavioral tasks) during the neural recordings?

In parallel to these single- and multi-unit studies in animals, several studies have investigated cortical correlates of auditory streaming in humans using EEG [21–27], MEG [28–30], or fMRI [30–34]. The advantage of working in humans is that percepts can be probed simply by asking the listener what he/she perceives while brain activity is being measured. Perhaps the most significant outcome of these studies was the demonstration of co-variations between cortical responses and perceptual judgments in the absence of corresponding changes in the physical stimulus [e.g., 26,27,29,32,34]. This was achieved by using perceptually ambiguous sound sequences, the perception of which switched randomly from “one stream” to “two streams” over time. Interestingly, in the EEG and MEG studies [26,27,29], these co-variations involved long-latency responses (such as the N1 or N1m), which are thought to be generated in secondary areas of auditory cortex. Using a perceptual camouflaging paradigm in which listeners had to detect a stream of (target) tones embedded in a stochastic background, a recent MEG study found that long-latency responses to the target tones were only evoked on trials on which the corresponding stream had been detected by the listener [35]. In contrast, steady-state responses, which were presumably generated in primary auditory cortex, were evoked by the target tones regardless of whether or not the listener detected the target stream. Based on these EEG and MEG findings, it is tempting to speculate that neural activity in the secondary auditory cortex is more closely related to listeners’ actual perception of an auditory stream than is activity in primary auditory cortex. However, further study is needed before a strong conclusion can be reached on this point.

The fMRI data are not as clear-cut. Several studies have found differences in blood oxygen level-dependent (BOLD) signals in regions corresponding to the primary and/or secondary auditory cortex depending on factors known to influence auditory streaming [30,31,33]. However, until recently, the only fMRI study of auditory streaming in which potential stimulus confounds had been eliminated (by using physically constant but perceptually ambiguous stimuli) had failed to find significant co-variations between listeners’ percepts and BOLD signals in auditory cortex [32]. This state of affairs recently changed, as an fMRI study, which also used perceptually ambiguous sequences, found significant differences in the relative timings of activations in the auditory cortex and the medial geniculate body (MGB) during perceptual reversals [34]. Although further work is needed to clarify the implications of this recent finding, one interpretation is that the perception of sound sequences such as those used in studies of auditory streaming emerges from interactions between the auditory cortex and the thalamus.

The possibility that cortical areas beyond the auditory cortex contribute to the perception of auditory streams is suggested by fMRI data. In particular, one study found significantly greater activation in the intraparietal sulcus (IPS) when listeners heard a perceptually ambiguous sequence of tones as two streams than when the same listeners heard the same sequence as a single stream [32]. The IPS has been implicated in visual binding. However, it remains unclear whether the change in IPS activation that was observed in this study was a cause, or a consequence, of the perceptual change from one to two streams. Whether, and how, neural activity in cortical regions located outside of auditory cortex influences the formation of auditory streams are still open questions.

How are auditory streams formed in the brain? The role of temporal coherence

Auditory streaming has traditionally been studied using sequences of pure tones at two frequencies (A and B), which are played in alternation (forming a repeating AB or ABA pattern); an audio demo can be found at: http://www.tc.umn.edu/~cmicheyl/demos.html. The probability that the A and B tones are heard as separate streams usually increases with their frequency separation, the pace of tone presentation, and (provided that the sequence is continuously attended to by the listener) the time elapsed since the onset of the stimulus sequence. It has been suggested that these perceptual effects are mediated by three important response properties of auditory neurons: frequency selectivity, forward suppression, and adaptation [13–18]. The net effect of these properties is that the A and B tones activate increasingly distinct neural populations in A1 as the frequency separation, presentation rate, and stimulation time increase. According to this view, sounds form separate streams when they activate distinct (or weakly overlapping) populations of central auditory neurons; in contrast, sounds that activate the same (or largely overlapping) neural populations form a single stream. This idea can be generalized to explain stream segregation based on sound attributes other than frequency. For instance, segregation based on differences in spectral envelope (timbre), periodicity (pitch), or modulation rate (pitch or roughness) can be explained by considering populations of neurons selective to these attributes [19,30].
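The population-separation idea can be made concrete with a toy calculation. The sketch below is an illustration only, not the analysis used in the cited studies. It assumes Gaussian frequency tuning on a semitone axis with a hypothetical bandwidth of 2 semitones, and it models frequency selectivity alone (forward suppression and adaptation are left out). The function names `excitation` and `overlap` are invented for this example. The overlap between the excitation patterns evoked by the A and B tones is measured as a cosine similarity, which shrinks as the frequency separation grows.

```python
import math


def excitation(center_st, channels_st, bandwidth_st=2.0):
    """Gaussian excitation pattern across channels (hypothetical tuning shape)."""
    return [math.exp(-0.5 * ((c - center_st) / bandwidth_st) ** 2)
            for c in channels_st]


def overlap(delta_f_st, bandwidth_st=2.0):
    """Cosine similarity between the patterns evoked by tones A and B.

    delta_f_st: A-B frequency separation in semitones.
    Returns a value near 1 for identical tones and near 0 for
    well-separated tones.
    """
    # Channel best frequencies from -40 to +40 semitones in 0.25-semitone steps.
    channels = [i * 0.25 for i in range(-160, 161)]
    a = excitation(-delta_f_st / 2, channels, bandwidth_st)
    b = excitation(+delta_f_st / 2, channels, bandwidth_st)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


# Overlap for a few illustrative separations (in semitones).
overlaps = {df: overlap(df) for df in (0, 1, 6, 12)}
```

Under these assumptions, a small separation leaves the two patterns largely overlapping (the single-stream regime in the population-separation account), whereas a separation of an octave leaves almost no overlap. Effects of presentation rate and build-up over time would require adding suppression and adaptation terms, which this sketch does not attempt.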

Recent findings indicate that spatial separation between responsive neural populations in A1 is not a sufficient condition for stream segregation, however. This was revealed by altering the classic ABAB stimulus, in such a way that the A and B tones were synchronous instead of alternating. Under such conditions, the tones were heard as a single stream, even when they were sufficiently far apart in frequency to activate well-separated populations of neurons in A1 [36]. Thus, it appears that, for sounds to be separated into streams, the tones must not only activate different neural populations; in addition, the populations must be activated at different times. Conversely, well-separated neural populations in A1 can support a single-stream percept if they are activated in a temporally coherent fashion. This is especially important because many naturally occurring sounds, such as speech, contain multiple spectral components spread over a wide frequency range; the temporal coherence of these components promotes their grouping into a common stream even when they are widely separated in frequency (see the schematic illustration of this idea in Figure 1c).
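The contrast between alternating and synchronous tones can be sketched with a toy measure of temporal coherence. This is not the computational model of refs [36,37]; it is a minimal illustration under stated assumptions. Each of two frequency channels is reduced to a binary response envelope over successive time slots, and coherence is taken to be the Pearson correlation between the two envelopes. The names `envelope`, `pearson`, and `coherence` are hypothetical.

```python
import math


def envelope(n_slots, phase):
    """Binary response envelope: 1.0 in slots where the tone is on, else 0.0."""
    return [1.0 if i % 2 == phase else 0.0 for i in range(n_slots)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coherence(a_phase, b_phase, n_slots=20):
    """Correlation between the channel driven by tone A and the one driven by tone B."""
    return pearson(envelope(n_slots, a_phase), envelope(n_slots, b_phase))


alternating = coherence(0, 1)   # A and B tones in alternating slots (ABAB...)
synchronous = coherence(0, 0)   # A and B tones in the same slots
```

With these idealized envelopes the alternating sequence gives a strongly negative correlation and the synchronous sequence a strongly positive one, even though the two channels are equally well separated in frequency in both cases. A grouping rule based on coherence would therefore bind the synchronous tones into one stream and keep the alternating tones apart, in line with the percepts described above.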

The role of temporal coherence in perceptual grouping need not be limited to frequency-selective neural populations. The grouping of temporally coherent responses across neural populations that encode different auditory attributes (e.g., pitch and spatial location) can explain how the auditory system successfully associates multiple attributes with the correct stream—so that the pitches of sounds arising from different locations are not confused by the listener (Figure 1c). This idea may be thought of as an instantiation of the “binding hypothesis” in audition. Recent computational models based on the principle of grouping-by-temporal-coherence can mimic the perceptual organization of a wide variety of sounds [36,37]. However, further study is needed to clarify how this principle of stream formation based on temporal coherence is implemented in the central auditory system.

How does attention influence auditory stream formation at the neural level?

The influence of attention on the formation of auditory streams has inspired several studies during the last decade. EEG studies have identified neural indices of stream segregation, such as the mismatch negativity (MMN), which are modulated by attention, but can be detected in averaged responses even when the listeners are engaged in a task that draws their attention away from the evoking sounds [21,38–40]. The “object related negativity” (ORN), a neural index of the perceived segregation of concurrent sounds, can also be recorded when the participant’s attention is not focused on the auditory stimulus [22–24,41]. These findings suggest that, under many circumstances, incoming sounds can be parsed “automatically” by the auditory system, based on stimulus properties (such as temporal coherence); attention may only come into play after streams have been formed, as one of the streams is being selected for further listening.

On the other hand, psychophysical data (and introspection) indicate that selective attention can influence stream formation. For instance, listening actively for high-pitch tones in a sound sequence that contains both low- and high-pitch tones can promote the perception of this sequence as two separate streams instead of a single stream. The neural basis of this effect remains unclear. Several studies within the last twenty years [e.g., 42], including recent ones [43, 44], have found that cortical responses to sounds that form part of a selectively attended stream are enhanced compared to responses to unattended sounds. An important goal for future studies is to determine whether this modulation of neural responses depending on selective attention affects neural representations of already formed streams, or whether it influences the neural stream-formation process itself.


Over the last decade, a rapidly increasing number of studies have started to explore where and how auditory streams are formed in the brain. Neural correlates of auditory streaming have been identified in, below, and even beyond the auditory cortex, in cortical regions not traditionally associated with auditory processing. This suggests that the formation of auditory streams involves a broadly distributed neural network. An important goal for future research will be to clarify the roles of primary and secondary areas of auditory cortex in this phenomenon. This may require the use of micro-stimulation and selective-deactivation (e.g., cooling) techniques combined with behavioral measures. Awaiting such studies, experiments in which behavioral measures are combined with simultaneous single-unit recordings may represent the next most important step toward a clearer understanding of the neural basis of auditory streaming.

The grouping of temporally coherent responses across neurons tuned to different frequencies or different stimulus attributes appears to be an important aspect of how the auditory system forms auditory streams and, at the same time, solves the “binding” problem. However, further research is needed to determine if, how, and where this principle of temporal coherence is implemented in the central auditory system.

Finally, the extent to which neural representations of auditory streams in (and below) the auditory cortex are influenced by selective attention deserves further investigation. The abundance of descending (efferent) connections in the auditory system provides ample opportunity for “top-down” influences, and makes it quite possible that effects of selective attention affect early stages of the neural analysis of auditory scenes. Teasing apart “bottom up” and “top down” influences on neural responses to sound sequences at different levels of processing within the auditory system promises to be an exciting challenge.


This work was supported by NIH R01 DC 07657 and Advanced Acoustic Concepts.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Shihab A Shamma, Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland College Park.

Christophe Micheyl, Department of Psychology, University of Minnesota.


Articles of particular interest, published within the last two years, have been highlighted as:

• of special interest

•• of outstanding interest

1. Bregman AS. Auditory Scene Analysis: The Perceptual Organisation of Sound. Cambridge, MA: MIT Press; 1990.
2. McDermott J. The cocktail party problem. Curr Biol. 2009;19:R1024–R1027. [PubMed]
3. Carlyon RP. How the brain separates sounds. Trends Cogn Sci. 2004;8:465–471. [PubMed]
4. Griffiths TD, Warren JD. What is an auditory object? Nat Rev Neurosci. 2004;5:887–892. [PubMed]
5. Sinex DG. Spectral processing and sound source determination. Int Rev Neurobiol. 2005;70:371–398. [PMC free article] [PubMed]
6. Alain C. Breaking the wave: Effects of attention and learning on concurrent sound perception. Hear Res. 2007 [PubMed]
7. Micheyl C, Carlyon RP, Gutschalk A, Melcher JR, Oxenham AJ, Rauschecker JP, Tian B, Courtenay Wilson E. The role of auditory cortex in the formation of auditory streams. Hear Res. 2007:229. [PMC free article] [PubMed]
8. Snyder JS, Alain C. Toward a neurophysiological theory of auditory stream segregation. Psychol Bull. 2007;133:780–799. [PubMed]
9. Bee MA, Micheyl C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol. 2008;122:235–251. [PMC free article] [PubMed]
10. Nelken I, Bar-Yosef O. Neurons and objects: the case of auditory cortex. Front Neurosci. 2008;2:107–113. [PMC free article] [PubMed]
11. Bidet-Caulet A, Bertrand O. Neurophysiological mechanisms involved in auditory perceptual organization. Front Neurosci. 2009;3:182–191. [PMC free article] [PubMed]
12. Micheyl C, Oxenham AJ. Pitch, harmonicity and concurrent sound segregation: Psychoacoustical and neurophysiological findings. Hear Res. 2009 [PMC free article] [PubMed]
13. Fishman YI, Reser DH, Arezzo JC, Steinschneider M. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res. 2001;151:167–187. [PubMed]
14. Kanwal JS, Medvedev AV, Micheyl C. Neurodynamics for auditory stream segregation: tracking sounds in the mustached bat’s natural environment. Network. 2003;14:413–435. [PubMed]
15. Fishman YI, Arezzo JC, Steinschneider M. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am. 2004;116:1656–1670. [PubMed]
16. Micheyl C, Tian B, Carlyon RP, Rauschecker JP. Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron. 2005;48:139–148. [PubMed]
17. Bee MA, Klump GM. Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J Neurophysiol. 2004;92:1088–1104. [PubMed]
18. Bee MA, Klump GM. Auditory stream segregation in the songbird forebrain: effects of time intervals on responses to interleaved tone sequences. Brain Behav Evol. 2005;66:197–214. [PubMed]
19. Itatani N, Klump GM. Auditory streaming of amplitude-modulated sounds in the songbird forebrain. J Neurophysiol. 2009;101:3212–3225. [PubMed]
20••. Pressnitzer D, Sayles M, Micheyl C, Winter IM. Perceptual organization of sound begins in the auditory periphery. Curr Biol. 2008;18:1124–1128. This study shows that neural-response patterns in the cochlear nucleus (the first obligatory station in the central auditory system) can already explain important features of the perceptual organization of alternating tone sequences into streams, thus raising questions concerning the exact role of auditory cortex in this perceptual phenomenon. [PMC free article] [PubMed]
21. Sussman E, Ritter W, Vaughan HGJ. An investigation of the auditory streaming effect using event-related brain potentials. Psychophysiology. 1999;36:22–34. [PubMed]
22. Alain C, Schuler BM, McDonald KL. Neural activity associated with distinguishing concurrent auditory objects. J Acoust Soc Am. 2002;111:990–995. [PubMed]
23. Dyson BJ, Alain C. Representation of concurrent acoustic objects in primary auditory cortex. J Acoust Soc Am. 2004;115:280–288. [PubMed]
24. Alain C, Reinke K, He Y, Wang C, Lobaugh N. Hearing two things at once: neurophysiological indices of speech segregation and identification. J Cogn Neurosci. 2005;17:811–818. [PubMed]
25. Winkler I, Takegata R, Sussman E. Event-related brain potentials reveal multiple stages in the perceptual organization of sound. Brain Res Cogn Brain Res. 2005;25:291–299. [PubMed]
26. Snyder JS, Alain C, Picton TW. Effects of attention on neuroelectric correlates of auditory stream segregation. J Cogn Neurosci. 2006;18:1–13. [PubMed]
27•. Snyder JS, Holder WT, Weintraub DM, Carter OL, Alain C. Effects of prior stimulus and prior perception on neural correlates of auditory stream segregation. Psychophysiology. 2009;46:1208–1215. This EEG study shows effects of temporal context on the perceptual organization of alternating tone sequences into auditory streams, and parallel effects on long-latency evoked responses in human auditory cortex. [PubMed]
28. Yabe H, Winkler I, Czigler I, Koyama S, Kakigi R, Sutoh T, Hiruma T, Kaneko S. Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration. Brain Res. 2001;897:222–227. [PubMed]
29. Gutschalk A, Micheyl C, Melcher JR, Rupp A, Scherg M, Oxenham AJ. Neuromagnetic correlates of streaming in human auditory cortex. J Neurosci. 2005;25:5382–5388. [PMC free article] [PubMed]
30. Gutschalk A, Oxenham AJ, Micheyl C, Wilson EC, Melcher JR. Human cortical activity during streaming without spectral cues suggests a general neural substrate for auditory stream segregation. J Neurosci. 2007;27:13074–13081. [PubMed]
31. Deike S, Gaschler-Markefski B, Brechmann A, Scheich H. Auditory stream segregation relying on timbre involves left auditory cortex. Neuroreport. 2004;15:1511–1514. [PubMed]
32. Cusack R. The intraparietal sulcus and perceptual organization. J Cogn Neurosci. 2005;17:641–651. [PubMed]
33. Wilson EC, Melcher JR, Micheyl C, Gutschalk A, Oxenham AJ. Cortical FMRI activation to sequences of tones alternating in frequency: relationship to perceived rate and streaming. J Neurophysiol. 2007;97:2230–2238. [PMC free article] [PubMed]
34••. Kondo HM, Kashino M. Involvement of the thalamocortical loop in the spontaneous switching of percepts in auditory streaming. J Neurosci. 2009;29:12695–12701. This fMRI study shows differences in the relative timing of medial-geniculate-body and auditory-cortex activation during percept reversals evoked by bi-stable sound sequences, which are heard sometimes as a single stream, sometimes as two separate streams. [PubMed]
35••. Gutschalk A, Micheyl C, Oxenham AJ. Neural correlates of auditory perceptual awareness under informational masking. PLoS Biol. 2008;6:e138. This MEG study shows that a regularly repeating stream of tones camouflaged in a randomly varying background only evokes concomitant long-latency responses (which are thought to originate in the secondary auditory cortex) when the stream is detected by the listener. In contrast, the same tones evoke equally robust steady-state responses (which are thought to originate in the primary auditory cortex), regardless of whether the corresponding stream is detected by the listener or not. [PMC free article] [PubMed]
36••. Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA. Temporal Coherence in the Perceptual Organization and Cortical Representation of Auditory Scenes. Neuron. 2009;61:317–329. This study indicates that spatial separation of neural responses in A1 based on frequency separation is not a sufficient condition for stream segregation, and that the relative timing (temporal coherence) of neural responses is key to understanding how auditory streams emerge in auditory cortex. [PMC free article] [PubMed]
37•. Elhilali M, Shamma SA. A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation. J Acoust Soc Am. 2008;124:3751–3771. An extensive computational modeling study, which illustrates and supports the idea that temporal correlations between the responses of cortical auditory channels play a crucial role in the formation of auditory streams. [PMC free article] [PubMed]
38. Sussman ES, Bregman AS, Wang WJ, Khan FJ. Attentional modulation of electrophysiological activity in auditory cortex for unattended sounds within multistream auditory environments. Cogn Affect Behav Neurosci. 2005;5:93–110. [PubMed]
39. Sussman ES. Integration and segregation in auditory scene analysis. J Acoust Soc Am. 2005;117:1285–1298. [PubMed]
40. Sussman ES, Horvath J, Winkler I, Orr M. The role of attention in the formation of auditory streams. Percept Psychophys. 2007;69:136–152. [PubMed]
41. Alain C, Arnott SR, Picton TW. Bottom-up and top-down influences on auditory scene analysis: evidence from event-related brain potentials. J Exp Psychol Hum Percept Perform. 2001;27:1072–1089. [PubMed]
42. Woldorff MG, Gallen CC, Hampson SA, Hillyard SA, Pantev C, Sobel D, Bloom FE. Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proc Natl Acad Sci U S A. 1993;90:8722–8726. [PMC free article] [PubMed]
43. Bidet-Caulet A, Fischer C, Besle J, Aguera PE, Giard MH, Bertrand O. Effects of selective attention on the electrophysiological representation of concurrent sounds in the human auditory cortex. J Neurosci. 2007;27:9252–9261. [PubMed]
44•. Elhilali M, Xiang J, Shamma SA, Simon JZ. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 2009;7:e1000129. Using the same perceptual-camouflaging paradigm as Gutschalk et al. (2008), this MEG study shows that steady-state responses to a regularly repeating stream of tones are modulated depending on whether the listener attends selectively to this stream, or to the background in which it is embedded. [PMC free article] [PubMed]