Curr Biol. Author manuscript; available in PMC Aug 5, 2009.
Published in final edited form as:
Curr Biol. Aug 5, 2008; 18(15): 1124–1128.
Published online Jul 24, 2008. doi: 10.1016/j.cub.2008.06.053
PMCID: PMC2559912
NIHMSID: NIHMS64642

Perceptual organization of sound begins in the auditory periphery

Summary

Segmenting the complex acoustic mixture that makes up a typical auditory scene into relevant perceptual objects is one of the main challenges of the auditory system [1], for both human and non-human species. Several recent studies indicate that perceptual auditory object formation, or “streaming”, may be based on neural activity within auditory cortex and beyond [2, 3]. Here, we find that scene analysis starts much earlier in the auditory pathways. Single units were recorded from a peripheral structure of the mammalian auditory brainstem, the cochlear nucleus. Peripheral responses were similar to cortical responses and displayed all of the functional properties required for streaming, including multi-second adaptation. Behavioral streaming was also measured in human listeners. Neurometric functions derived from the peripheral responses accurately predicted behavioral streaming. This reveals that sub-cortical structures may already contribute to the analysis of auditory scenes. This finding is consistent with the observation that species lacking a neocortex can still achieve and benefit from behavioral streaming [4]. For humans, we argue that auditory scene analysis of complex scenes is likely to be based on interactions between sub-cortical and cortical neural processes, with the relative contribution of each stage depending on the nature of the acoustic cues forming the streams.

Results and Discussion

We usually experience our acoustic environment as containing multiple “streams” of sounds, which can be selectively attended to and followed over time amid other streams (e.g., the voice of a friend in a crowded restaurant, a musical instrument within an orchestra). Analogous to the segmentation of visual scenes into objects, the parsing of acoustic sequences into streams is an essential component of the perceptual analysis of auditory scenes in humans and various other animal species [1, 5–7].

Where and how auditory streaming is implemented in the brain are as yet unanswered questions, but a number of physiology and brain-imaging studies have suggested that the auditory cortex plays a key role in the formation of auditory streams [6–9]. The general form of the neural correlates found in these studies can be described as “grouping by co-activation”: sounds that activate the same or largely overlapping populations of neurons are perceived as forming a single stream, whereas sounds that activate different neuronal populations are perceived as separate streams. For instance, when stimulated with pure tones, most neurons of primary auditory cortex (A1) respond selectively only to a limited range of frequencies. This is consistent with a co-activation model, as consecutive tones with similar frequencies are grouped in a single stream whereas tones differing widely in frequency are heard as separate streams [3, 8, 10]. Similarly, forward suppression of activity could explain the increase in sound segregation with faster rates of tone presentation [8, 10]. A more challenging feature of streaming is that it can change dynamically over time, even if the stimulus itself remains constant (similar to bistable perception in vision [11]). Predicting the dynamics of streaming is a crucial test for any neural model of streaming. Recently, it has been proposed that multi-second adaptation of neural responses in A1 could explain the behavioural “build-up” of stream segregation when the exposure time to a sound is increased [3].

While the relationship between neural responses in auditory cortex and auditory streaming is being thoroughly investigated, the possible contribution of sub-cortical nuclei has so far remained unexplored. The auditory system contains several sub-cortical nuclei, which are generally believed to establish basic feature encoding before perceptual organization starts at the cortical level [12, 13]. Here, we investigated whether sub-cortical neural processing may in fact also take an active part in auditory perceptual organization. Single neurons were recorded from the ventral part of the cochlear nucleus (CN) of urethane-anaesthetized guinea pigs. The CN is the most peripheral brainstem structure in the ascending auditory pathways, and the site of the first obligatory synapse for all auditory nerve fibers. Its role as an interface between the auditory periphery (cochlea and auditory nerve) and the higher central auditory system (inferior colliculus and auditory cortex) makes it an ideal locus to examine the origin of neural correlates of auditory streaming. The CN is made up of a variety of physiologically and histologically well-defined cell types [14]. On the one hand, bushy cells display “primary-like” response properties similar to those of the auditory nerve fibers from which they receive their input, thus providing a window on peripheral responses. On the other hand, the CN also contains cells, such as the multipolar cells, with “chopper-sustained” or “chopper-transient” response properties that are far more complex than those of the auditory nerve and that can be thought of as initial brainstem processing of sound. Like A1 neurons, most cells in the CN exhibit frequency selectivity and forward suppression to a varying degree according to their response type [15]. So far, however, neural responses to long-duration sequences such as those used in psychophysical studies of auditory streaming have never been measured at the level of the CN.

To address this question, we used an experimental paradigm similar to the one used in earlier behavioural studies of auditory streaming in humans [16, 17], and in neurophysiological studies at the level of auditory cortex [7–9, 18]. Sound stimuli were built using pure tones alternating between two frequencies, A and B, and arranged into repeating sequences of ABA- triplets for a total of 10 s (ABA- sequences, Fig. 1A,B, Supplemental Experimental Procedures). The percept evoked by these sound sequences depends on the frequency difference (ΔF) between the A and the B tones, and on the time elapsed since the sequence was turned on. When the frequency difference is small, the sequence is perceived as a single coherent sound stream with a distinctive galloping rhythm (ABA-ABA-). When the frequency difference is large, the sequence is usually perceived as a single stream just after it is turned on, but after a few seconds of uninterrupted listening, it separates into two streams, each with a regular rhythm (stream A-A-A- and stream -B--B--B- ) [1, 16]. The change in percept from one stream to two streams is quite compelling, and is experienced even by listeners who are aware that the physical stimulus does not change over time (online demonstrations at e.g. http://cognition.ens.fr/Audition/sup/).
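As an illustration of the stimulus structure described above, the following minimal sketch lists the tone onsets and frequencies of an ABA- sequence. It is not the authors' stimulus code. The 125-ms slot duration, the choice of placing the B tone above the A tone, and the function names `b_frequency` and `aba_schedule` are assumptions made for the example; the actual parameters are given in the Supplemental Experimental Procedures. Only the ABA- triplet structure, the 10-s total duration, and the use of semitones for ΔF come from the text.

```python
from typing import List, Tuple


def b_frequency(f_a_hz: float, delta_f_semitones: float) -> float:
    """Frequency of a tone delta_f_semitones above f_a_hz (12 semitones = 1 octave).

    Placing B above A is an assumption of this sketch.
    """
    return f_a_hz * 2.0 ** (delta_f_semitones / 12.0)


def aba_schedule(
    f_a_hz: float,
    delta_f_semitones: float,
    slot_s: float = 0.125,   # hypothetical slot duration, not taken from the paper
    total_s: float = 10.0,   # total sequence duration stated in the text
) -> List[Tuple[float, str, float]]:
    """Return (onset_s, label, freq_hz) for every tone of a repeating ABA- sequence.

    Each triplet occupies four equal slots: A, B, A, then a silent slot ("-").
    """
    f_b_hz = b_frequency(f_a_hz, delta_f_semitones)
    pattern = [("A", f_a_hz), ("B", f_b_hz), ("A", f_a_hz)]
    triplet_s = 4 * slot_s
    n_triplets = int(round(total_s / triplet_s))
    events = []
    for k in range(n_triplets):
        for i, (label, freq) in enumerate(pattern):
            events.append((k * triplet_s + i * slot_s, label, freq))
    return events


if __name__ == "__main__":
    # A tone at a hypothetical 1-kHz best frequency, B tone 6 semitones higher.
    for onset, label, freq in aba_schedule(1000.0, 6.0)[:6]:
        print(f"{onset:6.3f} s  {label}  {freq:8.1f} Hz")
```

With these assumed values each triplet lasts 0.5 s, so a 10-s sequence contains 20 triplets. In a segregated percept the A tones then recur twice as often as the B tones, which is the A-A-A- versus -B--B--B- rhythm mentioned above.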

Figure 1
Illustration of the sound sequences and cochlear nucleus single-unit responses

An example response from a CN neuron to ABA- sequences is illustrated in Fig. 1. The frequency of the A tone was chosen equal to the neuron’s best-frequency (BF) and several values of ΔF were tested. Overall, responses of CN neurons closely resembled responses from single units in the primary auditory cortex [3, 10]. Importantly, they displayed all of the features of the grouping by co-activation model: At small ΔFs (e.g., 1 semitone, Fig. 1A), CN neurons responded to both A and B tones, consistent with the grouped percept reported by listeners for such stimuli. As ΔF increased, neurons responded less and less to the tones that were remote from their BF (the B tones in our paradigm, Fig. 1B). This result, just as in the cortex, is likely due to the combined effects of frequency selectivity and forward suppression of neural responses. The main static features of streaming are thus already apparent in the CN responses.

As mentioned above, a more challenging test for neural models of streaming relates to the dynamic changes in percept that are experienced by listeners as the sequence is heard for a prolonged period of time [19]. We quantified these perceptual effects by asking normal-hearing listeners to report their percept (“one stream” or “two streams”) continuously during the same 10-s stimulus sequences as the ones used for the physiology (Supplemental Experimental Procedures). The average reported percept plotted as a function of time from sequence onset shows that at all but the smallest ΔF, the proportion of two streams responses increases over time (Fig. 3A,B). This build-up of segregation is faster and more pronounced at the largest ΔFs. It has been proposed that this build-up comes from multi-second adaptation of neural responses in auditory cortex [3]. Here, we observed that neurons in the CN also display strong multi-second adaptation in response to the long-duration tone sequences. Both single neurons (Fig. 1) and the population average (Fig. 2A,B) showed a marked and progressive decrease in spike counts over the course of the 10-s stimulus sequence. This multi-second adaptation was present in the two main different types of cells in the ventral subdivision of the CN, including bushy cells which exhibit “primary-like” responses similar to those of auditory-nerve fibers. This shows that the multi-second adaptation observed in auditory cortex is already present in the auditory periphery.

Figure 2
Multi-second adaptation is present in the cochlear nucleus
Figure 3
Responses from the cochlear nucleus predict the behavioural build-up of streaming

Adaptation over several seconds has been reported in the auditory nerve for continuous long-duration, single-frequency tones, and was ascribed to neurotransmitter depletion at the synapse between hair-cell and auditory nerve fibers [20]. We simulated responses of auditory nerve fibers to the ABA- sequences using a representative model of the auditory periphery [21]. The model was chosen as it is fitted to the guinea-pig’s auditory periphery and it reproduces neural forward-masking by means of synaptic depletion. The simulations are presented in Figure S1. The model does not exhibit multi-second adaptation. This indicates either that adaptation to tone sequences emerges in the cochlear nucleus, or that current models of the auditory nerve do not include the appropriate time-constants for multi-second adaptation.

Adaptation in peripheral auditory neurons could also be influenced by descending feedback from upper processing stages, including auditory cortex. It is highly unlikely that the multi-second adaptation we observe in all recorded neurons is a direct reflection of cortical adaptation, because we recorded from the ventral part of the CN (VCN), for which efferent connections are sparse [22]. It is possible, however, that auditory cortex exerts a modulatory influence on CN activity, either via the sparse direct projections or via the more prevalent indirect projections. A possible pathway for indirect feedback is the medial olivocochlear efferent system, which can impose a form of slow gain-control on the cochlea [23] and thus on auditory nerve and CN responses. In the VCN itself, subtle changes in adaptation are observed if feedback projections from the dorsal cochlear nucleus and medial olivary complex are removed [24, 25]. Considering the various possibilities, we suggest that multi-second adaptation to tone sequences in the VCN likely results from the interaction between long-term synaptic depression and fast recovery in peripheral neurons, with possible modulatory influences from descending projections. Whether multi-second adaptation is fully established in the periphery and simply reflected in the cortex, or whether it requires an interaction between lower and higher levels in the auditory pathway, remains an open question. In any case, our results show that the CN is involved in shaping this feature of auditory responses in ways not previously predicted.

The finding that neurons in the CN display frequency selectivity, forward suppression, and multi-second adaptation raises the interesting possibility that they can account quantitatively for the behavioral characteristics of auditory streaming. In order to test this possibility, we applied to the CN responses a grouping by co-activation model similar to the one proposed for A1 [3]. The model computes neurometric functions which can be compared directly with psychometric functions measured in human listeners (Supplemental Experimental Procedures). The basic idea of the model is that a one stream percept is predicted if both A and B tones evoke an above-threshold response in single neurons. In contrast, if neurons tuned to the A tones exhibit above-threshold activity during the presentation of the A tones, but not during the presentation of the B tones, a two streams percept is predicted. The average percept probability is finally obtained by tallying the model’s binary decisions across a large number of simulated trials (here, 5,000). Model predictions were computed for each triplet’s position in the sequence, in each ΔF condition. The decision threshold in the model was adjusted to obtain the best fit between the psychometric data and the neural predictions, but it was not allowed to vary across ΔFs and triplets; therefore, variations in the predicted probability of two streams responses as a function of these two parameters are due solely to neural-response characteristics and not to ad hoc changes in the model’s threshold. The neurometric functions obtained using this procedure are presented in Fig. 3A,B. The neurometric functions from CN neurons closely parallel the psychometric functions measured in humans. The level of agreement between neurometric and psychometric functions is just as high as that observed using cortical responses in a previous study [3].
The good fit obtained using only the bushy cells subpopulation (primary-like responses) also raises the possibility that the neural response characteristics needed to predict the psychometric data may already be present at the level of the auditory nerve. In summary, our findings demonstrate that fundamental neural response properties at early stages of the auditory system (frequency selectivity, forward suppression, and multi-second adaptation) can predict perceptual streaming for tone sequences. This extends to perceptual organization the idea, emerging from evidence in different sensory modalities, that adaptation is a key feature of sensory systems allowing for efficient encoding of information [26, 27].
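The decision rule of the grouping by co-activation model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, which is described in the Supplemental Experimental Procedures. In particular, simulating a trial by resampling one spike count per tone from a pool of recorded counts, treating a count equal to the threshold as above threshold, and the names `p_two_streams` and `neurometric_function` are all assumptions of the sketch. The source states only the binary decision rule, the tally over 5,000 simulated trials, and the use of a single threshold across ΔFs and triplets.

```python
import random
from typing import List, Sequence


def p_two_streams(
    a_counts: Sequence[int],
    b_counts: Sequence[int],
    threshold: float,
    n_trials: int = 5000,    # number of simulated trials stated in the text
    seed: int = 0,
) -> float:
    """Estimate the probability of a "two streams" decision for one condition.

    a_counts / b_counts: spike counts evoked by the A and B tones in a neuron
    tuned to A (hypothetical data in this sketch). On each simulated trial one
    count is drawn per tone; drawing by resampling is an assumption.
    """
    rng = random.Random(seed)
    two_streams = 0
    for _ in range(n_trials):
        a = rng.choice(a_counts)
        b = rng.choice(b_counts)
        # Two streams: A response above threshold, B response below it.
        # Any other outcome is tallied as not "two streams" (assumption).
        if a >= threshold and b < threshold:
            two_streams += 1
    return two_streams / n_trials


def neurometric_function(
    a_by_triplet: Sequence[Sequence[int]],
    b_by_triplet: Sequence[Sequence[int]],
    threshold: float,
) -> List[float]:
    """Apply the same fixed threshold at every triplet position."""
    return [
        p_two_streams(a, b, threshold, seed=i)
        for i, (a, b) in enumerate(zip(a_by_triplet, b_by_triplet))
    ]


if __name__ == "__main__":
    # Hypothetical counts: the B response adapts below threshold over time.
    a = [[9, 10, 11], [8, 9, 10], [7, 8, 9]]
    b = [[5, 6, 7], [3, 4, 5], [1, 2, 3]]
    print(neurometric_function(a, b, threshold=4))
```

Because the threshold is held fixed, any rise in the predicted proportion of two streams decisions across triplet positions in this sketch comes only from the change in the B-tone counts, which mirrors the argument made in the text about the absence of ad hoc threshold changes.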

The present results challenge the current view that perceptual organization of sound only emerges at the level of the auditory cortex. Our findings, however, should not be interpreted as implying that the cortex plays no role in auditory scene analysis, or that multi-second adaptation within frequency channels is the only mechanism of streaming. The tone sequences used here produce perceptual streaming on the basis of frequency differences, for which selectivity exists in the auditory periphery. Streaming, however, can also be observed between sounds that equivalently activate the same frequency channels but that have different temporal characteristics [28]. Under such circumstances, streaming must be based on temporal sound features which are extracted by mechanisms other than frequency selectivity, at subcortical [29] or cortical [30] levels of the auditory system. Moreover, in the general case, the sounds to be organized into streams will contain several frequency components and may overlap in time. The amount of overlap is a potent cue to auditory scene analysis, as synchronous frequency components tend to be fused in a single stream regardless of their frequency difference [1]. The grouping by co-activation model that we applied to the ABA- sequences cannot capture these effects. It is, however, easy to extend the co-activation idea to the time dimension, so that a single stream is predicted if there is co-activation either in time (synchrony cue) or in frequency (neural channel cue). The neural implementation of such an extension likely requires neurons with broad receptive fields that perform frequency-integration, which can be found sub-cortically [31] and are abundant in the cortex [32]. Finally, streaming is affected by attention, context, and knowledge of the listener [16], and it is unclear whether and how such factors may influence responses at lower levels of the auditory system.
Our findings must therefore be understood within the classic distinction between primitive and schema-based processes in auditory scene analysis [1]. Neural response properties, such as frequency selectivity, forward suppression, and multi-second adaptation, but also broadband inhibition [31], could mediate efficient primitive scene analysis mechanisms in the auditory periphery. Other scene analysis mechanisms, based on elaborate features or requiring plasticity, may rather involve auditory cortex [12] and cross-modal [2] cortical regions. The remarkable ability of humans and other animals to perceptually organize the complex mixtures of sounds encountered in natural environments is thus likely to recruit a distributed network involving interactions between sub-cortical and cortical neuronal processes. Such a distributed interaction might be an efficient way to achieve perceptual organization, not only for audition but also for other sensory modalities [33].

Acknowledgements

This work was supported by CNRS and a grant ANR-06-Neuro-022-01 (DP), NIH RO1DC07657 (CM), the Biotechnology and Biological Sciences Research Council (IMW) and the Frank Edward Elmore fund of the Cambridge clinical school MB PhD program (MS). The authors thank Josh McDermott and Ray Meddis for insightful discussions and suggestions on earlier versions of the manuscript.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Bregman A. Auditory scene analysis. Cambridge, MA: MIT Press; 1990.
2. Cusack R. The intraparietal sulcus and perceptual organization. J Cogn Neurosci. 2005;17:641–651.
3. Micheyl C, Tian B, Carlyon RP, Rauschecker JP. Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron. 2005;48:139–148.
4. Fay RR. Auditory stream segregation in goldfish (Carassius auratus). Hear Res. 1998;120:69–76.
5. Bee MA, Klump GM. Primitive auditory stream segregation: a neurophysiological study in the songbird forebrain. J Neurophysiol. 2004;92:1088–1104.
6. Micheyl C, Carlyon RP, Gutschalk A, Melcher JR, Oxenham AJ, Rauschecker JP, Tian B, Courtenay Wilson E. The role of auditory cortex in the formation of auditory streams. Hear Res. 2007;229:116–131.
7. Snyder JS, Alain C. Toward a neurophysiological theory of auditory stream segregation. Psychol Bull. 2007;133:780–799.
8. Fishman YI, Reser DH, Arezzo JC, Steinschneider M. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res. 2001;151:167–187.
9. Gutschalk A, Micheyl C, Melcher JR, Rupp A, Scherg M, Oxenham AJ. Neuromagnetic correlates of streaming in human auditory cortex. J Neurosci. 2005;25:5382–5388.
10. Fishman YI, Arezzo JC, Steinschneider M. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J Acoust Soc Am. 2004;116:1656–1670.
11. Pressnitzer D, Hupe JM. Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization. Curr Biol. 2006;16:1351–1357.
12. Nelken I. Processing of complex stimuli and natural scenes in the auditory cortex. Curr Opin Neurobiol. 2004;14:474–480.
13. Griffiths TD, Warren JD. The planum temporale as a computational hub. Trends Neurosci. 2002;25:348–353.
14. Young ED, Oertel D. The Cochlear Nucleus. In: Shepherd GM, editor. Synaptic Organization of the Brain. New York: Oxford University Press; 2003. pp. 125–163.
15. Bleeck S, Sayles M, Ingham NJ, Winter IM. The time course of recovery from suppression and facilitation from single units in the mammalian cochlear nucleus. Hear Res. 2006;212:176–184.
16. Cusack R, Deeks J, Aikman G, Carlyon RP. Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J Exp Psychol Hum Percept Perform. 2004;30:643–656.
17. Moore BCJ, Gockel H. Factors influencing sequential stream segregation. Acta Acustica United with Acustica. 2002;88:320–332.
18. Wilson EC, Melcher J, Micheyl C, Gutschalk A, Oxenham AJ. Cortical fMRI activation to sequences of tones alternating in frequency: Relationship to perceived rate and streaming. J Neurophysiol. 2007.
19. Bregman AS. Auditory streaming is cumulative. J Exp Psychol Hum Percept Perform. 1978;4:380–387.
20. Javel E. Long-term adaptation in cat auditory-nerve fiber responses. J Acoust Soc Am. 1996;99:1040–1052.
21. Meddis R, O'Mard LP. A computer model of the auditory-nerve response to forward-masking stimuli. J Acoust Soc Am. 2005;117:3787–3798.
22. Winer JA. Decoding the auditory corticofugal systems. Hear Res. 2006;212:1–8.
23. Sridhar TS, Liberman MC, Brown MC, Sewell WF. A novel cholinergic "slow effect" of efferent stimulation on cochlear potentials in the guinea pig. J Neurosci. 1995;15:3667–3678.
24. Shore SE. Influence of centrifugal pathways on forward masking of ventral cochlear nucleus neurons. J Acoust Soc Am. 1998;104:378–389.
25. Mulders WH, Winter IM, Robertson D. Dual action of olivocochlear collaterals in the guinea pig cochlear nucleus. Hear Res. 2002;174:264–280.
26. Dean I, Harper NS, McAlpine D. Neural population coding of sound level adapts to stimulus statistics. Nat Neurosci. 2005;8:1684–1689.
27. Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR. Efficiency and ambiguity in an adaptive neural code. Nature. 2001;412:787–792.
28. Gutschalk A, Oxenham AJ, Micheyl C, Wilson EC, Melcher JR. Human cortical activity during streaming without spectral cues suggests a general neural substrate for auditory stream segregation. J Neurosci. 2007;27:13074–13081.
29. Winter IM, Wiegrebe L, Patterson RD. The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig. J Physiol. 2001;537:553–566.
30. Bendor D, Wang X. The neuronal representation of pitch in primate auditory cortex. Nature. 2005;436:1161–1165.
31. Pressnitzer D, Meddis R, Delahaye R, Winter IM. Physiological correlates of comodulation masking release in the mammalian ventral cochlear nucleus. J Neurosci. 2001;21:6377–6386.
32. Schreiner CE, Read HL, Sutter ML. Modular organization of frequency integration in primary auditory cortex. Annu Rev Neurosci. 2000;23:501–529.
33. Leopold DA, Maier A. Neuroimaging: perception at the brain's core. Curr Biol. 2006;16:R95–R98.