J Neurosci. 2014 Sep 3;34(36):12145-54. doi: 10.1523/JNEUROSCI.1025-14.2014.

Optimal combination of neural temporal envelope and fine structure cues to explain speech identification in background noise.

Author information

Department of Otorhinolaryngology-Head and Neck Surgery, Samsung Medical Center, Sungkyunkwan University, School of Medicine, Seoul 135-710, Korea.
Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee 37996.
Department of Otorhinolaryngology, Seoul Metropolitan Government Boramae Medical Center, Seoul National University, Seoul 156-707, Korea.
Equipe Audition (UMR 8248 CNRS LSP), Institut d'Etude de la Cognition, Ecole Normale Superieure, Paris Sciences et Lettres, Paris 75005, France.
Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington 98195.
Department of Speech, Language, and Hearing Sciences and Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907.


The dichotomy between acoustic temporal envelope (ENV) and fine structure (TFS) cues has stimulated numerous studies over the past decade to understand the relative role of acoustic ENV and TFS in human speech perception. Such acoustic temporal speech cues produce distinct neural discharge patterns at the level of the auditory nerve, yet little is known about the central neural mechanisms underlying the dichotomy in speech perception between neural ENV and TFS cues. We explored the question of how the peripheral auditory system encodes neural ENV and TFS cues in steady or fluctuating background noise, and how the central auditory system combines these forms of neural information for speech identification. We sought to address this question by (1) measuring sentence identification in background noise for human subjects as a function of the degree of available acoustic TFS information and (2) examining the optimal combination of neural ENV and TFS cues to explain human speech perception performance using computational models of the peripheral auditory system and central neural observers. Speech-identification performance by human subjects decreased as the acoustic TFS information was degraded in the speech signals. The model predictions best matched human performance when a greater emphasis was placed on neural ENV coding rather than neural TFS. However, neural TFS cues were necessary to account for the full effect of background-noise modulations on human speech-identification performance.
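The ENV/TFS dichotomy described above rests on a standard signal decomposition: a narrowband signal can be split into a slowly varying temporal envelope and a rapidly varying fine structure via the Hilbert transform. The sketch below illustrates that decomposition for a simple amplitude-modulated tone; it is an assumption-laden illustration of the general technique, not the specific vocoder processing or auditory-nerve models used in the study.

```python
import numpy as np
from scipy.signal import hilbert

def env_tfs_decompose(signal):
    """Split a narrowband signal into its temporal envelope (ENV) and
    temporal fine structure (TFS) using the Hilbert transform."""
    analytic = hilbert(signal)
    env = np.abs(analytic)            # slow amplitude modulations (ENV)
    tfs = np.cos(np.angle(analytic))  # rapid, unit-amplitude oscillation (TFS)
    return env, tfs

# Hypothetical example: an 8 Hz envelope imposed on a 1 kHz carrier
fs = 16000                                        # sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
carrier = np.cos(2 * np.pi * 1000 * t)            # fine structure
modulator = 1 + 0.5 * np.cos(2 * np.pi * 8 * t)   # envelope
env, tfs = env_tfs_decompose(modulator * carrier)
```

Because the real part of the analytic signal equals the input, the product `env * tfs` reconstructs the original waveform, which is why vocoder studies can selectively degrade one cue while preserving the other.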


Keywords: computational model; neural mechanism; speech perception; temporal cues

[Indexed for MEDLINE]