Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants

Xing Li; Kaibao Nie; Nikita S Imennov; Jong Ho Won; Ward R Drennan; Jay T Rubinstein; Les E Atlas

doi:10.1121/1.4756827

J Acoust Soc Am. 2012 Nov;132(5):3387-98. doi: 10.1121/1.4756827.

Authors

Xing Li¹, Kaibao Nie, Nikita S Imennov, Jong Ho Won, Ward R Drennan, Jay T Rubinstein, Les E Atlas

Affiliation

¹ Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA.

Abstract

Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Li et al. (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of both HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented. Using these vocoders, speech recognition performance was evaluated in five normal-hearing subjects under different masking conditions, and Mandarin tone identification performance was evaluated in another five normal-hearing subjects. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs.
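
For readers unfamiliar with vocoder simulations, the following is a minimal sketch of a generic CIS-style envelope vocoder, the baseline against which HSSE is compared. It is not the authors' implementation: the log-spaced band edges (100 to 6000 Hz), the 50 Hz envelope cutoff, the brick-wall FFT filters, and the sine carriers at band centers are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows the general idea that each channel keeps a slowly varying envelope and discards the temporal fine structure.

```python
import numpy as np

def band_edges(n_channels, f_lo=100.0, f_hi=6000.0):
    """Log-spaced analysis band edges in Hz (assumed range, not from the paper)."""
    return np.geomspace(f_lo, f_hi, n_channels + 1)

def bandpass_fft(x, fs, lo, hi):
    """Zero-phase brick-wall band-pass: keep rFFT bins with lo <= f < hi."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff=50.0):
    """Full-wave rectify, then low-pass below `cutoff` Hz (assumed value)."""
    env = bandpass_fft(np.abs(x), fs, 0.0, cutoff)
    return np.maximum(env, 0.0)  # clip small negative ripple from the filter

def cis_vocoder(x, fs, n_channels=4):
    """Sum of sine carriers, each modulated by one band's slow envelope."""
    edges = band_edges(n_channels)
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi), fs)
        fc = np.sqrt(lo * hi)  # geometric center of the band
        out += env * np.sin(2.0 * np.pi * fc * t)
    return out

# Example input: a 1 kHz tone with a 4 Hz amplitude modulation, 1 s at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.sin(2.0 * np.pi * 4.0 * t)) * np.sin(2.0 * np.pi * 1000.0 * t)
y4 = cis_vocoder(x, fs, n_channels=4)
y8 = cis_vocoder(x, fs, n_channels=8)
```

Because only the envelope survives in each channel, fundamental-frequency information above the envelope cutoff is lost in this baseline; the abstract describes HSSE as instead deriving modulators that carry both amplitude modulation and fundamental frequency information.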

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Acoustic Stimulation
Audiometry, Speech
Cochlear Implants*
Computer Simulation
Cues
Humans
Least-Squares Analysis
Noise / adverse effects*
Perceptual Masking*
Phonetics*
Psychoacoustics
Recognition, Psychology
Signal Processing, Computer-Assisted*
Sound Spectrography
Speech Acoustics*
Speech Perception*
Time Factors
