Noise estimation in voice signals using short-term cepstral analysis

J Acoust Soc Am. 2007 Mar;121(3):1679-90. doi: 10.1121/1.2427123.

Abstract

Cepstral-based estimation is used to provide a baseline estimate of the noise level in the logarithmic spectrum for voiced speech. A theoretical description of cepstral processing of voiced speech containing aspiration noise, together with supporting empirical data, is provided in order to illustrate the nature of the noise baseline estimation process. Taking the Fourier transform of the liftered (filtered in the cepstral domain) cepstrum produces a noise baseline estimate. It is shown that Fourier transforming the low-pass liftered cepstrum is comparable to applying a moving average (MA) filter to the logarithmic spectrum, and hence the baseline receives contributions from both the glottal-source-excited vocal tract and the noise-excited vocal tract. Because the estimation process resembles the action of an MA filter, the resulting noise baseline is determined by the harmonic resolution (as determined by the temporal analysis window length) and the glottal source spectral tilt. On selecting an appropriate temporal analysis window length, the estimated baseline is shown to lie halfway between the glottal-excited vocal tract and the noise-excited vocal tract. This information is employed in a new harmonics-to-noise ratio (HNR) estimation technique, which is shown to provide accurate HNR estimates when tested on synthetically generated voice signals.
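
The baseline step described in the abstract (log spectrum, cepstrum, low-pass lifter, Fourier transform back) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not reproduce their HNR technique. The function names, the Hann window, the dB scaling, the `n_keep` lifter cutoff, and the synthetic test frame are all assumptions made for illustration only.

```python
import numpy as np


def cepstral_baseline(frame, n_keep):
    """Return (log_spectrum_db, baseline_db) for one real-valued frame.

    n_keep is a hypothetical parameter: the number of low-quefrency
    cepstral coefficients retained by the low-pass lifter.
    """
    n = len(frame)
    windowed = frame * np.hanning(n)  # assumed window choice
    # Log-magnitude spectrum in dB; the small constant avoids log(0).
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_spec).real
    # Symmetric low-pass lifter: keep quefrency bins 0..n_keep-1 and
    # their mirror images so the transformed result stays real.
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0
    # Fourier transform of the liftered cepstrum gives the baseline.
    baseline = np.fft.fft(cepstrum * lifter).real
    return log_spec, baseline


def synth_voiced_frame(n=2048, fs=16000.0, f0=100.0, noise_std=0.01, seed=0):
    """Hypothetical test signal: 1/k-weighted harmonics plus white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    n_harm = int((fs / 2) // f0) - 1  # stay below the Nyquist frequency
    harmonics = sum(np.sin(2 * np.pi * k * f0 * t) / k
                    for k in range(1, n_harm + 1))
    return harmonics + noise_std * rng.standard_normal(n)


# Usage: the pitch period is fs / f0 = 160 samples, so keeping 80
# coefficients places the lifter cutoff below the first rahmonic.
frame = synth_voiced_frame()
log_spec, baseline = cepstral_baseline(frame, n_keep=80)
```

In this sketch the liftered result is a smoothed version of the log spectrum, consistent with the moving-average interpretation given above. The frame length and the cutoff `n_keep` together control how much harmonic structure leaks into the baseline, and the values used here are arbitrary.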

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Models, Biological*
  • Noise*
  • Speech / physiology
  • Time Factors
  • Vocal Cords / physiology
  • Voice / physiology*