J Acoust Soc Am. 2011 Apr;129(4):2144-62. doi: 10.1121/1.3514544.

A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.

Author information

Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA. panchap@ee.ucla.edu

Abstract

In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
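
The chain matrix calculation and the three-part cost function described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a lossless tube with a closed glottis and an ideal open lip end (no radiation load), and it locates formants as zeros of the D element of the overall chain matrix. The squared relative formant distance, the weights `w_reg` and `w_cont`, the physical constants, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

C_SOUND = 35000.0  # speed of sound in cm/s (assumed value)
RHO = 0.00114      # air density in g/cm^3 (assumed value)


def chain_matrix_d(areas, lengths, freq):
    """D element of the glottis-to-lips chain matrix of a lossless tube.

    For lossless sections A and D are real and B and C are purely
    imaginary, so K = [[a, j*b], [j*c, d]] is tracked with real numbers.
    """
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    k = 2.0 * math.pi * freq / C_SOUND
    for area, length in zip(areas, lengths):
        z = RHO * C_SOUND / area  # characteristic impedance of the section
        cs, sn = math.cos(k * length), math.sin(k * length)
        a2, b2, c2, d2 = cs, z * sn, sn / z, cs
        # Matrix product K * K2, using j*j = -1.
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def formants(areas, lengths, n=3, f_max=5000.0, step=10.0):
    """First n resonances: sign changes of D on a grid, refined by bisection."""
    found = []
    f_lo = 0.5 * step
    d_lo = chain_matrix_d(areas, lengths, f_lo)
    while f_lo < f_max and len(found) < n:
        f_hi = f_lo + step
        d_hi = chain_matrix_d(areas, lengths, f_hi)
        if d_lo * d_hi < 0.0:
            lo, hi, d_at_lo = f_lo, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_matrix_d(areas, lengths, mid)
                if d_at_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_at_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


def inversion_cost(synth, natural, params, prev_params,
                   w_reg=0.01, w_cont=0.1):
    """Formant distance plus regularization and continuity terms.

    The functional form and the weights are illustrative assumptions.
    """
    formant_term = sum(((fs - fn) / fn) ** 2 for fs, fn in zip(synth, natural))
    reg_term = sum(p * p for p in params)
    cont_term = sum((p - q) ** 2 for p, q in zip(params, prev_params))
    return formant_term + w_reg * reg_term + w_cont * cont_term


# Usage: a uniform 17.5 cm tube split into ten equal sections.
uniform_formants = formants([4.0] * 10, [1.75] * 10)
```

For the uniform tube in the usage line, the resonances should fall at odd multiples of c/(4L), which gives a quick sanity check on the chain matrix product. In an inversion setting along the lines of the abstract, a quasi-Newton optimizer would minimize a cost of this kind over the articulatory parameters, with the area function derived from the articulatory model rather than supplied directly as it is here.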

PMID: 21476670
PMCID: PMC3188964
DOI: 10.1121/1.3514544
[Indexed for MEDLINE]
Free PMC Article