J Acoust Soc Am. 2012 Dec;132(6):3980-9. doi: 10.1121/1.4763545.

A procedure for estimating gestural scores from speech acoustics.

Author information

Haskins Laboratories, 300 George Street, Suite 900, New Haven, Connecticut 06511, USA. nam@haskins.yale.edu

Abstract

Speech can be represented as a constellation of constricting vocal tract actions called gestures, whose temporal patterning with respect to one another is expressed in a gestural score. Current speech datasets do not come with gestural annotation, and no formal gestural annotation procedure exists at present. This paper describes an iterative analysis-by-synthesis landmark-based time-warping architecture to perform gestural annotation of natural speech. For a given utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model is employed to generate a corresponding prototype gestural score. The gestural score is temporally optimized through an iterative time-warping process such that the acoustic distance between the original and TADA-synthesized speech is minimized. This paper demonstrates that the proposed iterative approach is superior to conventional acoustically referenced dynamic time-warping procedures and provides reliable gestural annotation for speech datasets.
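
The analysis-by-synthesis loop described in the abstract can be illustrated with a toy sketch. This is not the authors' procedure and does not use TADA: the `render` function is a hypothetical stand-in for a synthesizer, a gestural score is assumed to be a list of (onset, offset) frame intervals, the acoustic distance is assumed to be a frame-wise squared error, and a greedy one-frame boundary shift replaces the landmark-based time warping of the paper. It only shows the general idea of adjusting gesture timing until the synthesized track best matches a target.

```python
def render(score, n_frames):
    """Toy stand-in for a synthesizer (assumption, not TADA):
    one feature per frame, 1.0 while any gesture is active, else 0.0."""
    return [1.0 if any(on <= t < off for on, off in score) else 0.0
            for t in range(n_frames)]


def distance(x, y):
    """Sum of squared frame-wise differences between two equal-length tracks."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def refine(score, target, max_iters=50):
    """Greedy timing refinement: shift one gesture boundary by one frame
    at a time and keep the change only if the distance to the target drops."""
    n = len(target)
    best = [tuple(g) for g in score]
    best_d = distance(render(best, n), target)
    for _ in range(max_iters):
        improved = False
        for i, (on, off) in enumerate(best):
            candidates = ((on - 1, off), (on + 1, off),
                          (on, off - 1), (on, off + 1))
            for cand in candidates:
                # Skip intervals that are empty or fall outside the utterance.
                if not (0 <= cand[0] < cand[1] <= n):
                    continue
                trial = best[:i] + [cand] + best[i + 1:]
                d = distance(render(trial, n), target)
                if d < best_d:
                    best, best_d, improved = trial, d, True
                    break
        if not improved:
            break  # no single-frame shift helps any more
    return best, best_d
```

For example, with `target = render([(3, 7)], 10)`, calling `refine([(1, 4)], target)` moves the single prototype gesture from frames 1-4 to frames 3-7. A greedy search of this kind can stall when the prototype and target do not overlap at all, which is one reason a real system would rely on a more robust alignment step.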

PMID: 23231127
PMCID: PMC3528686
DOI: 10.1121/1.4763545
[Indexed for MEDLINE]