Atten Percept Psychophys. 2016 Feb;78(2):583-601. doi: 10.3758/s13414-015-1026-y.

Timing in audiovisual speech perception: A mini review and new psychophysical data.

Author information

1. Department of Cognitive Sciences, University of California, Irvine, CA, 92697, USA. jvenezia@uci.edu.
2. Department of Psychology, University of California, Los Angeles, CA, USA.
3. Department of Linguistics, University of Maryland, Baltimore, MD, USA.
4. Department of Anatomy and Neurobiology, University of California, Irvine, CA, USA.
5. Department of Cognitive Sciences, University of California, Irvine, CA, 92697, USA.

Abstract

Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35% identification of /apa/ compared to ~5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high-resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
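
The classification procedure described above can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes a binary, purely temporal mask (each frame either shows or hides the mouth), a hypothetical simulated observer, and a simple difference-of-means classification image. The function names, the number of frames, and the choice of "critical" frames are all illustrative assumptions; only the ~5% and ~35% /apa/ rates are taken from the abstract, and they are used here merely to set the simulated observer's response probabilities.

```python
import random


def simulate_trials(n_trials, n_frames, critical_frames, seed=0):
    """Simulate a hypothetical observer in a masked McGurk task.

    Each trial draws a random binary mask over frames (1 = mouth visible,
    0 = mouth obscured). The simulated observer reports /apa/ more often
    when the assumed critical frames are hidden, because the visual /aka/
    information is then unavailable. All probabilities are assumptions.
    """
    rng = random.Random(seed)
    masks, responses = [], []
    for _ in range(n_trials):
        mask = [rng.randint(0, 1) for _ in range(n_frames)]
        hidden = sum(1 - mask[f] for f in critical_frames)
        # Baseline 5% /apa/ responses, plus 30 points per hidden critical frame.
        p_apa = min(1.0, 0.05 + 0.30 * hidden)
        masks.append(mask)
        responses.append(rng.random() < p_apa)
    return masks, responses


def classification_image(masks, responses):
    """Per-frame mean visibility on /apa/ trials minus non-/apa/ trials.

    Strongly negative values mark frames whose visibility made an /apa/
    response less likely, i.e. frames carrying perceptually relevant
    visual information for this simulated observer.
    """
    yes = [m for m, r in zip(masks, responses) if r]
    no = [m for m, r in zip(masks, responses) if not r]
    if not yes or not no:
        raise ValueError("need at least one trial in each response class")
    n_frames = len(masks[0])
    return [
        sum(m[f] for m in yes) / len(yes) - sum(m[f] for m in no) / len(no)
        for f in range(n_frames)
    ]
```

Calling `simulate_trials(4000, 10, [3, 4])` and passing the result to `classification_image` should yield values near zero for most frames and clearly negative values at the two assumed critical frames. The published study used a graded transparency mask over space as well as time, so its maps are spatiotemporal rather than the one-dimensional profile sketched here.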

KEYWORDS: Audiovisual speech; Classification image; McGurk; Multisensory integration; Prediction; Speech kinematics; Timing

PMID: 26669309 [Indexed for MEDLINE]
PMCID: PMC4744562
DOI: 10.3758/s13414-015-1026-y
Free PMC Article
