Talker change detection: A comparison of human and machine performance

J Acoust Soc Am. 2019 Jan;145(1):131. doi: 10.1121/1.5084044.

Abstract

The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variation and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or on multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker, or on the acoustic features that characterize a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task can be well estimated by a model of the acoustic feature distance between the speech segments before and after a change in talker, with estimation improving for models that incorporate longer durations of speech prior to the talker change. Further, human performance is superior to that of several online and offline state-of-the-art machine TCD systems.
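To make the modeling idea concrete, the sketch below scores a hypothesized talker-change point by the distance between acoustic features of the speech immediately before and immediately after it. This is not the paper's implementation: MFCC features, segment-mean pooling, and Euclidean distance are assumptions chosen for illustration, and the function name and parameters are hypothetical.

    # Minimal sketch: feature distance across a hypothesized talker-change point.
    # Assumes MFCCs and Euclidean distance; the study's actual features and
    # distance measure may differ.
    import numpy as np
    import librosa

    def change_point_distance(wav_path, change_time_s, pre_dur_s=1.0,
                              post_dur_s=1.0, sr=16000, n_mfcc=13):
        """Distance between mean MFCC vectors before vs. after change_time_s."""
        y, sr = librosa.load(wav_path, sr=sr)
        t = int(change_time_s * sr)
        pre = y[max(0, t - int(pre_dur_s * sr)):t]    # speech preceding the change
        post = y[t:t + int(post_dur_s * sr)]          # speech following the change
        mfcc_pre = librosa.feature.mfcc(y=pre, sr=sr, n_mfcc=n_mfcc)
        mfcc_post = librosa.feature.mfcc(y=post, sr=sr, n_mfcc=n_mfcc)
        # Summarize each segment by its mean feature vector and compare.
        return float(np.linalg.norm(mfcc_pre.mean(axis=1) - mfcc_post.mean(axis=1)))

Under this kind of model, larger pre/post distances would correspond to more salient talker changes (and, per the abstract, faster human reaction times), and increasing pre_dur_s mirrors the finding that longer pre-change context improves the estimate.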

MeSH terms

  • Adult
  • Female
  • Humans
  • Male
  • Natural Language Processing*
  • Psychoacoustics
  • Speech Intelligibility
  • Speech Perception*
  • Speech Recognition Software / standards