Send to

Choose Destination
Cognition. 2011 Aug;120(2):268-91. doi: 10.1016/j.cognition.2011.05.005. Epub 2011 Jun 12.

Investigating joint attention mechanisms through spoken human-robot interaction.

Author information

Department of Computational Linguistics Campus, Saarland University, 66123 Saarbruecken, Germany.


Referential gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream (Griffin, 2001; Meyer, Sleiderink, & Levelt, 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In a shared environment, utterance comprehension may further be facilitated when the listener can exploit the speaker's focus of (visual) attention to anticipate, ground, and disambiguate spoken references. To investigate the dynamics of such gaze-following and its influence on utterance comprehension in a controlled manner, we use a human-robot interaction setting. Specifically, we hypothesize that referential gaze is interpreted as a cue to the speaker's referential intentions which facilitates or disrupts reference resolution. Moreover, the use of a dynamic and yet extremely controlled gaze cue enables us to shed light on the simultaneous and incremental integration of the unfolding speech and gaze movement. We report evidence from two eye-tracking experiments in which participants saw videos of a robot looking at and describing objects in a scene. The results reveal a quantified benefit-disruption spectrum of gaze on utterance comprehension and, further, show that gaze is used, even during the initial movement phase, to restrict the spatial domain of potential referents. These findings more broadly suggest that people treat artificial agents similar to human agents and, thus, validate such a setting for further explorations of joint attention mechanisms.

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Elsevier Science
Loading ...
Support Center