Gaze following in Archosauria—Alligators and palaeognath birds suggest dinosaur origin of visual perspective taking
Associated Data
Abstract
Taking someone else’s visual perspective marks an evolutionary shift in the formation of advanced social cognition. It enables using others’ attention to discover otherwise hidden aspects of the surroundings and is foundational for human communication and understanding of others. Visual perspective taking has also been found in some other primates, a few songbirds, and some canids. However, despite its essential role for social cognition, visual perspective taking has only been fragmentedly studied in animals, leaving its evolution and origins uncharted. To begin to narrow this knowledge gap, we investigated extant archosaurs by comparing the neurocognitively least derived extant birds—palaeognaths—with the closest living relatives of birds, the crocodylians. In a gaze following paradigm, we showed that palaeognaths engage in visual perspective taking and grasp the referentiality of gazes, while crocodylians do not. This suggests that visual perspective taking originated in early birds or nonavian dinosaurs—likely earlier than in mammals.
INTRODUCTION
The advent of visual perspective taking represents a key event in the evolution of social cognition. It marks the transition from a unidirectional to a multidirectional frame of reference in social situations, providing information about the world that would otherwise remain out of reach and offering new beneficial ways of navigating social environments. Among other things, perspective taking lays the foundation for the so-called referential communication, where one refers to a jointly perceived object or event. It also forms the bedrock for ascribing beliefs and mental states to other individuals. However, the most basic form of perspective taking, upon which further skills rely, is the generalization from an egocentric to an allocentric visual viewpoint. Put simply: Appreciating that someone else can see what you cannot and consequently being able to recognize what the other one is attending to. This ability can be identified in the ways humans and other animals follow the gazes of others. Visual perspective taking is revealed in the most advanced form of gaze following, where the gaze target of the other is blocked from the onlooker’s view, causing the onlooker to reposition itself to see what the other is seeing. The ability to take someone else’s visual perspective in this way is known as geometrical gaze following (1).
Despite its foundational role in social cognition, studies on visual perspective taking have largely lacked a phylogenetic focus, leaving a patchy understanding of cognitive evolution in general. To date, geometrical gaze following has only been found in apes, monkeys, wolves (and dogs), corvids, and starlings (2–6)—diverse lineages that all have arisen after the end-Cretaceous extinction, a period witnessing extensive neurocognitive evolution (7). Hence, we are currently uninformed about one of the major transitions in social cognition. Considering the growing evidence that mammals and birds—separated by 325 million years—have evolved similar cognitive repertoires independently (8) and the fact that geometrical gaze following has only been found in few mammalian and avian species, there are good reasons to assume that visual perspective taking has arisen separately multiple times. It is essential to study each lineage in deep time to better understand the principles of sociocognitive evolution. These studies, in combination with research on brain evolution, may shed light on the timing, selective pressures, and possible relaxations of evolutionary constraints.
To begin establishing when visual perspective taking arose in Sauropsida (the lineage including reptiles and birds but not mammals), we used the paleontological inference method of extant phylogenetic bracketing (9). By comparing the gaze following repertoire of crocodylians with that of palaeognath birds, we phylogenetically bracketed the dinosaur lineage leading to birds as closely as possible. Crocodylians are the closest living relatives of birds. They have had slow evolutionary rates (10) and seem to have largely retained an ancestral brain morphology (11). Palaeognath birds, on the other hand, are the most neurocognitively plesiomorphic extant birds, making them in this regard more similar to nonavian paravian dinosaurs than any other bird taxa (7, 12).
The study of gaze following has its roots in developmental psychology and comprises an extensive research program, which has been successfully adopted by animal researchers. Early on, gaze following was divided into two qualitatively different levels, a high and a low level (13). The high level affords the aforementioned geometrical gaze following, while low level gaze following is an almost reflexive co-orientation with the visual direction of the other individual (14). The low level does not require prior expectations to find anything in the gaze direction or representations of the referentiality of the gaze but is an adaptive reaction that leads to noticing objects or events that could otherwise have been missed. This gaze following is mediated by conserved subcortical structures (15, 16). Low-level gaze following is commonly tested through gaze following into the distance experiments, where a demonstrator is lured to gaze either up or to the side. An onlooker capable of this skill is expected to co-orient with the gaze direction of the demonstrator. Low-level gaze following develops far earlier in children than high-level gaze following, with an onset between 3 and 6 months of age (17, 18). Gaze following into the distance has so far been found in all studied amniotes, ranging from mammals to birds and reptiles (19–21).
As mentioned, high-level gaze following, on the other hand, is a notably more advanced form. It presupposes expectations of finding something in the other’s line of gaze, and that this gaze reference can only be found if one changes one’s own perspective. This is the reason it is tested in the geometrical gaze following paradigm, with barriers blocking the view, that must be circumvented. As expected, this form of gaze following is suggested to be mediated by various cortical areas (22); although the avian homologues for this gaze following still need to be determined. In children, high-level gaze following, i.e., geometrical gaze following, is not seen until the age of 18 months (17).
Another central gaze following behavior that thus far has only been reported in humans, apes, and Old World monkeys (2, 23, 24) is the so-called “checking-back” behavior. This behavior is instigated when no object of interest is identified in the other’s line of gaze, or when the gaze direction and its target are incongruent. The observer will then look back at the other in an apparent attempt of retracking the gaze direction. The checking-back behavior is regarded as an essential diagnostic behavior for the onlooker’s representation of the referentiality of the other’s gaze, i.e., that it is pointing toward something (25). Checking-back thereby reveals a violation of the expectancy to find a gaze target in the observed gaze direction.
Visual perspective taking, as displayed in geometrical gaze following, does not imply the representation of others’ epistemic or perceptual states. Rather, it is a form of functional representation, leading to behaviors that correspond to the fact that the other has a different perspective and that its gaze refers to an object.
Furthermore, visual perspective taking is traditionally divided into a level I and II (26). Level I enables taking into account what (or that “something”) lies in the line of gaze of the other or, in other words, what the other can or cannot perceive. In children, this level develops between 18 and 24 months (27–29). Level II, on the other hand, requires the adoption of the spatial viewpoint of the other and hence taking into account how the world is perceived from that perspective. One understands that the same thing oneself sees is perceived differently from the angle of the other. This is considerably more advanced and does not develop in children until the age of 3 years (30). It has been suggested that while geometrical gaze following cannot reveal level II perspective taking, it forms the embodied precursor to developing or evolving it. Repositioning the body provides an experience of the other’s perspective, which, in turn, can be used in mental simulations of one’s own body positions to understand others (31). Together, geometrical gaze following is a sophisticated embodied sensory-motor process that anchors the most advanced forms of social cognition.
To investigate potential level I visual perspective taking skills in extant archosaurs, which phylogenetically bracket the extinct Dinosauria, we tested 30 individuals from five archosaur species (six per species) for their ability to follow conspecific gaze: emus (Dromaius novaehollandiae), greater rheas (Rhea americana), elegant-crested tinamous (Eudromia elegans), red junglefowl (Gallus gallus), and American alligators (Alligator mississippiensis). The three palaeognath species represent different phylogenetic nodes within that group and different socioecologies, as well as flightlessness and volant flight (32). The red junglefowl were added as an outgroup of plesiomorphic neognaths, belonging to the lineage Galloanserae that diverged from Neoaves (the other large group of neognaths) before the end-Cretaceous extinction. The animals were tested in three gaze following experiments: following gaze into the distance up and to the side and geometrically behind a barrier (for experimental setups, see Fig. 1). The potential presence of checking-back behavior was studied in all three experiments.
Panels depict experiment setups (from left to right) for alligators, small birds (red junglefowl and elegant-crested tinamous), and large birds (emus and rheas). (A) Setups for experiment 1 (gazing up). (B) Setups for experiment 2 (gazing to the side). (C) Setups for experiment 3 (geometrical). Red dots depict stimuli used to lure demonstrators’ gazes (for more information about stimuli, see Materials and Methods).
RESULTS
Overview of experimental setups
Each experiment had two conditions: a demonstrator condition, including a subject and a demonstrator, and a no-demonstrator condition with only the subject present. Both conditions consisted of two trial types: stimulus and no-stimulus.
In stimulus trials of the demonstrator condition, a stimulus lured the gaze of a demonstrator toward a specific location depending on the experiment: up, to the side, or behind a barrier (for more details on stimuli used, see Materials and Methods). This was the main test and measured co-orientation of the subject’s gaze with that of the demonstrator.
In no-stimulus trials of the demonstrator condition, the demonstrator was present, but no stimulus was shown, i.e., the demonstrator was not giving a gaze cue. This controlled for any potential gazing behaviors of the subject, because of the mere presence of a conspecific that could influence the interpretation of the gaze following test.
The no-demonstrator condition was conducted in the same way as the demonstrator condition, but with no demonstrator present. The stimulus trials were compared to the no-stimulus trials to control whether the presented stimulus could be detected by the subject.
Gaze following into the distance and geometrical gaze following
All tested species followed conspecific gazes into the distance, i.e., they looked significantly more often in the same direction as the demonstrator, compared to trials where the demonstrator did not give a gaze cue. This was not the case in the no-demonstrator condition, where subjects did not look significantly more often toward the location where the stimulus was shown compared to no-stimulus trials (see figs. S1 and S2). In experiment 1 (gazing up), all birds performed at a comparable level (see Fig. 2). However, the alligators did not respond by looking up but instead turned around and looked behind themselves at a significant level (see Fig. 2).
Probability of looking up (turning around in alligators) in demonstrator condition of experiment 1. All bird species looked up significantly more often in trials with a stimulus present (a demonstrator gazing up) compared to trials with no stimulus (likelihood ratio test, χ2 > 4.55, df = 1, P < 0.033). Alligators reacted by turning around and looking behind themselves. They did so significantly more often in trials with a stimulus present (likelihood ratio test, χ2 = 5.77, df = 1, P = 0.016). EC Tinamou, elegant-crested tinamou. *P < 0.05, **P < 0.01, and ***P < 0.001.
In experiment 2 (gazing to the side), all birds passed the test at similar rates (see Fig. 3). The alligators also passed experiment 2 but with a notably lower rate than any bird species (see Fig. 3).
Probability of turning to correct side in demonstrator condition of experiment 2. All bird species turned more to the correct side in trials with a stimulus present (likelihood ratio test, χ2 > 3.88, df = 1, P < 0.049). No significant difference in gaze following rate between bird species was found. Alligators followed gaze at significantly lower rates compared to birds (likelihood ratio test, χ2 = 15.055, df = 4, P = 0.0046). *P < 0.05 and ***P < 0.001.
There is a clear difference in the frequency of gaze follows into the distance (experiments 1 and 2) between alligators and birds (see Fig. 4), even when regarding the turning around behavior by the alligators in experiment 1 as a gaze following response. There is no significant difference between the different bird species.
Species had a significant effect on probability of gaze following (likelihood ratio test, χ2 = 26.407, df = 4, P < 0.001). Gaze following proportions were significantly higher for birds [elegant-crested (EC) tinamou: 0.68; emu: 0.67; rhea: 0.63; and red junglefowl: 0.72] compared to alligators (0.24). **P < 0.01 and ***P < 0.001.
All bird species followed gaze geometrically and at comparable rates (see Fig. 5). They looked significantly more often behind a barrier in trials where a demonstrator was gazing toward the location compared to trials without a demonstrator’s gaze. No difference in the proportion of looking behind the correct barrier was found between stimulus and no-stimulus trials of the no-demonstrator condition (see fig. S3). The alligators, however, did not reveal any geometrical gaze following in the test.
Probability of moving around correct barrier in demonstrator condition of bird species (experiment 3). Alligators did not follow gaze geometrically. Between birds, no significant effect of species was found, but there is a trend for a higher proportion in emus (Z = 1.93, P = 0.054). All bird species moved around the correct barrier significantly more often in trials with a stimulus compared to trials without a stimulus (likelihood ratio test, χ2 = 33.74, df = 1, P < 0.001). *P < 0.05 and ***P < 0.001.
Checking-back behavior
All bird species engaged in checking-back behavior, but the alligators did not. There was a significant species effect among the birds on the probability of checking-back in experiment 3, the geometrical gaze following (likelihood ratio test: χ2 = 9.73, df = 3, P = 0.021). This difference, however, is likely caused by differences in the experimental setups because of varying body sizes. While the larger birds (emus and rheas) could effortlessly check back by lifting their head, the smaller birds (junglefowl and tinamous) had to walk out from behind the barrier to be able to see the demonstrator again, as they were too small to look over it. This is likely the reason why the larger birds were found to check back more often in the geometrical experiment, while in the other two experiments, all birds checked back at comparable rates. This is evident from the fact that this difference was found after relocation, i.e., after birds had looked behind the barrier.
Checking-back behavior will lead to a renewed gaze follow toward the target if the demonstrator is still looking at it. In 27% of the checking-back instances, the observer again followed the gaze toward the target. In these instances, the demonstrators’ gazes lasted, on average, 30% (0.73 s) longer, indicating that the demonstrator was still gazing toward the target.
DISCUSSION
Visual perspective taking has, to our knowledge, not been previously studied in palaeognath birds and crocodylians. The palaeognath birds and the junglefowl show a gaze following repertoire on par with apes and some Old World monkeys, including behaviors diagnostic of the expectation of a gaze reference. The alligators’ performance is mostly similar to other non-avian reptiles and appears to be restricted to the low-level form of gaze following into the distance; a skill that seems to be shared by all amniotes. Collectively, this suggests that in Sauropsida, visual perspective taking along with representations of the referentiality of gazes, originated somewhere within Dinosauria. It is likely that these skills arose far earlier in this lineage than in Mammalia. To date, visual perspective taking in mammals has only been found in some primates and canids, which are members of two separate groups that emerged independently roughly 60 million years later than the latest probable appearance of visual perspective taking in Dinosauria (see Fig. 6). This difference in timing might be explained by differences in the visual system, as well as in neuroanatomy.
A simplified phylogeny, with varying resolutions of different taxa, showing the likely origins of visual perspective taking in the lineages of Dinosauria and Mammalia. The thick line in the shaded red areas represents the latest likely appearance of the skill based on experimental evidence. The shading stretching backward in time represents possible earlier presence of visual perspective taking. It is much likely that there is a common origin in the dinosaur lineage, as visual perspective taking exists in several phylogenetically distant bird groups. There are a number of neuroanatomical and likely sensorical overlaps between the palaeognaths and the closely related non-avian paravian dinosaurs, making a non-avian origin plausible. The origin(s) in the mammalian lineage might, on the other hand, be convergent. Visual perspective taking has been found in simians and canids, which arose roughly around the same time. However, their ancestors diverged from each other far earlier, in a split that also resulted in many other major mammalian groups (not depicted in the figure). A common origin would imply that the skill is widespread among placental mammals, but this evidence is currently lacking. Note, however, that all taxa in Sauropsida and Synapsida probably share the skill of gaze following into the distance. The figure has its highest taxonomic resolution in relation to the palaeognaths and crocodilians. All extant orders of palaeognaths and extant superfamilies of crocodilians are represented. The taxonomic groups tested in the current study are marked by black silhouettes and bold text. The dotted line represents the end-Cretaceous extinction (the K/Pg boundary). †, extinct clade; Ma, million years ago. [The dating and phylogenetic relationships of the crocodilians, birds, and mammals are respectively derived from (73), (32), and (66)].
Apart from the current study, only one reptile species—the central bearded dragon (Pogona vitticeps)—has been tested for geometrical gaze following (33). Same as alligators, they did not exhibit these high-level gaze following skills. However, all previously studied reptiles follow gaze into the distance (21, 33, 34). This indicates that low-level gaze following skills are a shared attribute among reptiles, but that visual perspective taking might be absent, suggesting a comparable repertoire in ancestral archosaurs.
However, the alligators do not follow gazes upward, but instead turn around. This contrasts with all tested terrestrial non-avian reptiles, which co-orient with upward gazes (21, 33, 34). It may reflect crocodylians’ adaptation to a life at the water surface, which is apparent in the horizontal arrangement of their sensory organs and retinal ganglion cells in the eye (35). Perhaps they mainly raise the head to see further ahead over the surface, rather than up, which would then be at a location behind and not above the observer. Turning around would then entail gaze following outside one’s own field of vision, which is a form of geometrical gaze following. Another interpretation is that turning around is an appeasing response, as snout lifting is a submissive signal (36). However, such a response has never been reported nor observed by us in any other situation. The turning around is likely a response to gaze, but, as alligators show no geometrical gaze following in experiment 3, it could be a taxon-specific response because of its potential adaptive importance at the water surface, or it could represent an evolutionary early form of geometrical gaze following.
That geometrical gaze following was shown by all bird species in our study, indicates that it should be within the repertoire of all birds (unless lost secondarily), given that the species studied represent some of the neurocognitively most conserved taxa. Previously, geometrical gaze following in birds has only been identified in two corvid species, common ravens (Corvus corax) and rooks (Corvus frugilegus) (5, 37), and in one other songbird, the European starling (Sturnus vulgaris) (6). On the other hand, only one other study on birds has investigated geometrical gaze following. A study on the Northern bald ibis (Geronticus eremita) did not find this gaze following, which would counter our prediction or represent a secondary loss (38). However, the results probably reflect methodological limitations. Among other things and in contrast to most studies (including the current one), the ibis were not facing each other in the geometrical condition but stood next to one another, which might have distorted the observer’s prediction of the demonstrator’s visual perspective. The authors themselves also cautioned against the results and suggested tests with different methods. The best prediction is still that most birds, from all taxa, have this seemingly conserved ability.
Checking-back behavior, which was found in all birds, has not been reported outside apes and Old World monkeys. However, our findings suggest that checking-back is a more widespread behavior than previously thought. It has simply never been described or looked for in other species. The only negative results on checking-back stem from two species of New World monkey: black-handed spider monkeys (Ateles geoffroyi) and tufted capuchin monkeys (Cebus apella) (39).
Arguably, checking-back behaviors should be within the repertoire of species capable of geometrical gaze following; hence, gaze following presupposes the expectation that the other’s gaze is directed at something, which cannot currently be seen. Checking-back is a behavior signifying such an expectation. The behavior develops earlier in children than the ability to follow gaze geometrically—8 months versus 18 months (17, 40). Indicating that the ability to expect a reference of the gaze not only precedes but is also a prerequisite for visual perspective taking. The negative results in the study on New World monkeys may be experimental artifacts, something the authors also suggested. One individual spider monkey in the study was found checking back multiple times.
As mentioned, alligators and birds differed in that the alligators did not reveal visual perspective taking (barring the curious turning around behavior) or any checking-back behavior. However, they also differed in another important measure: the sensitivity to the other’s gaze, which is seen in the proportions of gaze follows in experiments 1 and 2 (Fig. 4). The birds, on the other hand, had a proportion of gaze follows similar to great apes (41).
The potential role of differences in cerebellar size
Socioecological factors seem to fail to explain the differences between taxa, as all species tested are social and live in groups of various sizes and stabilities, shifting notably even within species (42–44). Rather, the best explanation appears to be related to neuroanatomy.
A major neuroanatomical difference between crocodylians and avians is the radically higher density of neurons in birds, leading to much greater neuron numbers in their brains. The main proportional increase of neurons in the evolution from stem archosaurs to birds is found in the cerebellum (7). For example, an emu has 20.5 times as many neurons in the cerebellum as a Nile crocodile (Crocodylus niloticus) while having 15.75 times as many neurons in the telencephalon. We suggest that the vastly expanded cerebellum provides insights into why birds, but not crocodylians (or other reptiles), show visual perspective taking with its accompanying representations of gaze reference. While a large cerebellum is not sufficient for advanced gaze following, it is most likely a prerequisite.
The cerebellum primarily guides motor control but is involved in a variety of cognitive processes (45). This structure is organized in parallel loops through which it simultaneously receives input and sends projections to cortical areas (46). The highly regular cytoarchitecture suggests a unified mechanism underlying its various functions (47). An influential theoretical framework proposed for this unifying mechanism is that of the so-called internal forward models.
These models are top-down processes using prior, instead of immediate, information to guide behavior and to predict behaviors of others (48–51). Well-developed sensory-motor predictions allow rapid appropriate actions and update quickly when the model does not match the world. This considerably speeds up behavior, as compared to a system that instead continuously responds only to the feedback from the external world (bottom-up).
We propose that the differences in the gaze following repertoires of alligators and birds are partly explained by more robust internal forward models in birds. Gaze following is mediated by top-down processes in various action predictions of others (52, 53). The act of gazing can induce the prediction in the observer that the other’s gaze points to “something.” The checking-back behavior shows not only when these expectations are violated but also that the system is tuned to updating, which is a hallmark of internal forward models (51). The evolution and development of visual perspective taking and representing referentiality are likely an embodied process starting out from building sensory-motor forward models of one’s own behavior, which gets extended to other’s basic behaviors (54). Obviously, more robust internal forward models in the cerebellum, making more detailed and fine-grained predictions, will only arise in the presence of well-developed sensory-motor areas in the pallium (or cortex) that they project on, which are something birds have as compared to reptiles. However, we do not know to what extent the enlargement of the cerebellum seen in the maniraptoran theropods (55) reflects the existence of other brain areas involved in visual perspective taking.
The origins of visual perspective taking and further research
Palaeognaths are the best available extant neurocognitive models of non-avian—but closely related—paravian dinosaurs, such as dromaeosaurids and troodontids. There are, of course, differences between the least-derived (extant) avian brain and that of extinct non-avian paravians. For example, the presence of the Wulst (hyperpallium) and ventrally deflected optical lobes in birds (55), which likely mainly represent adaptations to the visual-motor requirements of flight. Nevertheless, the palaeognath brain is notably more similar to a non-avian paravian brain than to that of a crocodylian, not only in size, shape, and proportions between areas (7, 12, 56) but also in the relationship between body and brain size, where palaeognaths fall within the scaling relationship of non-avian paravians, unlike most other birds (57). One of the central questions, however, is whether the neuronal density was similar between paravians and palaeognaths, because the number of neurons is currently one of the best neurobiological correlates to cognitive performance (58, 59). Palaeognaths have the least derived scaling relationship of neuronal numbers among birds (shared with some neognath taxa) but that still allows about twice as many neurons per volume unit than in a nonprimate mammal (7, 12). It has recently been shown that endothermy is highly associated with the extreme increase of neuron numbers (7). Accumulating evidence from different methodological sources suggests endothermy in at least non-avian paravians (60–62). There are hence reasons to assume that these dinosaurs had neuronal densities more similar to palaeognaths than to extant reptiles.
Despite the lack of studies on structures in the avian brain corresponding to those in mammals that mediate geometrical gaze following, it may be the case that they existed in non-avian paravians too, given several similarities to palaeognaths. If so, visual perspective taking could have arisen in the non-avian paravians (or perhaps earlier) and may thus have been present by the Middle Jurassic (ca. 174 to 163 million years ago). However, if the unique avian Wulst, which is an area of visual and somatosensory integration (63, 64), proves central for visual perspective taking, then one would expect that its origin occurred later. There is still no consensus, based on the fossil record, when the Wulst appeared, but because it exists in both palaeognaths and neognaths, which according to molecular analyses diverged in the Early Cretaceous (about 110 million years ago) (32), it should at least have been present then. There indeed exist projections from the Wulst to the cerebellum (65). However, the visual and somatosensory requirements of flight likely exceed those of terrestrial mammals and might therefore represent levels of sensory-motor models beyond what is needed for modeling other’s occluded lines of gaze. Only further research on brain anatomy and brain function in birds, as well as on brain anatomy in extinct dinosaurs (including avians), will help to better pinpoint the origins of visual perspective taking in dinosaurs.
However, as mentioned earlier, the current evidence from mammalian geometrical gaze following places the evolution of this attribute in lineages that diverged after the end-Cretaceous extinction: Simiiformes (monkeys and apes) and Canidae (where it is only shown in wolves and dogs). That puts the origin of visual perspective taking considerably later in mammals than in birds—with about 60 million years. However, if it was not convergently evolved in simians and canids, it should be found in many taxa that diverged since the split of the common ancestor of Primates and Carnivora, ranging from rodents to bats and a long range of others, and its origins would then be traced well before the end-Cretaceous extinction (66) but still roughly 40 millions of years after its origins in dinosaurs (see Fig. 6). More gaze following studies on mammals are needed to provide better understanding and to disentangle to what extent these skills evolve convergently within mammals.
It is not surprising if visual perspective taking, with accompanying representations of gazes’ referentiality, evolved earlier in dinosaurs than in mammals. The major increase of neurons, which is seen in both mammals and birds—likely as a response to endothermy—might be a prerequisite, but acute vision may be of additional importance. The benefits of gaze following are likely enhanced by an advanced visual system, where foveae and color vision seem particularly useful, both of which most likely existed already in non-avian dinosaurs as it exists in reptiles and birds (except where lost due to nocturnal adaptations). Following the gaze of someone who can attend to more details in the environment, as well as see further into it, provides more information, given that the gaze follower itself has similar visual capabilities. Mammals were initially and, for a very long time (and a majority still is), primarily nocturnal, and vision had less utility than, e.g., olfaction (67). The most well-developed gaze following repertoires in mammals are found in simians and, particularly, apes. Primates have readapted their vision to diurnal conditions and regained both foveae and color vision. The refinement of the visual system coevolved with the relative expansion of the primate cerebellum (68), which proportionally increased even more in great apes (69). Arguably, this expansion led to improved visual-motor internal forward models for prediction of other’s behaviors, perhaps making apes similar to birds in this regard. Studies on other mammals are needed to understand the role of visual acuity for visual perspective taking and whether differences may lead to convergent evolution of this skill within mammals. However, in addition, more studies are needed investigating to what degree other sensory modalities aid in various forms of perspective taking.
Geometrical gaze following reveals only the basic forms of visual perspective taking (level I) and cannot attest to more advanced sociocognitive skills. Decades of research into animal cognition have focused on various aspects of “mindreading” abilities. Animals’ mental perspective taking, such as representations of others’ epistemic states, intentions, desires, or other motivational states, has been intensely studied, where apes and corvids show the highest proficiency (70). However, much more research is needed on neurocognitively plesiomorphic animals to better understand the evolution of social cognition.
MATERIALS AND METHODS
Ethical statement
All animals participated on a voluntary basis and could at any time leave the setup, which was installed in their home enclosure. Food rewards were offered to the animals to approach the starting position, and no force was used. The animals were housed at zoos and with private owners responsible for their welfare. The owners gave their permission, based on the study protocol, to conduct the research as it adhered to animal welfare regulations. The research did not include procedures under the European Union (EU) Directive 2010/63/EU and did not qualify for any further ethical approval than the permission given by the owners. This is also consistent with the stricter Swedish legislation (SJVFS 2019:9, chapter 2, § 22).
Experimental design
We tested 30 subjects of five archosaur species (six per species; for more information on subjects, see the Supplementary Materials) for their ability to follow gazes of conspecifics in three experiments. Testing took place between January 2019 and November 2020. Experiments 1 and 2 tested for gaze following into the distance upward (experiment 1) and to the side (experiment 2). Experiment 3 investigated geometrical gaze following, i.e., tracking gaze around a barrier. All subjects finished one experiment before moving on to the next. Because session lengths and demonstrator training varied drastically between species (especially between birds and alligators), the period between experiments differed between species (from 1 day to 4 months).
Because of limited sample sizes, some individuals of each species were used as both demonstrator and subject. Those individuals first finished all demonstrations before serving as subjects to minimize the number of potentially biased trials. Demonstrators were selected on the basis of the highest responsiveness to gazing stimuli (described below). Most subjects experienced the same demonstrator across all three experiments. Only for the alligators and two rheas demonstrators had to be switched as they stopped responding to the gazing stimuli.
Because of the physical differences of the tested species, three different experimental setups were used within each experiment to create optimal testing conditions. Alligators, large birds (emus and rheas), and small birds (elegant-crested tinamous and red junglefowl) received their own setups, respectively (see Fig. 1).
A gazing stimulus was used to evoke gazing responses of demonstrators. Demonstrator birds from all bird species, besides red junglefowl, spontaneously reacted by looking toward the red dot of a laser pointer. The demonstrators among the red junglefowl were conditioned to turn toward the dot of a laser pointer in training sessions before the experiments. The demonstrators among the alligators were conditioned to turn toward a blue rubber ball. Conditioning was achieved through clicker training in both species. However, no clicker was used during testing. Ahead of each session, we conducted three reminder trials where the clicker was used to ensure a correct reaction of the demonstrators during the experimental session.
Every experiment was divided into two conditions: demonstrator and no demonstrator. In the demonstrator condition, subject and demonstrator were present, while only the subject was present in the no-demonstrator condition. Half of the subjects per species started with the demonstrator condition, the other half started with the no-demonstrator condition. Each condition was further divided into two trial types: stimulus and no stimulus. In stimulus trials, the gazing stimulus was presented, whereas no stimulus was shown in no-stimulus trials. Trial types were pseudorandomized.
Every condition (demonstrator or no demonstrator) consisted of 12 trials, 6 of each trial type. The two conditions were completed in separate sessions with a break of at least 15 min between sessions for birds and 1 hour for alligators. Both conditions of one experiment were completed within the same day when possible. Sessions were only aborted when subjects did not approach the experimental setup in three consecutive trials, and a break of at least 15 min commenced. If the subjects were still uninterested in approaching after the break, then the remaining trials were completed on the next testing day.
In stimulus trials of the demonstrator condition, the stimulus was presented until a gazing response of the demonstrator was evoked. In no-stimulus trials of the demonstrator condition, no stimulus was presented so that the demonstrator was simply present. These trials controlled whether the mere presence of a conspecific altered the gazing behavior of subjects.
In stimulus trials of the no-demonstrator condition, only the subject was present while the stimulus was presented for 5 s. This served to control if the stimulus was visible from the subject’s side. In no-stimulus trials of the no-demonstrator condition, no stimulus was shown while only the subject was present. This was done to maintain the same procedure and session length as in the demonstrator condition.
In both conditions, trials lasted for 10 s after demonstration (either the demonstrator gazing or the stimulus being presented without demonstrator present). Only in experiment 3, alligators were given 1 min due to the potential amount of walking in this setup. A trial was valid if the subjects were facing the experimental setup during stimulus presentation or for 3 s in no-stimulus trials, in both the demonstrator and no-demonstrator condition. If subjects turned away before stimulus presentation or looked at the ground (in the case of birds), then the trial was repeated.
If a significant difference in orienting responses could be identified between stimulus and no-stimulus trials in the demonstrator, but not the no-demonstrator condition, this difference was most likely caused by the gaze cue of the demonstrator. All trials were video-recorded, with one camera behind the subject and one facing the subject to ensure optimal angles of the heads and eyes of subjects.
Experimental setups
The general procedure of all three experiments was the same for all species. Demonstrator and subject were separated by a mesh divider. The demonstrator’s compartment stayed empty in the no-demonstrator condition. At the beginning of each trial, both demonstrator and subject (or only the subject in the no-demonstrator condition) were lured toward the divider through a food reward placed 1 m in front of it in a central location. When both animals had picked up their food and were thus in a central and straight position in front of the divider, the trial started. At the end of each trial, the birds were lured away from the divider with food, while the alligators received their food reward in the location they were lying (due to their slow movement speed). Once both demonstrator and subject (or only the subject in the no-demonstrator condition) had picked up their food reward, the same procedure as described above was applied to achieve an appropriate starting position for the next trial.
Experiment 1: Up
In experiment 1, an opaque screen was mounted on top of the divider that was placed between subject and demonstrator. For alligators, the blue rubber ball that demonstrators were conditioned to turn toward could be lowered into view with a string from an opaque tube attached to this screen on the side facing the demonstrator. For all bird species, the dot of a laser pointer was projected onto the screen on the demonstrator side.
Experiment 2: Side
For alligators, two experimenters seated behind 60-cm-high wooden barriers on either side of the demonstrator each had a blue rubber ball mounted on a wooden stick. In stimulus trials, one experimenter presented the ball through a cutout in the wooden barrier they were seated behind. A small wooden barrier in front of this cutout prevented the subject from seeing the ball. Sides were counterbalanced; each subject received the same number of trials on either side. A sponge underneath the cutout ensured that no sounds were made when lowering the balls after presentation.
For small birds, two wooden barriers were placed on the demonstrator side on which the dot of the laser pointer could be presented toward the demonstrator. Large demonstrator birds (emus and rheas) quickly habituated to the dot of the laser pointer. For that reason, in experiments 2 and 3, their gazes were lured by showing food. Because of structural differences in the enclosures, this was done in two different ways. For emus, two tall wooden boards were propped up on both ends of the mesh divider on the demonstrator side. Two experimenters stood behind these boards. Each of them held a grape on a stick, which could be shown in a cutout to lure the gaze of the demonstrator (similar to the alligator setup). For rheas, two wooden boards were hung from poles on each end of the mesh divider. On the side facing the demonstrator, an opaque tube was attached to both boards from which grapes could be lowered into view with a string.
Experiment 3: Geometrical
Alligators were exposed to the same setup as in experiment 2 (side). However, this time, wooden barriers were placed 1 m in front of the mesh barrier on the subject’s side. The stick with the target ball was in this condition not only shown in the cutouts but stuck out of them to make the demonstrator gaze behind one of the two barriers on the subject’s side. In the presence of geometrical gaze following, the subject would have to walk up to the barriers and turn around the indicated one. The barriers were slightly angled, which prevented the subject from seeing behind both barriers simultaneously when placing itself between them.
For small birds, two barriers were placed on the subject side. The dot of the laser pointer was directed to the back of the barrier so that it was only visible to the demonstrator. In this way, an orientation of the demonstrator toward that barrier looked to the subject as if the demonstrator was looking behind that barrier.
For large birds, a wooden barrier was placed between demonstrator and subject. On the demonstrator side, a contraption was installed on ground level from which a grape could be shown by pulling it out from an opaque tube with a string. By showing the grape, the gaze of the demonstrator was lured toward the ground behind the barrier. A successful subject would be expected to lean over the barrier to identify the gaze target. The experimental setups for all three experiments are depicted in Fig. 1.
Coding definitions
All videos were coded using Solomon Coder (71). When coding trials of all three experiments, we coded “target location” and "checking-back". Target location had different definitions depending on the experiment but generally referred to the location where the gazing stimulus was shown. In experiment 1, the target location was the panel above the divider. We coded target location every time a subject looked up toward that panel. For alligators, we additionally coded “turning around,” which was defined as the subject turning more than 90° away from its initial position. In experiment 2, the target location was the side where the stimulus was shown or the side the demonstrator looked toward. In no-stimulus trials, we predetermined “correct” sides randomly and coded target location if the subject turned toward that side. We only scored first orientations of subjects in this experiment. The same method was applied to experiment 3 of the small birds and alligators. Experiment 3 of the large birds did not include sides but only had one target location: the ground behind the barrier. Target location was only coded when subjects relocated themselves around barriers (or looked over the barrier in large birds) and not when they looked toward that location. In addition, we coded the latency of target location for each experiment, either from trial onset in no-stimulus trials or from the onset of the stimulus (the gazing stimulus in the no-demonstrator condition or the gaze of a demonstrator in the demonstrator condition). We coded checking-back when a subject looked toward the target location and then back at the demonstrator. We moreover recorded whether the subject looked to the target location again after checking back. Ten percent of the videos—including all species and experiments—were coded for interobserver reliability, and intraclass correlation (ICC) was good (ICC = 0.85, F = 12.6, P < 0.001).
Statistical analysis
The data were analyzed with generalized linear mixed models in RStudio (version 1.4.1717) (72). For every experiment, a model for each of the two conditions (demonstrator and no demonstrator) was created. The models were fitted with a binomial distribution, and the individual identity of the observer was added as a random factor with session nested within to control for an individual’s current motivational state. Head movements toward a target served as the response variable; species and trial types (gazing stimulus present/not present), as well as their two-way interaction, were fixed factors. We reduced these full models stepwise to find the best fitting model using the Akaike information criterion (AIC). If the AIC of a model was more than 2 points lower after exclusion of a factor compared to the model including this factor, it was subsequently dropped from the model. AICs of models were compared using the drop1 function. This was done for both fixed factors and interactions. All interactions and fixed factors remaining in the models were hence explaining variance. The final models acquired in this way, were subjected to likelihood ratio tests to assess the effect of remaining factors (for values of final models, see the Supplementary Materials). If trial type with the gazing stimulus present had no significant effect in the no-demonstrator condition but a significant effect in demonstrator conditions, then this was interpreted as gaze following. Subsequently, we ran the same models for each experiment but used checking-back as the response variable.
We started the statistical analyses with a model including all species and experiments as well as their interactions. This revealed a significant effect of both species and experiment. Therefore, each experiment was subsequently analyzed individually. Moreover, pairwise comparisons within the model (Tukey’s test) revealed a significant difference between the alligators and all the bird species. Consequently, the alligators were thereafter analyzed separately from the birds. To obtain precise numbers for all species, each model was subsequently run separately for each bird species.
Acknowledgments
We thank N. Thierij, M. Kernkamp, M. Andersson, M. Luce, S. Grendeus, T. R. Jensen, O. Oscarsson, and I. Jacobs for the practical help. We are grateful to H. Osvath for the creation of figures. We thank S. Brusatte, P. Němec, and L. Witmer for helpful comments on the manuscript. We are also grateful to Y. Zoo and F. Zoo.
Funding: This work was supported by the Swedish Research Council grants 2019-03265, 2021-02973, and 2021-01650 and the LMK Foundation.
Author contributions: Conceptualization: M.O. Design: S.A.R. and C.Z. Data collection: C.Z. and S.A.R. Video coding: C.Z. Statistical analyses: C.Z. and S.A.R. Interpretation and writing: M.O., C.Z., and S.A.R.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
Supplementary Materials
This PDF file includes:
Supplementary Text
Figs. S1 to S3
Tables S1 and S2
Legends for data S1 and S2
Other Supplementary Material for this manuscript includes the following:
Data S1 and S2

* 




