Proc Natl Acad Sci U S A. 2002 July 9; 99(14): 9596–9601.
Published online 2002 June 20. doi: 10.1073/pnas.092277599.
PMCID: PMC123186
Psychology
Rapid natural scene categorization in the near absence of attention
Fei Fei Li,* Rufin VanRullen, Christof Koch,* and Pietro Perona*
Divisions of *Engineering and Applied Sciences and Biology, California Institute of Technology, Pasadena, CA 91125
To whom reprint requests should be addressed. E-mail: feifeili/at/vision.caltech.edu.
Communicated by David Mumford, Brown University, Providence, RI
Received March 22, 2002; Accepted May 8, 2002.
What can we see when we do not pay attention? It is well known that we can be “blind” even to major aspects of natural scenes when we attend elsewhere. The only tasks that do not need attention appear to be carried out in the early stages of the visual system. Contrary to this common belief, we report that subjects can rapidly detect animals or vehicles in briefly presented novel natural scenes while simultaneously performing another attentionally demanding task. By comparison, they are unable to discriminate large T's from L's, or bisected two-color disks from their mirror images under the same conditions. We conclude that some visual tasks associated with “high-level” cortical areas may proceed in the near absence of attention.
Psychologists have long known that certain visual search tasks require minimal or no attention. A hallmark of preattentive vision is that it is achieved in a seemingly parallel fashion: a preattentive task may be carried out simultaneously with other visual tasks (1); target detection does not become significantly more difficult when the number of distractors is increased (2, 3). However, none of the known preattentive tasks approaches the sophistication of everyday vision, where complex scenes must be scrutinized to assess high-level properties such as the presence of danger and the structure of a social interaction. Virtually all of the visual tasks that may be performed preattentively have been explained, either in detail or in principle, by quasilinear models that replicate mechanisms found in the early stages of our visual system (4, 5). Although much can be accomplished by these simple mechanisms, it is quite clear that they are inadequate for explaining “high-level” perception such as recognition and categorization, i.e., visual processes that rely on neural activities in inferior temporal cortex and beyond (6–8). This would suggest that there is no sophisticated property of the scene that we can see without paying attention. In agreement with this view, change blindness and inattentional blindness studies demonstrate that without visual attention, significant changes in a large part of the visual field can easily escape our awareness (9–12).
On the other hand, some complex visual tasks can be rapidly accomplished by our visual system. RSVP (rapid serial visual presentation) experiments have demonstrated that natural objects belonging to a specified category may be classified remarkably fast (13, 14). Thorpe and colleagues (15–19) have found that complex natural scenes can be categorized in as little as 150 ms. This astonishing speed relative to the time constant of information processing and transmission in networks of neurons raises the question of whether attention plays a critical role in this type of rapid visual processing. Our results indicate that there is little or no attentional cost in rapid visual categorization of complex, natural images.
Subjects.
Five right-handed subjects, including two authors (F.F.L. and R.V.R.), were tested in the main experiment. Four right-handed subjects, including the same two authors, were tested in the vehicle/animal categorization experiments, as well as the color disk control experiment. Five right-handed subjects, including the same two authors, were tested in the T/L discrimination control experiment. Ages ranged from 20 to 26 (average 24). One other subject was discarded because he could not maintain his attentional focus on the central letter discrimination task under the dual task condition.
Database.
The pictures were complex color scenes taken from a large commercially available CD-ROM library allowing access to several thousand stimuli (15). The animal category images included more than 800 pictures of mammals, birds, fish, insects, and reptiles. In a separate experiment (Fig. 3 b and c), an additional target category, vehicles, was used. The vehicle category images (more than 800) included pictures of cars, trucks, trains, airplanes, ships, and hot-air balloons. There was also a very wide range of more than 900 distractor images, which included natural landscapes, city scenes, food, fruits, plants, houses, and artificial objects.
Figure 3
(a) Summary of categorization of masked animal images. This panel corresponds to the normalized average performance of the main experiment (Fig. 2). Each open circle is the average value of one subject's dual task performance, normalized ...
Experimental Setup.
Subjects were seated in a dark room specially designed for psychophysics experiments. The seat was approximately 120 cm from a computer screen (1,024 × 1,286 pixels, 3 × 8 bit RGB), connected to a Silicon Graphics (Mountain View, CA) O2 computer. The refresh rate of the monitor was 75 Hz. We used a photocell and oscilloscope to ensure that the experimental setup achieved the desired refreshing rate for our experiment. The display was synchronized with the vertical retrace of the monitor.
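As a rough consistency check on the timing values quoted in this paper, the following sketch converts durations in milliseconds to whole monitor frames at the stated 75-Hz refresh rate. The function name and the 1-ms rounding tolerance are illustrative assumptions and are not part of the original setup.

```python
REFRESH_HZ = 75               # monitor refresh rate stated in the text
FRAME_MS = 1000 / REFRESH_HZ  # about 13.3 ms per frame


def ms_to_frames(duration_ms, tolerance_ms=1.0):
    """Return the whole number of frames closest to duration_ms.

    tolerance_ms is an assumed allowance for rounding in the reported
    values; a ValueError is raised if the duration is further than that
    from a whole number of frames.
    """
    frames = round(duration_ms / FRAME_MS)
    if abs(frames * FRAME_MS - duration_ms) > tolerance_ms:
        raise ValueError(f"{duration_ms} ms is not a whole number of frames")
    return frames


# Durations quoted in the Methods (ms)
for ms in (27, 53, 66, 80, 106, 160):
    print(ms, "ms ->", ms_to_frames(ms), "frames")
```

Under this assumption, every duration quoted in the Methods lies within 1 ms of a whole number of frames, as expected for a display synchronized with the vertical retrace.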
Training Procedure.
Each experiment required a significant training period. It usually took more than 10 h (approximately 12,000 trials of all different tasks combined) for a subject to coordinate their motor responses well enough to answer both a speeded peripheral task and the central task. The central SOA (stimulus onset asynchrony, the time between the appearance of the central stimulus and the onset of the central mask), starting at 500 ms, was decreased after each block where the performance of this task exceeded 85%. The procedure was terminated after the subject's performance had stabilized and the central SOA was below 250 ms. This value was chosen to limit the possibility of switching attention during stimulus presentation. All tasks received the same amount of training for each subject to avoid bias for any particular task.
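The SOA adjustment rule described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented: the step size (one frame at 75 Hz) is an assumption because the text does not state it, the block accuracies are hypothetical, and the stabilization criterion is left out because it is not specified.

```python
FRAME_MS = 1000 / 75      # one frame at the stated 75-Hz refresh rate
START_SOA_MS = 500.0      # starting central SOA given in the text
THRESHOLD = 0.85          # block accuracy that triggers a decrease
TARGET_SOA_MS = 250.0     # the final SOA must be below this value


def next_soa(current_soa_ms, block_accuracy, step_ms=FRAME_MS):
    """Shorten the SOA by one step if the block exceeded the threshold.

    step_ms is an assumed step size; the text does not state it.
    """
    if block_accuracy > THRESHOLD:
        return current_soa_ms - step_ms
    return current_soa_ms


def run_training(block_accuracies):
    """Apply the update rule over a sequence of block accuracies."""
    soa = START_SOA_MS
    history = [soa]
    for accuracy in block_accuracies:
        soa = next_soa(soa, accuracy)
        history.append(soa)
    return history


# Hypothetical block accuracies, for illustration only
history = run_training([0.90, 0.80, 0.88, 0.86])
print([round(s, 1) for s in history])
print("below target:", history[-1] < TARGET_SOA_MS)
```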
Experimental Paradigm.
Each experiment (main experiment and two control experiments) consisted of three different conditions: an attentionally demanding central task (identical in all experiments), a peripheral task (in which the role of attention was investigated), and a dual task condition in which both the central and peripheral task were performed concurrently. In each experiment, all trials were organized in the same way irrespective of the experimental condition (i.e., single task or dual task).
Central Letter Discrimination Task.
Each trial started with a fixation cross 300 ± 100 ms before the onset of the first stimulus. At 0 ms, the central stimulus (a combination of five letters) was presented. The five letters (T's and L's, either all identical, or one differing from the other four) appeared at nine possible locations within 1.2° eccentricity. Each letter was randomly rotated. After the central SOA, each stimulus letter was masked by the letter F. For a given subject, the SOA was the same for both single task and dual task conditions. All trial types were presented with equal probability. Subjects were instructed to respond by pressing “S” on the keyboard if the five letters were the same, or “D” if one of the letters differed from the other four.
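The construction of a central stimulus can be illustrated with a short sketch. The location indices, the continuous uniform rotation, and the function names are assumptions made for illustration; the text specifies only five letters, nine possible locations, random rotation, and the two response keys.

```python
import random

LETTERS = ("T", "L")
N_LETTERS = 5
N_LOCATIONS = 9  # nine possible locations within 1.2 deg eccentricity


def make_central_stimulus(rng, same):
    """Return a list of (location_index, letter, rotation_deg) tuples.

    If same is True all five letters are identical; otherwise exactly
    one letter differs from the other four.
    """
    base = rng.choice(LETTERS)
    letters = [base] * N_LETTERS
    if not same:
        odd = LETTERS[1 - LETTERS.index(base)]
        letters[rng.randrange(N_LETTERS)] = odd
    # Five distinct locations drawn from the nine available ones
    locations = rng.sample(range(N_LOCATIONS), N_LETTERS)
    # Uniform rotation over 0-360 deg is an assumption; the text only
    # says that each letter was randomly rotated.
    return [(loc, letter, rng.uniform(0, 360))
            for loc, letter in zip(locations, letters)]


def correct_key(stimulus):
    """'S' if all letters are the same, 'D' if one differs."""
    return "S" if len({letter for _, letter, _ in stimulus}) == 1 else "D"


rng = random.Random(0)
trial = make_central_stimulus(rng, same=rng.random() < 0.5)
print(trial, correct_key(trial))
```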
In a separate control experiment, we tested our subjects' central performance with shortened SOAs. Subjects were instructed to perform the central letter discrimination task. For each subject, the central SOA alternated in four blocks of 48 trials between two values: the subject's original SOA that was reached at the end of the training procedure, and another SOA 66 ms shorter.
Peripheral Task.
In each peripheral task, the stimulus was always presented 53 ms after the central stimulus onset and followed by a perceptual mask. Subjects responded to these tasks in a speeded fashion. They were instructed to continuously hold down the mouse button and release it as fast as possible, within 1,000 ms, when they detected a target.
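The speeded go/no-go response described above can be scored along the following lines. This is a sketch under stated assumptions: the 1,000-ms window is measured here from peripheral stimulus onset, which the text does not specify, and the function names and trial data are hypothetical.

```python
RESPONSE_WINDOW_MS = 1000  # release must occur within this window


def score_trial(is_target, release_ms):
    """Classify one go/no-go trial.

    release_ms is the time of mouse-button release relative to stimulus
    onset (an assumed reference point), or None if the button was held.
    """
    responded = release_ms is not None and release_ms <= RESPONSE_WINDOW_MS
    if is_target:
        return "hit" if responded else "miss"
    return "false_alarm" if responded else "correct_rejection"


def percent_correct(trials):
    """Percentage of trials scored as hit or correct rejection."""
    outcomes = [score_trial(is_target, release) for is_target, release in trials]
    n_correct = sum(o in ("hit", "correct_rejection") for o in outcomes)
    return 100.0 * n_correct / len(outcomes)


# Hypothetical trials: (target present?, release time in ms or None)
trials = [(True, 450), (True, None), (False, None), (False, 520)]
print(percent_correct(trials))
```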
Natural scene categorization.
The peripheral stimuli were natural images, half of them containing one or more target objects. Each image (of size 3.2° × 4.8°) was flashed for 27 ms at a random location centered at around 6.1° eccentricity. Novel images (1,056 in all) were used as test stimuli under the dual task condition for 11 blocks of 96 trials. The peripheral stimulus was followed by a perceptual mask. Eight different masks were used. Each of them was a colored picture of a mixture of white noise at different spatial frequencies on which a naturalistic texture was superimposed. The peripheral SOA was adjusted in the same way as for the central SOA so that performance would stabilize under 85%. Individual peripheral SOAs ranged from 53 to 80 ms. For a given subject, the SOA was the same for both single task and dual task conditions.
Peripheral letter discrimination.
In this control experiment, the peripheral stimulus (of size 1.5° × 1.5°) was a randomly rotated letter T or L masked by the letter F. The target was the letter L. The peripheral SOA was determined individually as previously, ranging from 53 to 160 ms. For a given subject, the SOA was the same for both single task and dual task conditions.
Peripheral color pattern discrimination.
In this control experiment, the peripheral stimulus (of size 1.5° × 1.5°) was a vertically bisected disk with red and green halves. The target was the disk in which red was on the right. The mask was a disk divided into four quadrants, with red and green alternating between quadrants. The colors were matched for gray levels. The peripheral SOA was determined individually as previously, ranging from 66 to 106 ms. For a given subject, the SOA was the same for both single task and dual task conditions.
Dual Task.
In all dual tasks, subjects were instructed to focus attention on the central task. On each trial, they were supposed to respond to the peripheral stimulus as fast as possible (with their right hand) before the central stimulus (with their left hand).
We studied the role of attention in natural scene categorization by using a dual task paradigm, in which a natural scene categorization task, where target scenes were defined by the presence of one or more animals, was performed concurrently with another visual task that required visual attention (refs. 1, 20, and 21; Fig. 1). The idea is to compare subjects' performance of the categorization task under two conditions: the single task condition where attention is available, and the dual task condition where attention is drawn away by the other task. If the rapid natural scene categorization task demands attention, we should observe a significant decrease in performance under the dual task condition. If the rapid natural scene categorization does not entail much attentional cost, performances should be comparable.
Figure 1
Experimental protocol. (a) Schematic illustration of one trial. After a fixation cross presented at the center of the visual field, an attentionally demanding letter discrimination task is presented centrally. The central stimulus (combination of five ...
Our attentionally demanding task involves discriminating displays composed of five randomly rotated T's and L's at the center of the visual field. Subjects needed to respond by pressing one key when all five letters were the same and another key when one of the letters differed from the other four. This task engages attention at the center of the display, preventing attention from focusing on the natural scene in the periphery (refs. 1 and 20; see also Fig. 3 d and e). When our subjects performed this task alone, their performances averaged around 77% (varied between 68% and 82%; Fig. 2). This value can be used as a reference for the dual task condition: if a subject has continuously engaged full attention to the central task, we expect the performance to be maintained at the same level; any significant distraction or withdrawing of attention would decrease performance.
Figure 2
Main results. Individual subject's results for dual vs. single task performance (five subjects). The horizontal axis represents performance of the central task (attentionally demanding letter discrimination task). The vertical axis represents performance ...
The natural scene categorization was a modification of the one used by Thorpe and colleagues (15). A picture was flashed for only 27 ms at a random location in the periphery of the visual field, followed by a perceptual mask (Fig. 1). Subjects had to decide whether the image contained an animal (or animals) or not as fast and accurately as possible (15). When subjects performed this task alone, their performance averaged around 76% (ranging from 75 to 79%; Fig. 2).
Under the dual task condition, subjects were instructed to focus attention at the center of the display, and to try to perform both tasks as accurately as possible. Because we were interested in the reaction times of the natural scene categorization task, we asked subjects to respond as fast as possible to the peripheral task before answering the central task. For each subject, the central task performance under the dual task condition showed no difference (P > 0.05) from its counterpart under the single task condition (Fig. 2). This is a clear indication that attention was locked at the center under the dual task condition. Furthermore, for each individual subject the average peripheral categorization performance under the dual task condition was not significantly (t test, P > 0.05) different from the corresponding performance under the single task condition (Fig. 2), suggesting that natural scene categorization can still be performed when attention is drawn away (see also Fig. 3 a–c).
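A single vs. dual task comparison of this kind can be sketched with a two-sample t statistic. The text states only that a t test was used; the pooled-variance (Student) form, the use of per-block scores, and the scores themselves are assumptions made for illustration and are not data from this study.

```python
import math
import statistics


def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2


# Hypothetical per-block percent-correct scores, for illustration only
single = [76, 78, 75, 77, 79, 74]
dual = [75, 77, 76, 74, 78, 76]
t, df = two_sample_t(single, dual)
print(f"t = {t:.2f}, df = {df}")
```

The absolute value of t is then compared with the two-tailed critical value for the given degrees of freedom (2.228 for df = 10 at the 0.05 level); a smaller value corresponds to P > 0.05.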
One might argue that subjects could first attend to the peripheral stimulus before switching attention to the central one. In that case, however, the time available to process the central stimulus would be shorter than the actual central SOA by at least 80 ms (the peripheral stimulus is turned off 80 ms after the onset of the central stimulus). This strategy would result in a strong decrease in performance of the central task. Indeed, in a separate control experiment, we asked all six subjects to perform the central letter task with an SOA shortened by only 66 ms. Their average performance dropped from 77% to 66% (individual t test for each subject, P = 0.01). This confirms that our results do not reflect a systematic switch of attention between the two tasks.
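The timing argument can be made explicit with a few lines of arithmetic. The peripheral onset (53 ms) and flash duration (27 ms) are taken from the Methods; the 200-ms central SOA is a hypothetical value chosen only because final SOAs were below 250 ms.

```python
PERIPHERAL_ONSET_MS = 53      # peripheral onset after central stimulus onset
PERIPHERAL_DURATION_MS = 27   # duration of the peripheral image flash
CONTROL_SHORTENING_MS = 66    # SOA reduction used in the control experiment


def peripheral_offset_ms():
    """Time after central onset at which the peripheral image disappears."""
    return PERIPHERAL_ONSET_MS + PERIPHERAL_DURATION_MS


def central_time_after_switch(central_soa_ms):
    """Upper bound on the central processing time left before the mask if
    attention stays on the peripheral image until it disappears."""
    return central_soa_ms - peripheral_offset_ms()


central_soa = 200  # hypothetical central SOA in ms (below 250 ms)
print(peripheral_offset_ms())                  # 80
print(central_time_after_switch(central_soa))  # 120
print(central_soa - CONTROL_SHORTENING_MS)     # 134
```

With these numbers, an attention-switching strategy would leave less central processing time (120 ms) than the shortened-SOA control (134 ms), which already produced a clear drop in central performance.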
Because of its high motor coordination demands, the dual task required extensive training. During this period, our subjects were repetitively trained with the same set of 288 images. It could be argued that such training could serve to optimize feature detection mechanisms for specific stimuli, reducing the attentional demands for this task (22, 23). However, the above results were obtained with a set of 1,056 novel images that were never presented during training. Furthermore, we show later (Fig. 3 d and e) that the same amount of training in other dual tasks did not reduce attentional demands. This makes it unlikely that our results are a direct consequence of the training process. In addition to our experiments, a study done by Rousselet et al. reaches a compatible conclusion with untrained subjects (24).
Reaction times measured under the single task condition are compatible with results observed by Thorpe and colleagues (15), suggesting that our natural scene categorization task is performed in an ultra-rapid mode. Note that this task involves a speeded response under both single and dual task conditions. Under the dual task condition, while categorization performance is unaffected, we observe an average delay of 117 ms in response times compared with the single task condition (single task, 491 ms; dual task, 608 ms). This delay is likely to arise because of central rather than perceptual attentional competition (25). Indeed, when subjects are required to perform two tasks simultaneously, interference is known to occur at several different stages: task preparation (26), response selection (27, 28), and response production (29, 30). These limitations, often referred to as the “psychological refractory period” (31, 32), could easily account for the observed delay (25). Moreover, a number of studies have shown that the presence of attention decreases perceptual latencies (33) and reaction times to a significant extent (34–36). This could also explain the observed delay.
Are the above results due to the high biological and evolutionary relevance of the target category “animal”? In other words, could we obtain a similar result using a man-made object category (e.g., vehicles) (19)? We tested one group of four subjects with both categorization tasks. In the vehicle task, target images included cars, trains, airplanes, ships, etc. Half of the distractors were animal scenes, whereas the other half contained neither animals nor vehicles (Fig. 3c). The animal task was essentially the same as in the main experiment (Figs. 2 and 3a), with the exception that 50% of the distractor images contained vehicles (Fig. 3b). The two tasks were presented in alternation and all stimuli were masked. Our results show that for each individual subject there is no significant decrease in categorization performance under the dual task condition compared with the single task condition in both cases (Fig. 3 b and c, t test, P > 0.05). This result suggests that categorization of natural scenes in the near absence of attention might well be a general phenomenon not limited to evolutionarily relevant object categories. Another possible confound is that the subjects may not be performing an animal (or vehicle) detection task, but rather may be detecting the presence of a “foreground object.” Foreground objects may be more frequent in images containing animals or vehicles than in images containing scenery only. However, the fact that animal photographs were used as distractors for the vehicle task and vice versa makes this possibility implausible because foreground objects were contained both in the target and distractor images.
The interpretation of our findings relies on the assumption that attention is allocated to the center of the visual field under the dual task condition. This assumption is supported by the fact that there is no decrease in the central performance under dual task compared with single task conditions. This implies that when the peripheral task does demand attention, performance should suffer. To examine this question, we conducted two control experiments in which the peripheral tasks involved either discriminating a briefly presented letter followed by a mask (T or L followed by F; Fig. 3d) or discriminating a briefly presented and masked color disk (red/green or green/red; Fig. 3e). These tasks have been shown by Braun and colleagues (20) to require attention. In both of these control experiments, the central task was the same as in our previous experiments (five T's and L's discrimination). We observed a sharp drop in performance of both peripheral tasks (P < 0.0001 in Fig. 3 d and e). Although subjects can perform at 74 and 78% in peripheral single letter and color tasks, respectively, they cannot do any better than chance (individual paired t test for each subject, P > 0.05; average over all subjects is 51% for letter task and 51% for color task) during the dual task scenarios. These results demonstrate that attention is effectively allocated to the central task and provide further evidence that extensive training does not necessarily result in an improvement of performances. Subjects performing these dual tasks received the same amount of training as those performing the natural categorization tasks.
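What "no better than chance" means for a two-alternative task can be illustrated with an exact binomial tail probability. This is a different and simpler test than the paired t tests reported above, and the trial counts are hypothetical (one 96-trial block); it is offered only as a sketch of the reasoning.

```python
from math import comb


def binomial_tail(n_correct, n_trials, p_chance=0.5):
    """P(X >= n_correct) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical counts for one 96-trial block, for illustration only
n_trials = 96
for n_correct in (49, 72):  # about 51% and 75% correct
    p = binomial_tail(n_correct, n_trials)
    print(f"{n_correct}/{n_trials} correct: P(at least this many by chance) = {p:.3g}")
```

A score near 51% is readily produced by guessing, whereas a score near 75% is very unlikely under chance alone.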
Our findings show that rapid visual categorization of novel natural scenes requires very little or no focal attention. Perception outside the focus of attention has mostly been reported for simple salient stimuli (1, 2). In our task, however, human subjects are actively searching for a complex category of objects whose appearance is highly variable. It thus appears that a sophisticated high level of representation (e.g., semantic) can be accessed outside the focus of attention. It has already been argued that the “gist” of a visual scene could be available preattentively (37, 38). In this context, the contents of the “gist” could in fact be extended to include information about the presence of a complex target category whose appearance is not known in advance.
These results suggest that if attention gates visual information processing at early stages of the visual system, such as V1 and V2 (2, 39–41), it cannot do so in an “all-or-nothing” fashion. At least some information from unattended parts of the visual field can reach higher-level areas of the infero-temporal cortex and medial temporal lobe, where selective neuronal responses to various categories of objects have been found (42–45).
The ability to rapidly categorize highly variable natural scenes outside the focus of attention might constitute an evolutionary advantage (46, 47). This type of preattentive behavior can be contrasted with a more flexible but time-consuming mode of processing, in which focal attention might be necessary for granting access to visual awareness. It is commonly believed that only elementary scene properties such as orientation, motion and brightness gradients (i.e., properties that must have direct physiological correlates in the mechanisms of the early visual system) may be detected while attention is engaged elsewhere. Our findings challenge this classical view.
Acknowledgments
We thank J. Braun, F. Crick, L. Chelazzi, G. Kreiman, and P. Wilken for critical comments on an earlier version of the manuscript. This research was supported by grants from the National Science Foundation-sponsored Engineering Research Center at Caltech, the National Institutes of Health, the Keck Foundation, and the McDonnell Foundation. F.F.L. is supported by the Paul and Daisy Soros Fellowship for New Americans and a National Science Foundation Graduate Fellowship. R.V.R. is supported by a Caltech Fellowship.
Abbreviation
SOA, stimulus onset asynchrony

1. Braun J, Julesz B. Percept Psychophys. 1998;60:1–23. [PubMed]
2. Treisman A, Gelade G. Cognit Psychol. 1980;12:97–136. [PubMed]
3. Braun J. J Neurosci. 1994;14:554–567. [PubMed]
4. Bergen J R, Julesz B. Nature (London). 1983;303:696–698. [PubMed]
5. Malik J, Perona P. J Opt Soc Am A. 1990;7:923–932.
6. Logothetis N K, Sheinberg D L. Annu Rev Neurosci. 1996;19:577–621. [PubMed]
7. Keysers C, Xiao D K, Foldiak P, Perrett D I. J Cognit Neurosci. 2001;13:90–101. [PubMed]
8. Freedman D J, Riesenhuber M, Poggio T, Miller E K. Science. 2001;291:312–316. [PubMed]
9. Rensink R A, O'Regan J K, Clark J J. Psychol Sci. 1997;8:368–373.
10. O'Regan J K, Rensink R A, Clark J J. Nature (London). 1999;398:34. [PubMed]
11. Simons D J, Levin D T. Trends Cognit Sci. 1997;1:261–267.
12. Mack A, Rock I. Inattentional Blindness. Cambridge, MA: MIT Press; 1998.
13. Potter M C, Levy E I. J Exp Psychol. 1969;81:10–15.
14. Subramaniam S, Biederman I, Madigan S. Vis Cognit. 2000;7:511–535.
15. Thorpe S, Fize D, Marlot C. Nature (London). 1996;381:520–522. [PubMed]
16. Fabre-Thorpe M, Delorme A, Marlot C, Thorpe S. J Cognit Neurosci. 2001;13:171–180. [PubMed]
17. Delorme A, Richard G, Fabre-Thorpe M. Vision Res. 2000;40:2187–2200. [PubMed]
18. VanRullen R, Thorpe S J. J Cognit Neurosci. 2001;13:454–461. [PubMed]
19. Thorpe S, Gegenfurtner K R, Fabre-Thorpe M, Bulthoff H H. Eur J Neurosci. 2001;14:869–876. [PubMed]
20. Lee D K, Koch C, Braun J. Percept Psychophys. 1999;61:1241–1255. [PubMed]
21. Sperling G, Dosher B. In: Handbook of Perception and Human Performance. Boff K R, Kaufman L, Thomas J P, editors. New York: Wiley; 1986. pp. 1–65.
22. Braun J. Nature (London). 1998;393:424–425. [PubMed]
23. Joseph J S, Chun M M, Nakayama K. Nature (London). 1998;393:424–425. [PubMed]
24. Rousselet G, Fabre-Thorpe M, Thorpe S. Nat Neurosci. 2002; in press.
25. Pashler H. The Psychology of Attention. Cambridge, MA: MIT Press; 1998.
26. Gottsdanker R. In: Tutorials in Motor Behavior. Stelmach G E, Requin J, editors. Amsterdam: North–Holland; 1980. pp. 355–371.
27. Welford A T. Br J Psychol. 1952;43:2–19.
28. Pashler H. J Exp Psychol Human. 1984;10:358–377.
29. Heuer H. J Motor Behav. 1985;17:335–354. [PubMed]
30. Netick A, Klapp S T. J Exp Psychol Human. 1994;20:766–782.
31. Telford C W. J Exp Psychol. 1931;14:1–36.
32. Vince M. Br J Psychol. 1949;40:23–40. [PubMed]
33. Hikosaka O, Miyauchi S, Shimojo S. Vision Res. 1993;33:1219–1240. [PubMed]
34. Posner M I, Snyder C R R, Davidson B J. J Exp Psychol General. 1980;109:160–174.
35. Kingstone A. Q J Exp Psychol. 1992;44:69–104.
36. Proverbio A M, Mangun G R. Int J Neurosci. 1994;79:221–233. [PubMed]
37. Biederman I. Science. 1972;177:77–80. [PubMed]
38. Wolfe J M. Curr Biol. 1998;8:R303–R304. [PubMed]
39. Heinze H J, Mangun G R, Burchert W, Hinrichs H, Scholz M, Munte T F, Gos A, Scherg M, Johannes S, Hundeshagen H, et al. Nature (London). 1994;372:543–546. [PubMed]
40. Luck S J, Chelazzi L, Hillyard S A, Desimone R. J Neurophysiol. 1997;77:24–42. [PubMed]
41. Allison T, Puce A, Spencer D D, McCarthy G. Cereb Cortex. 1999;9:415–430. [PubMed]
42. Aguirre G K, Zarahn E, D'Esposito M. Neuron. 1998;21:373–383. [PubMed]
43. Epstein R, Kanwisher N. Nature (London). 1998;392:598–601. [PubMed]
44. Chao L L, Martin A, Haxby J V. Nat Neurosci. 1999;2:913–919. [PubMed]
45. Kreiman G, Koch C, Fried I. Nat Neurosci. 2000;3:946–953. [PubMed]
46. Olshausen B A, Field D J. Network. 1996;7:333–339.
47. Vinje W E, Gallant J L. Science. 2000;287:1273–1276. [PubMed]
