Copyright © 2002, The National Academy of Sciences

Psychology

Rapid natural scene categorization in the near absence of attention

Divisions of *Engineering and Applied Sciences and ‡Biology, California Institute of Technology, Pasadena, CA 91125

†To whom reprint requests should be addressed. E-mail: feifeili/at/vision.caltech.edu.

Communicated by David Mumford, Brown University, Providence, RI

Received March 22, 2002; Accepted May 8, 2002.

Abstract

What can we see when we do not pay attention? It is well known that we can be “blind” even to major aspects of natural scenes when we attend elsewhere. The only tasks that do not need attention appear to be carried out in the early stages of the visual system. Contrary to this common belief, we report that subjects can rapidly detect animals or vehicles in briefly presented novel natural scenes while simultaneously performing another attentionally demanding task. By comparison, they are unable to discriminate large T's from L's, or bisected two-color disks from their mirror images under the same conditions. We conclude that some visual tasks associated with “high-level” cortical areas may proceed in the near absence of attention.

Psychologists have long known that certain visual search tasks require minimal or no attention. A hallmark of preattentive vision is that it is achieved in a seemingly parallel fashion: a preattentive task may be carried out simultaneously with other visual tasks (1); target detection does not become significantly more difficult when the number of distractors is increased (2, 3). However, none of the known preattentive tasks approaches the sophistication of everyday vision, where complex scenes must be scrutinized to assess high-level properties such as the presence of danger and the structure of a social interaction.

Virtually all of the visual tasks that may be performed preattentively have been explained, either in detail or in principle, by quasilinear models that replicate mechanisms found in the early stages of our visual system (4, 5). Although much can be accomplished by these simple mechanisms, it is quite clear that they are inadequate for explaining “high-level” perception such as recognition and categorization, i.e., visual processes that rely on neural activities in inferior temporal cortex and beyond (6–8). This would suggest that there is no sophisticated property of the scene that we can see without paying attention. In agreement with this view, change blindness and inattentional blindness studies demonstrate that without visual attention, significant changes in a large part of the visual field can easily escape our awareness (9–12).

On the other hand, some complex visual tasks can be rapidly accomplished by our visual system. RSVP (rapid serial visual presentation) experiments have demonstrated that natural objects belonging to a specified category may be classified remarkably fast (13, 14). Thorpe and colleagues (15–19) have found that complex natural scenes can be categorized in as little as 150 ms. This astonishing speed relative to the time constant of information processing and transmission in networks of neurons raises the question of whether attention plays a critical role in this type of rapid visual processing. Our results indicate that there is little or no attentional cost in rapid visual categorization of complex, natural images.

Materials and Methods

Subjects. Five right-handed subjects, including two authors (F.F.L. and R.V.R.), were tested in the main experiment. Four right-handed subjects, including the same two authors, were tested in the vehicle/animal categorization experiments, as well as the color disk control experiment. Five right-handed subjects, including the same two authors, were tested in the T/L discrimination control experiment.
Ages ranged from 20 to 26 (average 24). One other subject was discarded because he could not maintain his attentional focus on the central letter discrimination task under the dual task condition.

Database. The pictures were complex color scenes taken from a large commercially available CD-ROM library allowing access to several thousand stimuli (15). The animal category images included more than 800 pictures of mammals, birds, fish, insects, and reptiles. In a separate experiment (Fig. 3) …

Experimental Setup. Subjects were seated in a dark room specially designed for psychophysics experiments. The seat was approximately 120 cm from a computer screen (1,024 × 1,286 pixels, 3 × 8 bit RGB), connected to a Silicon Graphics (Mountain View, CA) O2 computer. The refresh rate of the monitor was 75 Hz. We used a photocell and an oscilloscope to ensure that the experimental setup achieved the desired refresh rate. The display was synchronized with the vertical retrace of the monitor.

Training Procedure. Each experiment required a significant training period. It usually took more than 10 h (approximately 12,000 trials of all different tasks combined) for a subject to coordinate their motor responses well enough to answer both a speeded peripheral task and the central task. The central SOA (stimulus onset asynchrony, the time between the appearance of the central stimulus and the onset of the central mask), starting at 500 ms, was decreased after each block in which performance on this task exceeded 85%. The procedure was terminated after the subject's performance had stabilized and the central SOA was below 250 ms. This value was chosen to limit the possibility of switching attention during stimulus presentation. All tasks received the same amount of training for each subject to avoid bias for any particular task.

Experimental Paradigm. Each experiment (main experiment and two control experiments) consisted of three different conditions: an attentionally demanding central task (identical in all experiments), a peripheral task (in which the role of attention was investigated), and a dual task condition in which both the central and peripheral tasks were performed concurrently. In each experiment, all trials were organized in the same way irrespective of the experimental condition (i.e., single task or dual task).

Central Letter Discrimination Task. Each trial started with a fixation cross 300 ± 100 ms before the onset of the first stimulus.
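
As a rough illustration, the training procedure described above amounts to a block-wise staircase on the central SOA. The Python sketch below is not the authors' code: the step size (one frame at 75 Hz), the hypothetical `run_block` callback, and the toy observer are assumptions, and the "performance had stabilized" criterion is left out because the text does not define it.

```python
FRAME_MS = 1000.0 / 75     # one frame at the reported 75 Hz refresh rate
START_SOA_MS = 500.0       # starting central SOA (from the text)
TARGET_SOA_MS = 250.0      # training ends once the SOA is below this value
CRITERION = 0.85           # SOA is decreased after blocks above 85% correct


def train_central_soa(run_block, step_ms=FRAME_MS, max_blocks=1000):
    """Lower the SOA after every block whose accuracy exceeds CRITERION.

    run_block(soa_ms) is a hypothetical callback that returns the proportion
    correct for one block at the given SOA. The step size is an assumption.
    """
    soa = START_SOA_MS
    history = []
    for _ in range(max_blocks):
        accuracy = run_block(soa)
        history.append((soa, accuracy))
        if accuracy > CRITERION:
            soa -= step_ms
        if soa < TARGET_SOA_MS:
            break
    return soa, history


def toy_observer(soa_ms):
    """Toy observer (not data): accuracy drops linearly with shorter SOA, capped at 1.0."""
    return min(1.0, 0.6 + soa_ms / 1000.0)


final_soa, history = train_central_soa(toy_observer)
print(f"final SOA: {final_soa:.1f} ms after {len(history)} blocks")
```

With a real subject, `run_block` would present one block of trials and return the measured accuracy; the toy observer only exercises the loop.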
At 0 ms, the central stimulus (a combination of five letters) was presented. The five letters (T's and L's, either all identical, or one differing from the other four) appeared at nine possible locations within 1.2° eccentricity. Each letter was randomly rotated. After the central SOA, each stimulus letter was masked by the letter F. For a given subject, the SOA was the same for the single task and dual task conditions. All trial types were presented with equal probability. Subjects were instructed to respond by pressing “S” on the keyboard if the five letters were the same, or “D” if one of the letters differed from the other four.

In a separate control experiment, we tested our subjects' central performance with shortened SOAs. Subjects were instructed to perform the central letter discrimination task. For each subject, the central SOA alternated in four blocks of 48 trials between two values: the subject's original SOA that was reached at the end of the training procedure, and another SOA 66 ms shorter.

Peripheral Task. In each peripheral task, the stimulus was always presented 53 ms after the central stimulus onset and followed by a perceptual mask. Subjects responded to these tasks in a speeded fashion. They were instructed to continuously hold down the mouse button and release it as fast as possible, within 1,000 ms, when they detected a target.

Natural scene categorization. The peripheral stimuli were natural images, half of them containing one or more target objects. Each image (of size 3.2° × 4.8°) was flashed for 27 ms at a random location centered at around 6.1° eccentricity. Novel images (1,056 in all) were used as test stimuli under the dual task condition for 11 blocks of 96 trials. The peripheral stimulus was followed by a perceptual mask. Eight different masks were used. Each of them was a colored picture of a mixture of white noise at different spatial frequencies on which a naturalistic texture was superimposed.
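
The durations quoted above line up with whole numbers of video frames at the monitor's 75 Hz refresh rate. The Python sketch below makes that arithmetic explicit. The function and variable names are illustrative and do not come from the original experiment code, and rounding to the nearest frame is an assumption about how the reported millisecond values were obtained.

```python
REFRESH_HZ = 75  # monitor refresh rate reported in Experimental Setup


def frame_ms(refresh_hz=REFRESH_HZ):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / refresh_hz


def nearest_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Whole number of frames closest to a reported duration (assumed rounding)."""
    return round(duration_ms * refresh_hz / 1000.0)


# Durations taken from the text, in ms.
reported = {
    "peripheral image duration": 27,
    "peripheral onset after central onset": 53,
    "SOA reduction in the control experiment": 66,
}

for label, ms in reported.items():
    n = nearest_frames(ms)
    print(f"{label}: {ms} ms ~ {n} frames ({n * frame_ms():.1f} ms)")

# The peripheral image disappears one image duration after its onset.
peripheral_offset_frames = nearest_frames(53) + nearest_frames(27)
peripheral_offset_ms = peripheral_offset_frames * frame_ms()
```

Under these assumptions the peripheral image ends six frames, i.e. 80 ms, after the onset of the central stimulus.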
The peripheral SOA was adjusted in the same way as for the central SOA so that performance would stabilize under 85%. Individual peripheral SOAs ranged from 53 to 80 ms. For a given subject, the SOA was the same for the single task and dual task conditions.

Peripheral letter discrimination. In this control experiment, the peripheral stimulus (of size 1.5° × 1.5°) was a randomly rotated letter T or L masked by the letter F. The target was the letter L. The peripheral SOA was determined individually as previously, ranging from 53 to 160 ms. For a given subject, the SOA was the same for the single task and dual task conditions.

Peripheral color pattern discrimination. In this control experiment, the peripheral stimulus (of size 1.5° × 1.5°) was a vertically bisected disk with red and green halves. The target was the disk in which red was on the right. The mask was a disk divided into four quadrants, with red and green alternating between quadrants. The colors were matched for gray levels. The peripheral SOA was determined individually as previously, ranging from 66 to 106 ms. For a given subject, the SOA was the same for the single task and dual task conditions.

Dual Task. In all dual tasks, subjects were instructed to focus attention on the central task. On each trial, they were to respond to the peripheral stimulus as fast as possible (with their right hand) before responding to the central stimulus (with their left hand).

Results

We studied the role of attention in natural scene categorization by using a dual task paradigm, in which a natural scene categorization task, where target scenes were defined by the presence of one or more animals, was performed concurrently with another visual task that required visual attention (refs. 1, 20, and 21; Fig. 1).
Our attentionally demanding task involves discriminating displays composed of five randomly rotated T's and L's at the center of the visual field. Subjects needed to respond by pressing one key when all five letters were the same and another key when one of the letters differed from the other four. This task engages attention at the center of the display, preventing attention from focusing on the natural scene in the periphery (refs. 1 and 20; see also Fig. 3) …

The natural scene categorization was a modification of the one used by Thorpe and colleagues (15). A picture was flashed for only 27 ms at a random location in the periphery of the visual field, followed by a perceptual mask (Fig. 1). Under the dual task condition, subjects were instructed to focus attention at the center of the display, and to try to perform both tasks as accurately as possible. Because we were interested in the reaction times of the natural scene categorization task, we asked subjects to respond as fast as possible to the peripheral task before answering the central task.

For each subject, the central task performance under the dual task condition did not differ significantly (P > 0.05) from its counterpart under the single task condition (Fig. 2). One might argue that subjects could first attend to the peripheral stimulus before switching attention to the central one. In that case, however, the time available to process the central stimulus would be shorter than the actual central SOA by at least 80 ms (the peripheral stimulus is turned off 80 ms after the onset of the central stimulus). This strategy would result in a strong decrease in performance of the central task. Indeed, in a separate control experiment, we asked all six subjects to perform the central letter task with an SOA shortened by only 66 ms. Their average performance dropped from 77% to 66% (individual t test for each subject, P = 0.01). This confirms that our results do not reflect a systematic switch of attention between the two tasks.

Because of its high motor coordination demands, the dual task required extensive training. During this period, our subjects were repeatedly trained with the same set of 288 images. It could be argued that such training could serve to optimize feature detection mechanisms for specific stimuli, reducing the attentional demands for this task (22, 23).
However, the above results were obtained with a set of 1,056 novel images that were never presented during training. Furthermore, we show later (Fig. 3) …

Reaction times measured under the single task condition are compatible with results observed by Thorpe and colleagues (15), suggesting that our natural scene categorization task is performed in an ultra-rapid mode. Note that this task involves a speeded response under both single and dual task conditions. Under the dual task condition, while categorization performance is unaffected, we observe an average delay of 117 ms in response times compared with the single task condition (single task, 491 ms; dual task, 608 ms). This delay is likely to arise because of central rather than perceptual attentional competition (25). Indeed, when subjects are required to perform two tasks simultaneously, interference is known to occur at several different stages: task preparation (26), response selection (27, 28), and response production (29, 30). These limitations, often referred to as the “psychological refractory period” (31, 32), could easily account for the observed delay (25). Moreover, a number of studies have shown that the presence of attention decreases perceptual latencies (33) and reaction times to a significant extent (34–36). This could also explain the observed delay.

Are the above results due to the high biological and evolutionary relevance of the target category “animal”? In other words, could we obtain a similar result using a man-made object category (e.g., vehicles) (19)? We tested one group of five subjects with both categorization tasks. In the vehicle task, target images included cars, trains, airplanes, ships, etc. Half of the distractors were animal scenes, whereas the other half contained neither animals nor vehicles (Fig. 3) …

The interpretation of our findings relies on the assumption that attention is allocated to the center of the visual field under the dual task condition.
This assumption is supported by the fact that there is no decrease in the central performance under dual task compared with single task conditions. This implies that when the peripheral task does demand attention, performance should suffer. To examine this question, we conducted two control experiments in which the peripheral tasks involved either discriminating a briefly presented letter followed by a mask (T or L followed by F; Fig. 3) …

Discussion

Our findings show that rapid visual categorization of novel natural scenes requires very little or no focal attention. Perception outside the focus of attention has mostly been reported for simple salient stimuli (1, 2). In our task, however, human subjects are actively searching for a complex category of objects whose appearance is highly variable. It thus appears that a sophisticated high level of representation (e.g., semantic) can be accessed outside the focus of attention. It has already been argued that the “gist” of a visual scene could be available preattentively (37, 38). In this context, the contents of the “gist” could in fact be extended to include information about the presence of a complex target category whose appearance is not known in advance.

These results suggest that if attention gates visual information processing at early stages of the visual system, such as V1 and V2 (2, 39–41), it cannot do so in an “all-or-nothing” fashion. At least some information from unattended parts of the visual field can reach higher-level areas of the infero-temporal cortex and medial temporal lobe, where selective neuronal responses to various categories of objects have been found (42–45). The ability to rapidly categorize highly variable natural scenes outside the focus of attention might constitute an evolutionary advantage (46, 47).
This type of preattentive behavior can be contrasted with a more flexible but time-consuming mode of processing, in which focal attention might be necessary for granting access to visual awareness.

It is commonly believed that only elementary scene properties such as orientation, motion and brightness gradients (i.e., properties that must have direct physiological correlates in the mechanisms of the early visual system) may be detected while attention is engaged elsewhere. Our findings challenge this classical view.

Acknowledgments

We thank J. Braun, F. Crick, L. Chelazzi, G. Kreiman, and P. Wilken for critical comments on an earlier version of the manuscript. This research was supported by grants from the National Science Foundation-sponsored Engineering Research Center at Caltech, the National Institutes of Health, the Keck Foundation, and the McDonnell Foundation. F.F.L. is supported by the Paul and Daisy Soros Fellowship for New Americans and a National Science Foundation Graduate Fellowship. R.V.R. is supported by a Caltech Fellowship.

References

1. Braun J, Julesz B. Percept Psychophys. 1998;60:1–23.
2. Treisman A, Gelade G. Cognit Psychol. 1980;12:97–136.
3. Braun J. J Neurosci. 1994;14:554–567.
4. Bergen J R, Julesz B. Nature (London). 1983;303:696–698.
5. Malik J, Perona P. J Opt Soc Am A. 1990;7:923–932.
6. Logothetis N K, Sheinberg D L. Annu Rev Neurosci. 1996;19:577–621.
7. Keysers C, Xiao D K, Foldiak P, Perrett D I. J Cognit Neurosci. 2001;13:90–101.
8. Freedman D J, Riesenhuber M, Poggio T, Miller E K. Science. 2001;291:312–316.
9. Rensink R A, O'Regan J K, Clark J J. Psychol Sci. 1997;8:368–373.
10. O'Regan J K, Rensink R A, Clark J J. Nature (London). 1999;398:34.
11. Simons D J, Levin D T. Trends Cognit Sci. 1997;1:261–267.
12. Mack A, Rock I. Inattentional Blindness. Cambridge, MA: MIT Press; 1998.
13. Potter M C, Levy E I. J Exp Psychol. 1969;81:10–15.
14. Subramaniam S, Biederman I, Madigan S. Vis Cognit. 2000;7:511–535.
15. Thorpe S, Fize D, Marlot C. Nature (London). 1996;381:520–522.
16. Fabre-Thorpe M, Delorme A, Marlot C, Thorpe S. J Cognit Neurosci. 2001;13:171–180.
17. Delorme A, Richard G, Fabre-Thorpe M. Vision Res. 2000;40:2187–2200.
18. VanRullen R, Thorpe S J. J Cognit Neurosci. 2001;13:454–461.
19. Thorpe S, Gegenfurtner K R, Fabre-Thorpe M, Bulthoff H H. Eur J Neurosci. 2001;14:869–876.
20. Lee D K, Koch C, Braun J. Percept Psychophys. 1999;61:1241–1255.
21. Sperling G, Dosher B. In: Handbook of Perception and Human Performance. Boff K R, Kaufman L, Thomas J P, editors. New York: Wiley; 1986. pp. 1–65.
22. Braun J. Nature (London). 1998;393:424–425.
23. Joseph J S, Chun M M, Nakayama K. Nature (London). 1998;393:424–425.
24. Rousselet G, Fabre-Thorpe M, Thorpe S. Nat Neurosci. 2002; in press.
25. Pashler H. The Psychology of Attention. Cambridge, MA: MIT Press; 1998.
26. Gottsdanker R. In: Tutorials in Motor Behavior. Stelmach G E, Requin J, editors. Amsterdam: North-Holland; 1980. pp. 355–371.
27. Welford A T. Br J Psychol. 1952;43:2–19.
28. Pashler H. J Exp Psychol Human. 1984;10:358–377.
29. Heuer H. J Motor Behav. 1985;17:335–354.
30. Netick A, Klapp S T. J Exp Psychol Human. 1994;20:766–782.
31. Telford C W. J Exp Psychol. 1931;14:1–36.
32. Vince M. Br J Psychol. 1949;40:23–40.
33. Hikosaka O, Miyauchi S, Shimojo S. Vision Res. 1993;33:1219–1240.
34. Posner M I, Snyder C R R, Davidson B J. J Exp Psychol General. 1980;109:160–174.
35. Kingstone A. Q J Exp Psychol. 1992;44:69–104.
36. Proverbio A M, Mangun G R. Int J Neurosci. 1994;79:221–233.
37. Biederman I. Science. 1972;177:77–80.
38. Wolfe J M. Curr Biol. 1998;8:R303–R304.
39. Heinze H J, Mangun G R, Burchert W, Hinrichs H, Scholz M, Munte T F, Gos A, Scherg M, Johannes S, Hundeshagen H, et al. Nature (London). 1994;372:543–546.
40. Luck S J, Chelazzi L, Hillyard S A, Desimone R. J Neurophysiol. 1997;77:24–42.
41. Allison T, Puce A, Spencer D D, McCarthy G. Cereb Cortex. 1999;9:415–430.
42. Aguirre G K, Zarahn E, D'Esposito M. Neuron. 1998;21:373–383.
43. Epstein R, Kanwisher N. Nature (London). 1998;392:598–601.
44. Chao L L, Martin A, Haxby J V. Nat Neurosci. 1999;2:913–919.
45. Kreiman G, Koch C, Fried I. Nat Neurosci. 2000;3:946–953.
46. Olshausen B A, Field D J. Network. 1996;7:333–339.
47. Vinje W E, Gallant J L. Science. 2000;287:1273–1276.