Send to

Choose Destination
J Neurosci. 2017 Mar 15;37(11):3009-3017. doi: 10.1523/JNEUROSCI.3205-16.2017. Epub 2017 Feb 13.

Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

Author information

School of Psychology, UNSW Australia, Sydney, New South Wales 2052, Australia
School of Psychology, UNSW Australia, Sydney, New South Wales 2052, Australia.


Recent research has shown that perceptual processing of stimuli previously associated with high-value rewards is automatically prioritized even when rewards are no longer available. It has been hypothesized that such reward-related modulation of stimulus salience is conceptually similar to an "attentional habit." Recording event-related potentials in humans during a reinforcement learning task, we show strong evidence in favor of this hypothesis. Resistance to outcome devaluation (the defining feature of a habit) was shown by the stimulus-locked P1 component, reflecting activity in the extrastriate visual cortex. Analysis at longer latencies revealed a positive component (corresponding to the P3b, from 550-700 ms) sensitive to outcome devaluation. Therefore, distinct spatiotemporal patterns of brain activity were observed corresponding to habitual and goal-directed processes. These results demonstrate that reinforcement learning engages both attentional habits and goal-directed processes in parallel. Consequences for brain and computational models of reinforcement learning are discussed.SIGNIFICANCE STATEMENT The human attentional network adapts to detect stimuli that predict important rewards. A recent hypothesis suggests that the visual cortex automatically prioritizes reward-related stimuli, driven by cached representations of reward value; that is, stimulus-response habits. Alternatively, the neural system may track the current value of the predicted outcome. Our results demonstrate for the first time that visual cortex activity is increased for reward-related stimuli even when the rewarding event is temporarily devalued. In contrast, longer-latency brain activity was specifically sensitive to transient changes in reward value. Therefore, we show that both habit-like attention and goal-directed processes occur in the same learning episode at different latencies. This result has important consequences for computational models of reinforcement learning.


attention; event-related potentials; goal-directed; habit; learning; reward

[Indexed for MEDLINE]
Free full text

Supplemental Content

Full text links

Icon for HighWire
Loading ...
Support Center