Nature. 2020 Jan;577(7792):671-675. doi: 10.1038/s41586-019-1924-6. Epub 2020 Jan 15.

A distributional code for value in dopamine-based reinforcement learning.

Author information

DeepMind, London, UK.
Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK.
Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
Gatsby Computational Neuroscience Unit, University College London, London, UK.


Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now-canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
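The contrast between a single scalar prediction and a distributional one can be illustrated with a small simulation. The sketch below is not the authors' model or analysis code. It assumes one common way to obtain a distributional code: a population of value estimators, each of which scales positive and negative prediction errors differently. The function name `learn_value_estimates`, the asymmetry parameter `taus`, the learning rate, and the reward values are all hypothetical choices made for illustration.

```python
import random


def learn_value_estimates(rewards, taus, lr=0.01, steps=50000, seed=0):
    """Learn one value estimate per asymmetry level tau (illustrative only).

    Each estimator applies a prediction-error update, but scales positive
    errors by tau and negative errors by (1 - tau). With tau = 0.5 the two
    scalings are equal and the estimate tracks the mean reward, as in the
    scalar account. Other values of tau yield pessimistic or optimistic
    estimates, so the population jointly reflects the reward distribution.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(steps):
        reward = rng.choice(rewards)  # sample one stochastic outcome
        for i, tau in enumerate(taus):
            delta = reward - values[i]  # reward prediction error
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta
    return values


# Hypothetical skewed reward distribution: usually 0, sometimes 10.
rewards = [0.0, 0.0, 10.0]
taus = [0.1, 0.5, 0.9]  # pessimistic, balanced, optimistic
estimates = learn_value_estimates(rewards, taus)

for tau, value in zip(taus, estimates):
    print(f"tau={tau:.1f}  learned value={value:.2f}")
```

Under these assumptions, the balanced estimator settles near the mean reward, while the pessimistic and optimistic estimators settle below and above it. Read together, the spread of learned values carries information about the shape of the reward distribution that a single mean cannot.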

