Nature. 2020 Jan;577(7792):671-675. doi: 10.1038/s41586-019-1924-6. Epub 2020 Jan 15.

A distributional code for value in dopamine-based reinforcement learning.

Author information

1. DeepMind, London, UK. wdabney@google.com.
2. DeepMind, London, UK.
3. Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK.
4. Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA.
5. Gatsby Computational Neuroscience Unit, University College London, London, UK.

Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
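To make the core idea concrete, the following minimal sketch (not the authors' code) simulates the kind of distributional learning the abstract describes: instead of one value estimate tracking the mean reward, a population of predictors learns in parallel, each weighting positive and negative prediction errors differently. Predictors that weight positive errors more strongly settle above the mean, and those that weight negative errors more strongly settle below it, so together they span the reward distribution. The reward distribution, population size, and learning rates below are illustrative assumptions, not values taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n_cells = 7                                   # hypothetical number of value predictors
    alpha_pos = np.linspace(0.02, 0.18, n_cells)  # learning rate applied to positive errors
    alpha_neg = alpha_pos[::-1]                   # learning rate applied to negative errors
    values = np.zeros(n_cells)                    # per-predictor reward estimates

    for _ in range(50_000):
        # Bimodal stochastic reward: a single mean-only code would hide this structure.
        reward = rng.choice([1.0, 10.0], p=[0.7, 0.3])
        rpe = reward - values                     # per-predictor reward prediction error
        lr = np.where(rpe > 0, alpha_pos, alpha_neg)
        values += lr * rpe                        # asymmetric update

    print(np.round(values, 2))                    # estimates spread across the reward distribution
    print(1.0 * 0.7 + 10.0 * 0.3)                 # classical single-mean prediction (3.7)

Under this asymmetric update rule, each predictor converges toward a point determined by the ratio of its positive and negative learning rates, so the population as a whole encodes the shape of the reward distribution rather than only its expectation.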

PMID: 31942076
DOI: 10.1038/s41586-019-1924-6
