What if our brains learned from rewards not just by averaging them but by considering their full range of possibilities? A groundbreaking study published in Nature explored how the brain’s dopamine system encodes not only the expected value of rewards but also their variability, refining our understanding of learning and decision-making.
Using advanced neural recording techniques and trained mice, a team of Harvard University researchers identified a specialized circuit in the striatum that tracks reward variance, a key feature of distributional reinforcement learning.
These findings could reshape how we think about learning, risk-taking, and neurological conditions linked to dopamine function, such as addiction and Parkinson’s disease.
Rewards and Decision-Making
The brain’s dopamine system plays a critical role in learning by signaling reward prediction errors, which are differences between expected and received rewards.
Traditional reinforcement learning models assume that the brain estimates only the mean value of expected rewards. However, recent machine learning advances show that incorporating the full probability distribution of rewards significantly improves learning efficiency.
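To make the distinction concrete, here is a minimal sketch (in Python, with illustrative reward values and learning rate, not taken from the study) of the classic mean-only update driven by reward prediction errors:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 0.0        # a single value estimate: the expected reward
alpha = 0.1    # learning rate (illustrative)

for _ in range(2_000):
    r = rng.choice([1.0, 9.0])  # rewards of 1 or 9, equally likely (mean 5)
    delta = r - V               # reward prediction error (dopamine-like signal)
    V += alpha * delta          # symmetric update: converges to the mean

print(round(V, 1))  # ~5.0; the 1-versus-9 variability is invisible here
```

Because the error is weighted symmetrically, the estimate converges to the mean reward and discards all information about its spread; distributional variants maintain many such estimates with different sensitivities, as sketched later in this article.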
Emerging evidence from neuroscience studies also suggests that the brain might similarly track not only average rewards but also their variability. This has major implications for understanding decision-making under uncertainty.
The striatum, a brain region central to motivation and learning, contains two main types of neurons, D1 and D2 medium spiny neurons, which respond differently to dopamine. Whether and how these neurons encode reward variance, however, had been unclear.
Reward Variance Assessments
To investigate how the brain assesses variation in rewards, the researchers combined a novel computational model with high-density neural recordings, induced dopamine lesions, and optogenetic manipulations.
The core experiment was a classical conditioning task in which mice were trained to associate different odor cues with specific reward distributions: a fixed reward, variable rewards, or no reward.
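As a rough illustration of this design, the cue-reward contingencies can be simulated as follows (the reward magnitudes and probabilities below are hypothetical placeholders; the study's actual values may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cue-reward contingencies echoing the task design;
# actual reward sizes and probabilities in the study may differ.
cues = {
    "odor_fixed": lambda: 5.0,                        # always the same reward
    "odor_variable": lambda: rng.choice([1.0, 9.0]),  # same mean, high variance
    "odor_none": lambda: 0.0,                         # no reward
}

for name, draw in cues.items():
    samples = np.array([draw() for _ in range(10_000)])
    print(f"{name}: mean={samples.mean():.2f}, variance={samples.var():.2f}")
```

The crucial feature of such a design is that the fixed and variable cues can share the same mean, so any neural signal distinguishing them must reflect variance rather than expected value.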
Striatal activity was recorded using Neuropixels probes, while two-photon calcium imaging captured the responses of specific neuron types.
The study focused on D1 and D2 medium spiny neurons (MSNs), which are thought to play distinct roles in reward processing. Optogenetic techniques allowed the researchers to selectively activate or inhibit these neurons and test their contributions to learning.
Additionally, dopamine lesions were induced in the ventral striatum using the neurotoxin 6-hydroxydopamine to assess whether dopamine is necessary for encoding reward variance. A computational model called reflected expectile distributional reinforcement learning (REDRL) was developed to interpret these findings.
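The paper's exact REDRL equations are not reproduced here, but the expectile-style update it builds on can be sketched as follows. Each simulated unit updates its estimate with an asymmetric weighting of prediction errors: units that weight positive errors more heavily (high tau, loosely D1-like) learn optimistic estimates, while units that weight negative errors more heavily (low tau, loosely D2-like) learn pessimistic ones. All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Asymmetry parameters: high tau weights positive errors more (loosely
# D1-like, optimistic); low tau weights negative errors more (loosely
# D2-like, pessimistic). All values here are illustrative.
taus = np.linspace(0.1, 0.9, 9)
V = np.zeros_like(taus)   # one expectile estimate per simulated unit
alpha = 0.02              # learning rate (illustrative)

for _ in range(20_000):
    r = rng.choice([1.0, 9.0])                 # variable-reward cue, mean 5
    delta = r - V                              # per-unit prediction error
    w = np.where(delta > 0, taus, 1 - taus)    # asymmetric error weighting
    V += alpha * w * delta

# Low-tau units settle near the worst outcome, high-tau units near the
# best, and the middle unit near the mean, spanning the distribution.
print(np.round(V, 1))
```

For a 50/50 mixture of rewards 1 and 9, each expectile settles at 1 + 8·tau, so the learned population spans the distribution from roughly 1.8 to 8.2 rather than collapsing onto the mean of 5.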
By combining neural recordings, computational modeling, and causal manipulations, the researchers aimed to determine comprehensively how the striatum encodes both the mean and variance of rewards, a key feature of distributional reinforcement learning.
Major Findings
The study found that neurons in the striatum encode not only the expected value of rewards but also their variability. This finding contradicted traditional reinforcement learning models that assume that the brain tracks only mean reward values.
Distinct populations of striatal neurons were found to represent different parts of the reward distribution: D1 neurons encoded the upper range (optimistic predictions), while D2 neurons tracked the lower range (pessimistic predictions).
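This division of labor can be illustrated with a self-contained sketch, applying the same expectile-style update as above to a fixed versus a variable cue (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
taus = np.linspace(0.1, 0.9, 9)   # asymmetries: pessimistic to optimistic
alpha = 0.02

def learn(draw, n=20_000):
    """Run expectile-style updates against rewards drawn from `draw`."""
    V = np.zeros_like(taus)
    for _ in range(n):
        delta = draw() - V                      # per-unit prediction error
        V += alpha * np.where(delta > 0, taus, 1 - taus) * delta
    return V

fixed = learn(lambda: 5.0)                        # fixed-reward cue
variable = learn(lambda: rng.choice([1.0, 9.0]))  # same mean, high variance

print(fixed[4], variable[4])                      # both near 5: mean preserved
print(fixed[-1] - fixed[0], variable[-1] - variable[0])  # spread: ~0 vs ~6.4
```

The middle of the code tracks the mean in both conditions, while the spread between the optimistic and pessimistic extremes separates the two cues, illustrating how variance coding can in principle be disrupted while mean tracking stays intact.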
Furthermore, the results showed that dopamine lesions disrupted the encoding of reward variance but left mean value tracking intact, highlighting the role of dopamine in distributional learning.
Additionally, manipulation of neurons using optogenetic methods confirmed that D1 and D2 neurons contribute oppositely to reward representation, supporting the REDRL model.
Moreover, the REDRL model accurately predicted neural responses and behavioral changes, reinforcing the idea that the brain naturally implements distributional reinforcement learning principles.
These findings have major implications for neuroscience and artificial intelligence. Understanding how the brain processes reward distributions could enhance treatments for disorders linked to dopamine dysfunction, such as addiction and Parkinson’s disease.
Additionally, insights from this study could refine machine learning algorithms that mimic biological learning.
Conclusions
In summary, the findings from this study challenged traditional reinforcement learning theories by demonstrating that the brain encodes both mean and variance assessments of rewards.
The discovery of a specialized striatal circuit for distributional learning furthers and refines our understanding of decision-making, learning, and dopamine function.
These findings could impact treatments for dopamine-related disorders and inspire new approaches in artificial intelligence, bridging the gap between biological and computational models of learning.