How the Brain Uses Reinforcement Learning Beyond Just Mean Rewards

What if our brains learned from rewards not just by averaging them but by considering their full range of possibilities? A groundbreaking study published in Nature explored how the brain’s dopamine system encodes not only the expected value of rewards but also their variability, refining our understanding of learning and decision-making.

Using advanced neural recording techniques and trained mice, a team of Harvard University researchers identified a specialized circuit in the striatum that tracks reward variance, a key feature of distributional reinforcement learning.

These findings could reshape how we think about learning, risk-taking, and neurological conditions linked to dopamine function, such as addiction and Parkinson’s disease.


Rewards and Decision-Making

The brain’s dopamine system plays a critical role in learning by signaling reward prediction errors, which are differences between expected and received rewards.

Traditional reinforcement learning models assume that the brain estimates only the mean value of expected rewards. However, recent machine learning advances show that incorporating the full probability distribution of rewards significantly improves learning efficiency.
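To make the contrast concrete, the sketch below shows a classic temporal-difference learner that updates only a mean estimate. This is an illustrative toy, not the study's model; the function name `td_mean` and its parameters are invented for this example. Such a learner cannot tell apart two reward sources that share the same average but differ in spread:

```python
import random

def td_mean(rewards, alpha=0.05, steps=5000, seed=0):
    """Classic TD-style update: the estimate V tracks only the mean reward."""
    rng = random.Random(seed)
    v = 0.0
    for _ in range(steps):
        r = rng.choice(rewards)  # sample one outcome from the cue's distribution
        delta = r - v            # reward prediction error
        v += alpha * delta       # nudge the estimate toward the sample
    return v

v_fixed = td_mean([4.0])           # certain 4-unit reward
v_variable = td_mean([1.0, 7.0])   # 50/50 mix of 1 and 7, same mean of 4
# Both estimates settle near 4: the mean-only learner cannot distinguish the cues.
```

Because both cues drive the estimate toward the same value, all information about variability is lost, which is exactly what distributional approaches aim to recover.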

Emerging evidence from neuroscience studies also suggests that the brain might similarly track not only average rewards but also their variability. This has major implications for understanding decision-making under uncertainty.

The striatum, a brain region central to motivation and learning, contains two main types of neurons — D1 and D2 medium spiny neurons — that respond differently to dopamine. However, whether and how these neurons encode reward variance had remained unclear.

Reward Variance Assessments

To investigate how the brain assesses variations in rewards, the researchers developed a novel computational model and tested it using high-density neural recordings, induced dopamine lesions, and optogenetic methods.

The researchers conducted experiments using a classical conditioning task to investigate whether the striatum encodes reward variance. The study used mice that were trained to associate different odor cues with specific reward distributions, such as fixed rewards, variable rewards, or no reward.

Striatal activity in the mouse brains was recorded using Neuropixels probes, while two-photon calcium imaging captured the responses of specific neuron types.

The study focused on D1 and D2 medium spiny neurons (MSNs), which were hypothesized to play distinct roles in reward processing. Furthermore, optogenetic techniques allowed the researchers to selectively activate or inhibit these neurons and test their contributions to learning.

Additionally, dopamine lesions were induced in the ventral striatum using the neurotoxin 6-hydroxydopamine (6-OHDA) to assess whether dopamine is necessary for encoding reward variance. A computational model called reflected expectile distributional reinforcement learning (REDRL) was developed to interpret these findings.

By combining multiple methods, such as neural recordings, computational modeling, and causal manipulations, the researchers aimed to comprehensively determine how the striatum encodes both the mean and variance of rewards, which is a key feature of distributional reinforcement learning.

Major Findings

The study found that neurons in the striatum encode not only the expected value of rewards but also their variability. This finding contradicted traditional reinforcement learning models that assume that the brain tracks only mean reward values.

Distinct populations of striatal neurons were found to represent different parts of the reward distribution: D1 neurons encoded the upper range of possible outcomes (optimistic predictions), while D2 neurons tracked the lower range (pessimistic predictions).
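This division of labor can be sketched with expectile-style learning, in which positive and negative prediction errors are scaled by different learning rates. Scaling gains more strongly yields an optimistic (upper-range) estimate, while scaling losses more strongly yields a pessimistic (lower-range) one. The snippet below is a hedged illustration only, not the authors' REDRL implementation; the function name `expectile_value` and the specific learning-rate values are invented for the example:

```python
import random

def expectile_value(rewards, alpha_pos, alpha_neg, steps=20000, seed=1):
    """Expectile-style update: positive and negative prediction errors are
    scaled by different learning rates, so the estimate settles above or
    below the mean depending on the asymmetry."""
    rng = random.Random(seed)
    v = 0.0
    for _ in range(steps):
        delta = rng.choice(rewards) - v
        v += (alpha_pos if delta > 0 else alpha_neg) * delta
    return v

rewards = [1.0, 7.0]                                 # variable cue, mean = 4
optimistic = expectile_value(rewards, 0.06, 0.02)    # "D1-like": amplifies gains
pessimistic = expectile_value(rewards, 0.02, 0.06)   # "D2-like": amplifies losses
spread = optimistic - pessimistic                    # grows with reward variance
```

For a variable cue, the optimistic and pessimistic estimates straddle the mean, and their spread grows with reward variance; for a fixed reward, both collapse onto the same value. The difference between D1-like and D2-like estimates can thus serve as a readout of variance.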

Furthermore, the results showed that dopamine lesions disrupted the encoding of reward variance but left mean value tracking intact, highlighting the role of dopamine in distributional learning.

Additionally, manipulation of neurons using optogenetic methods confirmed that D1 and D2 neurons contribute oppositely to reward representation, supporting the REDRL model.

Moreover, the REDRL model could accurately predict neural responses and behavioral changes, which reinforced the idea that the brain naturally integrates distributional reinforcement learning principles.

These findings have major implications for neuroscience and artificial intelligence. Understanding how the brain processes reward distributions could enhance treatments for disorders linked to dopamine dysfunction, such as addiction and Parkinson’s disease.

Additionally, insights from this study could refine machine learning algorithms that mimic biological learning.

Conclusions

In summary, the findings from this study challenged traditional reinforcement learning theories by demonstrating that the brain encodes both the mean and the variance of expected rewards.

The discovery of a specialized striatal circuit for distributional learning furthers and refines our understanding of decision-making, learning, and dopamine function.

These findings could impact treatments for dopamine-related disorders and inspire new approaches in artificial intelligence, bridging the gap between biological and computational models of learning.

Citations

Sidharthan, Chinta. (2025, March 03). How the Brain Uses Reinforcement Learning Beyond Just Mean Rewards. AZoLifeSciences. Retrieved on March 03, 2025 from https://www.azolifesciences.com/news/20250303/How-the-Brain-Uses-Reinforcement-Learning-Beyond-Just-Mean-Rewards.aspx.
