JMLR

Optimizing Return Distributions with Distributional Dynamic Programming

Authors

Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney

Paper Information

Journal: Journal of Machine Learning Research
Added to Tracker: Sep 08, 2025

Abstract

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained since the first time step. We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we introduce an agent that combines DQN and the core ideas of distributional DP, and empirically evaluate it for solving instances of the applications discussed.
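Two ideas named in the abstract, stock augmentation and conditional value-at-risk (CVaR), can be illustrated with a small Python sketch. It is not the paper's algorithm or agent. It assumes the stock is an undiscounted running sum of rewards, whereas the abstract only says the stock is "a statistic of the rewards obtained since the first time step", so the paper's exact definition may differ (for example, by involving discounting). The function names `augment_with_stock` and `cvar` are hypothetical, and `cvar` is the standard empirical lower-tail estimator applied to sampled returns.

```python
import math


def augment_with_stock(states, rewards, initial_stock=0.0):
    """Pair each state with the rewards accumulated before reaching it.

    Assumption: the stock is a plain (undiscounted) running sum of rewards.
    Returns the augmented trajectory and the final stock value.
    """
    stock = initial_stock
    augmented = []
    for state, reward in zip(states, rewards):
        augmented.append((state, stock))  # stock-augmented state
        stock += reward                   # update stock with the new reward
    return augmented, stock


def cvar(returns, alpha):
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) returns."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    trajectory, final_stock = augment_with_stock(["a", "b", "c"], [1.0, -2.0, 0.5])
    print(trajectory, final_stock)
    print(cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5))
```

With `alpha=1.0` the estimator reduces to the ordinary mean return, which mirrors the abstract's remark that standard reinforcement learning is a special case of return distribution optimization.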

Citation Information

APA Format

Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos & Will Dabney. Optimizing Return Distributions with Distributional Dynamic Programming. Journal of Machine Learning Research.

BibTeX Format

@article{paper470,
  title = {Optimizing Return Distributions with Distributional Dynamic Programming},
  author = {Bernardo Ávila Pires and Mark Rowland and Diana Borsa and Zhaohan Daniel Guo and Khimya Khetarpal and André Barreto and David Abel and Rémi Munos and Will Dabney},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/25-0210.html}
}
