Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
Paper Information
- Journal: Journal of Machine Learning Research
- Added to Tracker: Sep 08, 2025
Abstract
In this paper, we introduce a policy-gradient method for model-based reinforcement learning (RL) that exploits a type of stationary distribution commonly obtained from Markov decision processes (MDPs) in stochastic networks, queueing systems, and statistical mechanics. Specifically, when the stationary distribution of the MDP belongs to an exponential family that is parametrized by the policy parameters, we can improve existing policy-gradient methods for average-reward RL. Our key contribution is the identification of a family of gradient estimators, called score-aware gradient estimators (SAGEs), that enable policy-gradient estimation without relying on value-function estimation in the aforementioned setting. We show that SAGE-based policy gradient converges locally, and we derive its regret bound. This includes cases where the state space of the MDP is countable and unstable policies may exist. Under appropriate assumptions, such as starting sufficiently close to a maximizer and the existence of a local Lyapunov function, the policy under SAGE-based stochastic gradient ascent converges to the associated optimal policy with overwhelming probability. Furthermore, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic method on several examples inspired by stochastic networks, queueing systems, and models derived from statistical physics. Our results demonstrate that a SAGE-based method finds close-to-optimal policies faster than an actor-critic method.
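To illustrate the idea described in the abstract, the sketch below shows how a score-aware gradient estimate can be formed when the stationary distribution has a known exponential-family form. It is an illustrative simplification, not the authors' SAGE implementation: it assumes the stationary distribution is pi_theta(x) proportional to exp(theta . phi(x)) with a hypothetical feature map phi and a reward that depends only on the state, in which case the score is phi(x) minus its stationary mean and the average-reward gradient reduces to the covariance between rewards and sufficient statistics, so no value-function estimate is needed.

import numpy as np

def sage_style_gradient(states, phi, reward):
    # Illustrative sketch only (not the paper's exact SAGE estimator).
    # Assumes pi_theta(x) is proportional to exp(theta . phi(x)), so the score is
    # phi(x) - E_pi[phi] and the average-reward gradient equals Cov_pi(reward(X), phi(X)).
    # `states` should be (approximately) stationary samples, e.g. a long
    # trajectory of the Markov chain under the current policy.
    feats = np.asarray([phi(x) for x in states], dtype=float)    # shape (n, d)
    rews = np.asarray([reward(x) for x in states], dtype=float)  # shape (n,)
    centered_feats = feats - feats.mean(axis=0)
    centered_rews = rews - rews.mean()
    # Empirical covariance between rewards and sufficient statistics.
    return centered_feats.T @ centered_rews / len(states)        # shape (d,)

# Hypothetical usage: plain stochastic gradient ascent on the average reward,
#   theta = theta + step_size * sage_style_gradient(trajectory, phi, reward)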
Author Details
Céline Comte
Matthieu Jonckheere
Jaron Sanders
Albert Senen-Cerda
Citation Information
APA Format
Comte, C., Jonckheere, M., Sanders, J., & Senen-Cerda, A. Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability. Journal of Machine Learning Research.
BibTeX Format
@article{paper523,
  title   = {Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability},
  author  = {Céline Comte and Matthieu Jonckheere and Jaron Sanders and Albert Senen-Cerda},
  journal = {Journal of Machine Learning Research},
  url     = {https://www.jmlr.org/papers/v26/24-1009.html}
}