JMLR

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

Authors
Lukas Zierahn, Dirk van der Hoeven, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
Paper Information
  • Journal:
    Journal of Machine Learning Research
Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.
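The abstract centers on Follow The Regularized Leader (FTRL) under delayed bandit feedback. As a rough illustration only (not the paper's algorithm), the sketch below runs FTRL with a negative-entropy regularizer on the simplex, i.e. exponential weights over importance-weighted loss estimates, where the estimate built in round t is buffered and only folded into the cumulative losses after a delay. The function name, parameters, and delay model are all hypothetical choices for this toy example.

```python
import math
import random

def ftrl_delayed_bandit(losses, delays, eta=0.1, seed=0):
    """Toy sketch (hypothetical, not the paper's algorithm): FTRL with an
    entropic regularizer under delayed bandit feedback. The loss estimate
    formed in round t arrives only at round t + delays[t].
    losses: T x K list of lists with entries in [0, 1]."""
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    cum_est = [0.0] * K    # cumulative loss estimates received so far
    buffer = []            # (arrival_round, estimate_vector) pairs in flight
    total = 0.0
    for t in range(T):
        # fold in delayed feedback that has arrived by round t
        due = [e for (a, e) in buffer if a <= t]
        buffer = [(a, e) for (a, e) in buffer if a > t]
        for e in due:
            cum_est = [c + x for c, x in zip(cum_est, e)]
        # FTRL play with negative-entropy regularizer:
        # p_t is proportional to exp(-eta * cumulative estimated losses)
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]
        arm = rng.choices(range(K), weights=p)[0]
        total += losses[t][arm]
        # standard importance-weighted estimator, queued until its delay elapses
        est = [0.0] * K
        est[arm] = losses[t][arm] / p[arm]
        buffer.append((t + delays[t], est))
    return total
```

With delays set to zero this reduces to ordinary exponential-weights bandit play; the buffer is what models the feedback arriving late.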

Citation Information
APA Format
Zierahn, L., van der Hoeven, D., Lancewicki, T., Rosenberg, A., & Cesa-Bianchi, N. (2025). A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs. Journal of Machine Learning Research, 26(104), 1–60.
BibTeX Format
@article{JMLR:v26:24-0496,
  author  = {Lukas Zierahn and Dirk van der Hoeven and Tal Lancewicki and Aviv Rosenberg and Nicol{\`o} Cesa-Bianchi},
  title   = {A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs},
  journal = {Journal of Machine Learning Research},
  year    = {2025},
  volume  = {26},
  number  = {104},
  pages   = {1--60},
  url     = {http://jmlr.org/papers/v26/24-0496.html}
}