JMLR

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

Authors
Lukas Zierahn, Dirk van der Hoeven, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
Paper Information
  • Journal:
    Journal of Machine Learning Research
Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.
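The abstract centers on Follow The Regularized Leader (FTRL) under delayed bandit feedback. As a rough illustration only (not the paper's algorithm), the sketch below runs FTRL with a negative-entropy regularizer on the simplex, i.e. exponential weights over importance-weighted loss estimates, where the estimate built in round t is buffered and only folded into the cumulative losses after a delay. The function name, parameters, and delay model are all hypothetical choices for this toy example.

```python
import math
import random

def ftrl_delayed_bandit(losses, delays, eta=0.1, seed=0):
    """Toy sketch (hypothetical, not the paper's algorithm): FTRL with an
    entropic regularizer under delayed bandit feedback. The loss estimate
    formed in round t arrives only at round t + delays[t].
    losses: T x K list of lists with entries in [0, 1]."""
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    cum_est = [0.0] * K    # cumulative loss estimates received so far
    buffer = []            # (arrival_round, estimate_vector) pairs in flight
    total = 0.0
    for t in range(T):
        # fold in delayed feedback that has arrived by round t
        due = [e for (a, e) in buffer if a <= t]
        buffer = [(a, e) for (a, e) in buffer if a > t]
        for e in due:
            cum_est = [c + x for c, x in zip(cum_est, e)]
        # FTRL play with negative-entropy regularizer:
        # p_t is proportional to exp(-eta * cumulative estimated losses)
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]
        arm = rng.choices(range(K), weights=p)[0]
        total += losses[t][arm]
        # standard importance-weighted estimator, queued until its delay elapses
        est = [0.0] * K
        est[arm] = losses[t][arm] / p[arm]
        buffer.append((t + delays[t], est))
    return total
```

With delays set to zero this reduces to ordinary exponential-weights bandit play; the buffer is what models the feedback arriving late.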

Citation Information
APA Format
Zierahn, L., van der Hoeven, D., Lancewicki, T., Rosenberg, A., & Cesa-Bianchi, N. (2025). A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs. Journal of Machine Learning Research, 26(104), 1–60.
BibTeX Format
@article{JMLR:v26:24-0496,
  author  = {Lukas Zierahn and Dirk van der Hoeven and Tal Lancewicki and Aviv Rosenberg and Nicol{\`o} Cesa-Bianchi},
  title   = {A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs},
  journal = {Journal of Machine Learning Research},
  year    = {2025},
  volume  = {26},
  number  = {104},
  pages   = {1--60},
  url     = {http://jmlr.org/papers/v26/24-0496.html}
}