JMLR

Contextual Bandits with Stage-wise Constraints

Authors
Aldo Pacchiano Mohammad Ghavamzadeh Peter Bartlett
Research Topics
Machine Learning
Paper Information
  • Journal: Journal of Machine Learning Research
  • Added to Tracker: Sep 08, 2025
Abstract

We study contextual bandits in the presence of a stage-wise constraint in two settings: when the constraint must be satisfied with high probability and when it must be satisfied in expectation. We start with the linear case, where both the reward function and the stage-wise constraint (cost function) are linear. In each of the high-probability and in-expectation settings, we propose an upper-confidence bound algorithm for the problem and prove a $T$-round regret bound for it. We also prove a lower bound for this constrained problem, show how our algorithms and analyses can be extended to multiple constraints, and provide simulations to validate our theoretical results. In the high-probability setting, we describe the minimum requirements on the action set for our algorithm to be tractable. In the in-expectation setting, we specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting, together with a regret analysis. Finally, we extend our results to the case where the reward and cost functions are both non-linear. We propose an algorithm for this case and prove a regret bound for it that characterizes the function class complexity by the eluder dimension.
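
To make the idea of an upper-confidence bound rule under a stage-wise linear constraint more concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It is a simplified illustration under assumptions that are not stated in the abstract: the action set is finite, the zero vector is a known safe action with zero cost, any action may be scaled by a factor in [0, 1], the cost threshold tau is positive, and a single confidence radius beta is used for both reward and cost. The names select_action, simulate, theta, mu, beta, and tau are hypothetical.

```python
import numpy as np


def select_action(actions, theta_hat, mu_hat, V_inv, beta, tau):
    """Pick a scaled action: optimistic in reward, pessimistic in cost.

    Assumptions (illustrative only): the zero vector is a safe action with
    zero cost, alpha * x is playable for any alpha in [0, 1], and tau > 0.
    """
    # Default to the safe (zero) action, whose reward estimate is 0.
    best_value, best_action = 0.0, np.zeros(actions.shape[1])
    for x in actions:
        width = beta * np.sqrt(x @ V_inv @ x)  # confidence width along x
        reward_ucb = x @ theta_hat + width     # optimistic reward estimate
        cost_ucb = x @ mu_hat + width          # pessimistic (upper) cost estimate
        # Largest alpha in [0, 1] such that alpha * cost_ucb <= tau.
        alpha = 1.0 if cost_ucb <= tau else tau / cost_ucb
        if alpha * reward_ucb > best_value:
            best_value, best_action = alpha * reward_ucb, alpha * x
    return best_action


def simulate(T=200, lam=1.0, beta=1.0, tau=0.5, noise=0.05, seed=0):
    """Run the rule on a toy problem and return the true cost of each play."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 0.2])  # hypothetical reward parameter
    mu = np.array([0.8, 0.1])     # hypothetical cost parameter
    actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    d = theta.size
    V = lam * np.eye(d)           # regularized design matrix
    b_reward, b_cost = np.zeros(d), np.zeros(d)
    true_costs = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        # Ridge-regression estimates of the reward and cost parameters.
        theta_hat, mu_hat = V_inv @ b_reward, V_inv @ b_cost
        a = select_action(actions, theta_hat, mu_hat, V_inv, beta, tau)
        reward = a @ theta + noise * rng.standard_normal()
        cost = a @ mu + noise * rng.standard_normal()
        V += np.outer(a, a)
        b_reward += reward * a
        b_cost += cost * a
        true_costs.append(a @ mu)
    return np.array(true_costs)
```

Calling simulate() returns the true expected cost of each played action, which can be compared against tau to see how conservative the rule is on this toy instance. Shrinking an action toward the safe action is one simple way to keep the pessimistic cost estimate below the threshold while still exploring; the choice of beta here is arbitrary and would need to come from a proper confidence-set argument.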

Citation Information
APA Format
Aldo Pacchiano, Mohammad Ghavamzadeh & Peter Bartlett. Contextual Bandits with Stage-wise Constraints. Journal of Machine Learning Research.
BibTeX Format
@article{paper485,
  title = {Contextual Bandits with Stage-wise Constraints},
  author = {Aldo Pacchiano and Mohammad Ghavamzadeh and Peter Bartlett},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/24-0267.html}
}