JRSSB Apr 09, 2026

Modelling with categorical features via exact fusion and sparsity regularization

Authors
Kayhan Behdin, Rahul Mazumder, Riade Benbaki, Peter Radchenko
Research Topics
High-Dimensional Statistics
Paper Information
  • Journal: Journal of the Royal Statistical Society Series B
  • DOI: 10.1093/jrsssb/qkag062
  • Published: April 09, 2026
  • Added to Tracker: Apr 13, 2026
Abstract

We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a) clustering of the regression coefficients to collapse some of the categorical levels together; and (b) sparsity of the regression coefficients. We present novel mixed integer programming formulations for our estimator, and develop a custom row generation procedure to speed up the exact off-the-shelf solvers. We also propose a fast approximate algorithm for our method that obtains high-quality feasible solutions via block coordinate descent. As the main building block of our algorithm, we develop an exact algorithm for the univariate case based on dynamic programming, which can be of independent interest. We establish new theoretical guarantees for both the prediction and the cluster recovery performance of our estimator. Our numerical experiments on synthetic and real datasets demonstrate that our proposed estimator tends to outperform the state-of-the-art.

Citation Information
APA Format
Behdin, K., Mazumder, R., Benbaki, R., & Radchenko, P. (2026). Modelling with categorical features via exact fusion and sparsity regularization. Journal of the Royal Statistical Society Series B. https://doi.org/10.1093/jrsssb/qkag062
BibTeX Format
@article{paper1109,
  title = {Modelling with categorical features via exact fusion and sparsity regularization},
  author = {Kayhan Behdin and Rahul Mazumder and Riade Benbaki and Peter Radchenko},
  journal = {Journal of the Royal Statistical Society Series B},
  year = {2026},
  doi = {10.1093/jrsssb/qkag062},
  url = {https://doi.org/10.1093/jrsssb/qkag062}
}