JMLR

Optimal subsampling for high-dimensional partially linear models via machine learning methods

Authors

Lei Wang, Heng Lian, Yujing Shao, Haiying Wang

Research Topics

Machine Learning, High-Dimensional Statistics


Paper Information

Journal: Journal of Machine Learning Research
Added to Tracker: Dec 30, 2025

Abstract

In this paper, we explore optimal subsampling strategies for estimating the parametric regression coefficients in partially linear models with unknown nuisance functions involving high-dimensional and potentially endogenous covariates. To address model misspecifications and the curse of dimensionality, we leverage flexible machine learning (ML) techniques to estimate the unknown nuisance functions. By constructing an unbiased subsampling Neyman-orthogonal score function, we eliminate regularization bias. A two-step algorithm is then used to obtain appropriate ML estimators of the nuisance functions, mitigating the risk of over-fitting. Using martingale techniques, we establish the unconditional consistency and asymptotic normality of the subsample estimators. Furthermore, we derive optimal subsampling probabilities, including A-optimal and L-optimal probabilities as special cases. The proposed optimal subsampling approach is extended to partially linear instrumental variable models to account for potential endogeneity through instrumental variables. Simulation studies and an empirical analysis of the Physicochemical Properties of Protein Tertiary Structure dataset demonstrate the superior performance of our subsample estimators.
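The workflow the abstract describes (a pilot fit of the nuisance functions, an orthogonal score, then non-uniform subsampling with inverse-probability weights) can be illustrated with a minimal sketch. This is not the paper's algorithm or its probability formulas. It makes several assumptions: a toy model with a scalar covariate, bin means standing in for an ML learner, and sampling probabilities proportional to the absolute pilot score, mixed with uniform probabilities. All function names, constants, and sample sizes are hypothetical.

```python
import math
import random

THETA = 1.5  # true coefficient in the toy model (illustrative choice)

def simulate(n, rng):
    """Toy partially linear model: y = THETA*d + sin(x) + noise, d = cos(x) + noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        d = math.cos(x) + rng.gauss(0.0, 1.0)
        y = THETA * d + math.sin(x) + rng.gauss(0.0, 1.0)
        rows.append((y, d, x))
    return rows

def bin_index(x, n_bins=20, lo=-2.0, hi=2.0):
    """Map x to one of n_bins equal-width cells on [lo, hi]."""
    return min(max(int((x - lo) / (hi - lo) * n_bins), 0), n_bins - 1)

def fit_nuisance(rows, n_bins=20):
    """Bin means of y and d given x: a crude stand-in for an ML learner (assumption)."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for y, d, x in rows:
        cell = acc[bin_index(x, n_bins)]
        cell[0] += y
        cell[1] += d
        cell[2] += 1
    means = [(sy / c, sd / c) if c else (0.0, 0.0) for sy, sd, c in acc]
    return lambda x: means[bin_index(x, n_bins)]

def residuals(rows, predict):
    """Partial out x: u = y - E[y|x], v = d - E[d|x], using the fitted bin means."""
    out = []
    for y, d, x in rows:
        l_hat, m_hat = predict(x)
        out.append((y - l_hat, d - m_hat))
    return out

def weighted_theta(res, weights):
    """Solve the weighted orthogonal score sum w*v*(u - v*theta) = 0 for theta."""
    num = sum(w * v * u for (u, v), w in zip(res, weights))
    den = sum(w * v * v for (u, v), w in zip(res, weights))
    return num / den

def run_demo(n=20000, r0=2000, r=2000, seed=0):
    rng = random.Random(seed)
    full = simulate(n, rng)
    # Step 1: uniform pilot subsample gives nuisance fits and a pilot estimate.
    pilot = rng.sample(full, r0)
    predict = fit_nuisance(pilot)
    theta0 = weighted_theta(residuals(pilot, predict), [1.0] * r0)
    # Step 2: probabilities proportional to the absolute pilot score,
    # mixed with uniform so that no inverse-probability weight explodes.
    res = residuals(full, predict)
    score = [abs(v * (u - v * theta0)) for u, v in res]
    total = sum(score)
    pi = [0.8 * s / total + 0.2 / n for s in score]
    idx = rng.choices(range(n), weights=pi, k=r)  # sampling with replacement
    theta_hat = weighted_theta([res[i] for i in idx], [1.0 / pi[i] for i in idx])
    return theta0, theta_hat

theta0, theta_hat = run_demo()
print(f"pilot estimate: {theta0:.3f}, subsample estimate: {theta_hat:.3f}")
```

Weighting each sampled point by the inverse of its sampling probability keeps the weighted score unbiased for the full-data score, which is why the subsample estimate should land near the true coefficient even though the draws are deliberately non-uniform.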


Citation Information

APA Format


                                
                                    
Lei Wang, Heng Lian, Yujing Shao, & Haiying Wang. Optimal subsampling for high-dimensional partially linear models via machine learning methods. Journal of Machine Learning Research.

BibTeX Format


@article{paper732,
  title = {Optimal subsampling for high-dimensional partially linear models via machine learning methods},
  author = {Lei Wang and Heng Lian and Yujing Shao and Haiying Wang},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/23-1475.html}
}
