Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
Paper Information
- Journal: Journal of Machine Learning Research
- Added to Tracker: Mar 03, 2026
Abstract
Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this paper, we present a general optimization and generalization analysis for gradient descent applied to shallow ReLU networks. We establish convergence rates of the order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates fall inside local balls around either an initialization point or a reference point. We then derive improved Rademacher complexity estimates by exploiting the activation pattern of the ReLU function in these local balls. Applying our general result to NTK-separable data with margin $\gamma$, we obtain an almost optimal risk bound of the order $1/(n\gamma^2)$ for ReLU networks of polylogarithmic width.
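For orientation, the following LaTeX sketch spells out one standard formulation of the setting the abstract refers to; the notation ($m$, $a_k$, $w_k$, $W_0$) is an illustrative assumption based on common shallow-network/NTK conventions, not taken verbatim from the paper:

% Illustrative notation only; the paper's own definitions may differ.
\[
f_W(x) = \frac{1}{\sqrt{m}} \sum_{k=1}^{m} a_k \, \sigma(\langle w_k, x \rangle),
\qquad \sigma(z) = \max\{z, 0\},
\]
% One common definition of NTK-separability with margin \gamma:
% there exists a direction v, \|v\|_2 \le 1, aligned with the gradient
% features at the random initialization W_0 such that
\[
y_i \, \langle v, \nabla_W f_{W_0}(x_i) \rangle \ge \gamma
\quad \text{for all } i \in \{1, \dots, n\}.
\]

In this notation, the abstract's claim is that gradient descent run for $T$ iterations converges at rate $1/T$ and attains a risk of order $1/(n\gamma^2)$ while requiring a width $m$ that is only polylogarithmic in $n$.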
Author Details
- Ding-Xuan Zhou
- Yunwen Lei
- Puyu Wang
- Yiming Ying

Research Topics & Keywords
- Computational Statistics

Citation Information
APA Format
Ding-Xuan Zhou, Yunwen Lei, Puyu Wang, & Yiming Ying. Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width. Journal of Machine Learning Research.
BibTeX Format
@article{paper980,
  title   = {Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width},
  author  = {Ding-Xuan Zhou and Yunwen Lei and Puyu Wang and Yiming Ying},
  journal = {Journal of Machine Learning Research},
  url     = {https://www.jmlr.org/papers/v27/24-2030.html}
}