Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width

Authors
Ding-Xuan Zhou Yunwen Lei Puyu Wang Yiming Ying
Research Topics
Computational Statistics
Paper Information
  • Journal: Journal of Machine Learning Research
  • Added to Tracker: Mar 03, 2026
Abstract

Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this paper, we present a general optimization and generalization analysis of gradient descent applied to shallow ReLU networks. We establish convergence rates of order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates remain inside local balls around either the initialization point or a reference point. We then derive improved Rademacher complexity estimates by exploiting the activation patterns of the ReLU function within these local balls. Applying our general results to NTK-separable data with a margin $\gamma$, we obtain an almost optimal risk bound of order $1/(n\gamma^2)$ for ReLU networks of polylogarithmic width.
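To make the setting concrete, here is a minimal sketch (an illustration, not the paper's code) of full-batch gradient descent on a one-hidden-layer ReLU network of the kind the abstract describes: random fixed outer weights, trainable hidden weights, and logistic loss on labeled data, a common convention in NTK-style analyses. All names (m, eta, T) and the synthetic data are assumptions for illustration.

import numpy as np

# Minimal sketch (not the paper's code): full-batch gradient descent on a
# one-hidden-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x),
# with the outer weights a_r fixed and only the hidden weights W trained.
rng = np.random.default_rng(0)
n, d, m = 100, 10, 64              # sample size, input dim, hidden width (illustrative)
eta, T = 0.1, 200                  # step size and number of GD iterations (illustrative)

X = rng.standard_normal((n, d))    # synthetic inputs (placeholder data)
y = np.sign(rng.standard_normal(n))            # +/-1 labels (placeholder data)

W = rng.standard_normal((m, d))    # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)            # fixed random-sign outer weights
W0 = W.copy()                      # keep the initialization for reference

def forward(W, X):
    pre = X @ W.T                              # (n, m) pre-activations
    return (np.maximum(pre, 0.0) @ a) / np.sqrt(m), pre

for t in range(T):
    f, pre = forward(W, X)
    # logistic loss l(u) = log(1 + exp(-u)) on margins u_i = y_i f(x_i)
    margins = y * f
    dl = -y / (1.0 + np.exp(margins))          # dl/df per example
    act = (pre > 0).astype(float)              # ReLU activation pattern
    # grad wrt w_r: (1/n) sum_i dl_i * (a_r/sqrt(m)) * 1{w_r^T x_i > 0} * x_i
    grad = ((dl[:, None] * act).T @ X) * (a[:, None] / np.sqrt(m)) / n
    W -= eta * grad

print("distance from init:", np.linalg.norm(W - W0))

Tracking np.linalg.norm(W - W0) mirrors the role of the local balls in the abstract: the analysis controls how far the gradient descent iterates move from the initialization (or a reference point), which is what enables the improved Rademacher complexity estimates.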

Citation Information
APA Format
Ding-Xuan Zhou, Yunwen Lei, Puyu Wang, & Yiming Ying. Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width. Journal of Machine Learning Research.
BibTeX Format
@article{paper980,
  title   = {Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width},
  author  = {Ding-Xuan Zhou and Yunwen Lei and Puyu Wang and Yiming Ying},
  journal = {Journal of Machine Learning Research},
  url     = {https://www.jmlr.org/papers/v27/24-2030.html}
}