Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
Paper Information
- Journal: Journal of Machine Learning Research
- Added to Tracker: Mar 03, 2026
Abstract
Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this paper, we present a general optimization and generalization analysis for gradient descent applied to shallow ReLU networks. We establish convergence rates of the order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates fall inside local balls around either an initialization point or a reference point. We then derive improved Rademacher complexity estimates by exploiting the activation pattern of the ReLU function in these local balls. Applying our general result to NTK-separable data with margin $\gamma$, we obtain an almost optimal risk bound of the order $1/(n\gamma^2)$ for ReLU networks of polylogarithmic width.
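For orientation, the following LaTeX sketch spells out one standard formulation of the setting the abstract refers to; the notation ($m$, $a_k$, $w_k$, $W_0$) is an illustrative assumption based on common shallow-network/NTK conventions, not taken verbatim from the paper:

% Illustrative notation only; the paper's own definitions may differ.
\[
f_W(x) = \frac{1}{\sqrt{m}} \sum_{k=1}^{m} a_k \, \sigma(\langle w_k, x \rangle),
\qquad \sigma(z) = \max\{z, 0\},
\]
% One common definition of NTK-separability with margin \gamma:
% there exists a direction v, \|v\|_2 \le 1, aligned with the gradient
% features at the random initialization W_0 such that
\[
y_i \, \langle v, \nabla_W f_{W_0}(x_i) \rangle \ge \gamma
\quad \text{for all } i \in \{1, \dots, n\}.
\]

In this notation, the abstract's claim is that gradient descent run for $T$ iterations converges at rate $1/T$ and attains a risk of order $1/(n\gamma^2)$ while requiring a width $m$ that is only polylogarithmic in $n$.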
Author Details
- Ding-Xuan Zhou
- Yunwen Lei
- Puyu Wang
- Yiming Ying

Research Topics & Keywords
- Computational Statistics

Citation Information
APA Format
Ding-Xuan Zhou, Yunwen Lei, Puyu Wang, & Yiming Ying. Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width. Journal of Machine Learning Research.
BibTeX Format
@article{paper980,
  title   = {Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width},
  author  = {Ding-Xuan Zhou and Yunwen Lei and Puyu Wang and Yiming Ying},
  journal = {Journal of Machine Learning Research},
  url     = {https://www.jmlr.org/papers/v27/24-2030.html}
}