
Revisiting Gradient Normalization and Clipping for Nonconvex SGD under Heavy-Tailed Noise: Necessity, Sufficiency, and Acceleration

Authors
Kun Yuan, Tao Sun, Xinwang Liu
Research Topics
Machine Learning
Paper Information
  • Journal: Journal of Machine Learning Research
Abstract

Gradient clipping has long been considered essential for ensuring the convergence of Stochastic Gradient Descent (SGD) in the presence of heavy-tailed gradient noise. In this paper, we revisit this belief and explore whether gradient normalization can serve as an effective alternative or complement. We prove that, under individual smoothness assumptions, gradient normalization alone is sufficient to guarantee convergence of nonconvex SGD. Moreover, when combined with clipping, it yields substantially better convergence rates under more challenging noise distributions. We provide a unifying theory that covers normalization-only, clipping-only, and combined approaches. We then turn to existing variance-reduced algorithms and establish that, in this setting, normalization alone suffices for convergence. Finally, we present an accelerated variant that, under a second-order smoothness assumption, achieves improved convergence rates. Our results provide theoretical insights and practical guidance for using normalization and clipping in nonconvex optimization with heavy-tailed noise.
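
For concreteness, the sketch below illustrates the three update rules the abstract compares: normalization-only, clipping-only, and one common combined form. It is a minimal NumPy sketch under our own assumptions; the function names, the eps safeguard, and the particular combined rule g / max(||g||, tau) are illustrative and need not match the paper's exact algorithms or constants.

import numpy as np

def normalized_step(x, g, lr, eps=1e-12):
    # Normalization-only: move a fixed distance lr along the direction of g,
    # regardless of how large the (possibly heavy-tailed) gradient sample is.
    return x - lr * g / (np.linalg.norm(g) + eps)

def clipped_step(x, g, lr, tau):
    # Clipping-only: plain SGD while ||g|| <= tau; otherwise rescale g so
    # its norm equals the threshold tau.
    scale = min(1.0, tau / max(np.linalg.norm(g), 1e-12))
    return x - lr * scale * g

def combined_step(x, g, lr, tau):
    # One common normalization+clipping hybrid (an illustrative assumption,
    # not necessarily the paper's rule): acts like SGD scaled by 1/tau for
    # small gradients and like normalized SGD for large ones.
    return x - lr * g / max(np.linalg.norm(g), tau)

# Toy usage on f(x) = 0.5 * ||x||^2 with heavy-tailed (infinite-variance)
# Student-t gradient noise, the regime the abstract is concerned with.
rng = np.random.default_rng(0)
x = np.ones(10)
for t in range(1000):
    g = x + rng.standard_t(df=1.5, size=x.shape)
    x = combined_step(x, g, lr=0.1 / np.sqrt(t + 1), tau=1.0)

Note that normalizing after clipping would simply reproduce the normalization-only direction (clipping never changes the direction of g), which is why the hybrid above folds the threshold into a single denominator instead.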

Citation Information
APA Format
Yuan, K., Sun, T., & Liu, X. (2025). Revisiting Gradient Normalization and Clipping for Nonconvex SGD under Heavy-Tailed Noise: Necessity, Sufficiency, and Acceleration. Journal of Machine Learning Research, 26.
BibTeX Format
@article{paper693,
  title   = {Revisiting Gradient Normalization and Clipping for Nonconvex SGD under Heavy-Tailed Noise: Necessity, Sufficiency, and Acceleration},
  author  = {Kun Yuan and Tao Sun and Xinwang Liu},
  journal = {Journal of Machine Learning Research},
  volume  = {26},
  year    = {2025},
  url     = {https://www.jmlr.org/papers/v26/24-1991.html}
}