Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
Aleksandar Armacki, Dragana Bajović et al.
TLDR: This paper establishes tight long-term tail decay rates for SGD and clipped SGD in non-convex optimization, showing that the tails decay significantly faster than previously known bounds suggest.