
Logarithmic-time Schedules for Scaling Language Models with Momentum

Source: arXiv

Damien Ferbach, Courtney Paquette, Gauthier Gidel, Katie Everett, Elliot Paquette

stat.ML, cs.LG, math.OC | Feb 5, 2026

One-line Summary

ADANA, an optimizer that places logarithmic-time schedules on hyperparameters that AdamW keeps constant, improves large-scale language model training efficiency by up to 40% over AdamW.

Plain-language Overview

When training large language models, the choice of optimizer hyperparameters is crucial for performance. Traditionally, certain parameters of the AdamW optimizer are held constant throughout training, but this research suggests that varying them over time leads to better results. Using a method called logarithmic-time scheduling, the researchers developed a new optimizer named ADANA, which adjusts these settings as training progresses. The result is faster, more efficient training of language models, particularly as they grow larger.
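As a rough illustration of the idea only, the sketch below shows a momentum coefficient that is updated as a function of log(step) rather than held fixed, which is the general shape of a logarithmic-time schedule. The function name log_time_beta, its constants, and the specific functional form are assumptions for illustration, not the schedules that define ADANA in the paper.

    import math

    def log_time_beta(step: int, beta_min: float = 0.9, c: float = 0.5) -> float:
        """Hypothetical logarithmic-time schedule for a momentum coefficient.

        The coefficient drifts toward 1 on a log(step) timescale instead of
        staying fixed as in standard AdamW. This particular form is an
        illustrative choice, not the schedule proposed in the paper.
        """
        return 1.0 - (1.0 - beta_min) / (1.0 + c * math.log1p(step))

    # The coefficient rises slowly toward 1 as training progresses.
    for step in (0, 10, 1_000, 100_000):
        print(step, round(log_time_beta(step), 5))

In practice, such a schedule would be evaluated at each optimizer step and used to set the momentum (and possibly other) hyperparameters of an Adam-style update; the specific schedules and analysis behind ADANA are given in the paper itself.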

Technical Details