Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging

ArXiv Source

Alexandru Meterez, Pranav Ajit Nair, Depen Morwani, Cengiz Pehlevan, Sham Kakade

cs.LG | cs.AI | math.OC | stat.ML
Feb 3, 2026

One-line Summary

This paper introduces anytime pretraining: horizon-free learning-rate schedules that, combined with weight averaging, match the performance of tuned fixed-horizon schedules for language model pretraining without requiring the training duration to be set in advance.

Plain-language Overview

Training large language models typically requires careful planning: learning rates are tuned and decayed according to a training duration fixed in advance. This research explores an approach that does not rely on knowing how long training will last. The key ingredient is weight averaging, a technique in which the model's parameters are averaged over the course of training. The findings show that these flexible, horizon-free schedules perform as well as traditional decay schedules, offering a simpler yet equally effective way to train models without committing to a fixed timeline.
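To make the idea concrete, below is a minimal sketch of uniform weight averaging alongside a constant (horizon-free) learning rate, assuming a generic PyTorch model and data loader. The function and variable names are illustrative and are not taken from the paper.

```python
import copy
import torch

def train_with_weight_average(model, data_loader, lr=3e-4, steps=10_000):
    """Train with a constant learning rate while maintaining a running
    average of the weights, which serves as an 'anytime' checkpoint."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    avg_model = copy.deepcopy(model)  # holds the running average of weights
    n_averaged = 0

    for step, (inputs, targets) in zip(range(steps), data_loader):
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Incremental uniform average: avg <- avg + (w - avg) / (n + 1)
        n_averaged += 1
        with torch.no_grad():
            for p_avg, p in zip(avg_model.parameters(), model.parameters()):
                p_avg += (p - p_avg) / n_averaged

    return avg_model  # evaluate this averaged model at any stopping point
```

Because the averaged weights can be read off at any step, training does not need a decay phase tied to a predetermined stopping time.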

Technical Details