Arithmetic-Mean $μ$P for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets

Source: arXiv

Haosong Zhang, Shenxi Wu, Yichi Zhang, Wei Lin

stat.ML | Oct 5, 2025

One-line Summary

Arithmetic-Mean $μ$P provides a unified learning-rate scaling method for CNNs and ResNets, enabling consistent performance across varying network depths.

Plain-language Overview

Choosing the right learning rate is crucial for training deep neural networks, and it gets harder as networks grow deeper and more complex. Traditional scaling methods, which control updates layer by layer, struggle with modern architectures such as convolutional and residual networks because these architectures are imbalanced across layers. This paper introduces Arithmetic-Mean $μ$P, which keeps the average update across the entire network consistent instead of constraining each layer individually. The result is a single learning-rate scale that remains reliable as network depth changes, which simplifies training and improves performance without extensive tuning.
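
To make the "average across the network" idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. It assumes, purely for illustration, that one step changes a layer with sensitivity `s` by a second moment of `lr**2 * s`. The function names and the sensitivity values are hypothetical.

```python
import math

def arithmetic_mean_lr(sensitivities, target=1.0):
    """Pick ONE global learning rate so that the average per-layer
    update second moment equals `target`.

    Toy assumption (not from the paper): a layer with sensitivity s
    has update second moment lr**2 * s after one step.
    """
    mean_s = sum(sensitivities) / len(sensitivities)
    return math.sqrt(target / mean_s)

def per_layer_lrs(sensitivities, target=1.0):
    """Layer-by-layer alternative: every layer must hit `target` itself,
    so each layer ends up needing its own learning rate."""
    return [math.sqrt(target / s) for s in sensitivities]

# Hypothetical, imbalanced sensitivities for a four-layer network.
sens = [0.5, 1.0, 2.0, 4.5]

lr = arithmetic_mean_lr(sens)
updates = [lr**2 * s for s in sens]          # differs from layer to layer
mean_update = sum(updates) / len(updates)    # matches the target on average

print(f"global lr: {lr:.4f}")
print(f"mean update second moment: {mean_update:.4f}")
print("per-layer rates:", [round(x, 4) for x in per_layer_lrs(sens)])
```

In this toy setting the layer-by-layer rule demands four different rates, while the averaged rule yields one rate that lets individual layers deviate as long as the network-wide mean stays on target.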

Technical Details