Sumedh Rasal
Predictive Batch Scheduling (PBS) accelerates language model training by prioritizing high-loss samples using a lightweight predictor based on token-level features.
This paper introduces Predictive Batch Scheduling (PBS), a method for accelerating language model training by prioritizing the most challenging data samples during batch construction. Sample difficulty is estimated by a lightweight predictor that relies on simple token-level features, such as token frequency and sequence length, to approximate which samples will incur high loss. By training preferentially on these predicted high-loss samples, PBS reduces overall training time, offering a practical route to more efficient language model development.
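The scheduling idea described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the paper's implementation: the feature set (inverse mean token frequency and sequence length), the linear weights, and all function names are assumptions chosen to mirror the abstract's description of a lightweight, feature-based difficulty predictor.

```python
def difficulty_features(tokens, freq):
    # Two token-level features: inverse mean token frequency
    # (rarer tokens suggest harder samples) and sequence length.
    mean_freq = sum(freq.get(t, 1) for t in tokens) / len(tokens)
    return (1.0 / mean_freq, float(len(tokens)))

def predict_loss(tokens, freq, weights=(1.0, 0.01)):
    # Cheap linear proxy for per-sample loss; weights are illustrative
    # placeholders, not values from the paper.
    rarity, length = difficulty_features(tokens, freq)
    return weights[0] * rarity + weights[1] * length

def schedule_batch(samples, freq, batch_size):
    # Rank candidate samples by predicted loss and keep the hardest ones,
    # so each training batch focuses on challenging data.
    ranked = sorted(samples, key=lambda s: predict_loss(s, freq), reverse=True)
    return ranked[:batch_size]
```

In this sketch, scoring a sample costs only a few arithmetic operations over its tokens, which is what makes predictor-driven scheduling cheap relative to running the model itself.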