Hien Dang, Pratik Patil, Alessandro Rinaldo
This paper demonstrates that self-distillation can significantly improve ridge regression performance by optimally mixing teacher predictions with the observed labels, providing precise asymptotic analyses and a practical one-shot tuning method for the mixing weight.
Self-distillation is a technique in which a model (the student) is retrained on a combination of the observed labels and the predictions of the same model trained earlier (the teacher). This study examines how self-distillation can improve ridge regression, a widely used statistical method, by tuning how much weight is placed on the teacher's predictions relative to the labels. The authors show that this approach can consistently improve predictive performance, even in challenging regimes, and they derive a practical method for choosing the mixing weight in a single shot, without extensive trial and error, which they validate on real-world data.
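The retraining loop described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the synthetic data, the fixed ridge penalty `lam`, the mixing weight `xi`, and the held-out grid search standing in for the paper's one-shot tuning rule are all assumptions for demonstration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only, not from the paper).
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta_true + rng.standard_normal(n)

lam = 1.0  # ridge penalty, held fixed here for simplicity


def ridge_fit(X, y, lam):
    """Closed-form ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Teacher: ordinary ridge regression fit on the raw labels.
beta_teacher = ridge_fit(X, y, lam)
y_teacher = X @ beta_teacher


def student_fit(xi):
    """Student: refit ridge on a mixture of labels and teacher predictions.

    xi = 0 recovers the teacher; xi > 0 shifts weight toward the
    teacher's predictions. The optimal xi is what the paper analyzes.
    """
    y_mix = (1 - xi) * y + xi * y_teacher
    return ridge_fit(X, y_mix, lam)


# Stand-in for tuning: evaluate candidate mixing weights on held-out data.
X_test = rng.standard_normal((1000, d))
y_test = X_test @ beta_true + rng.standard_normal(1000)

errors = {}
for xi in np.linspace(0.0, 0.9, 10):
    beta_student = student_fit(xi)
    errors[round(xi, 2)] = np.mean((X_test @ beta_student - y_test) ** 2)

best_xi = min(errors, key=errors.get)
```

With `xi = 0` the student coincides with the teacher, so any improvement at the selected `best_xi` comes purely from mixing in the teacher's predictions; the paper's contribution is characterizing the optimal weight asymptotically and estimating it in one shot rather than by a grid search like the one above.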