Purbesh Mitra, Sennur Ulukus
Semantic Soft Bootstrapping improves long-context reasoning in language models via a self-distillation technique that requires no reinforcement learning, yielding significant accuracy improvements on math benchmarks.
This research introduces a new method called Semantic Soft Bootstrapping (SSB) to enhance the reasoning abilities of large language models, such as those used to solve math problems. Traditional methods often rely on reinforcement learning, which can be resource-intensive and inefficient. Instead, SSB lets the model learn from its own outputs: it generates multiple candidate solutions to a problem and uses them to construct a more accurate final answer, which is then distilled back into the model. This approach improves the model's performance without complex reinforcement learning techniques, achieving notable accuracy gains on standard math benchmarks.
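The summary only sketches the procedure, so the snippet below is a minimal, hypothetical illustration of the general idea of bootstrapping training targets from multiple sampled solutions; the function names (`sample_fn`, `extract_final_answer`, `bootstrap_targets`), the majority-vote consensus step, and the "Answer:" parsing convention are assumptions for illustration, not the paper's actual SSB algorithm.

```python
from collections import Counter
from typing import Callable, List, Tuple


def extract_final_answer(solution: str) -> str:
    """Pull the final answer out of a generated solution.

    Assumes the model is prompted to end with a line like 'Answer: <value>';
    real parsing would depend on the prompt format used.
    """
    for line in reversed(solution.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return solution.strip().splitlines()[-1].strip()


def bootstrap_targets(
    problems: List[str],
    sample_fn: Callable[[str], str],  # e.g. a wrapper around model generation with temperature > 0
    n_samples: int = 8,
) -> List[Tuple[str, str]]:
    """For each problem, sample several solutions from the current model,
    take the majority-vote final answer as a consensus label, and keep one
    solution that reaches it as a distillation target."""
    targets = []
    for problem in problems:
        solutions = [sample_fn(problem) for _ in range(n_samples)]
        answers = [extract_final_answer(s) for s in solutions]
        consensus, _ = Counter(answers).most_common(1)[0]
        # Keep the first sampled solution whose answer matches the consensus.
        chosen = next(s for s, a in zip(solutions, answers) if a == consensus)
        targets.append((problem, chosen))
    return targets
```

In such a sketch, the resulting (problem, solution) pairs would be used as supervised fine-tuning data for the same model, closing a self-distillation loop without any reinforcement learning signal.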