Zixiang Di, Jinyi Han, Shuo Zhang, Ying Liao, Zhi Li, Xiaofeng Ji, Yongqi Wang, Zheming Yang, Ming Gao, Bingdong Li, Jie Wang
Plausible Negative Samples (PNS) improve the reasoning capabilities of Large Language Models by generating high-quality incorrect responses for training purposes.
Large Language Models (LLMs) can improve their reasoning by learning from incorrect examples, but not all wrong answers are equally instructive. The proposed method, Plausible Negative Samples (PNS), generates sophisticated incorrect responses that read like correct ones in style and structure, differing only in the final answer. Training on these high-quality negatives, rather than trivially flawed ones, helps LLMs learn more effectively. Experiments on several mathematical reasoning benchmarks show that this method significantly improves model performance compared to other negative-sampling techniques.
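To make the core idea concrete, here is a minimal sketch of selecting plausible negatives: keep sampled responses whose final answer is wrong but whose reasoning closely resembles a correct chain. The function names, the Jaccard-overlap plausibility score, and the threshold are illustrative assumptions, not the paper's actual procedure.

```python
# Hypothetical sketch: select "plausible negatives" -- responses with a wrong
# final answer but reasoning that looks like a correct chain. The overlap
# heuristic and threshold are assumptions for illustration only.

def plausibility(candidate: str, reference: str) -> float:
    """Jaccard token overlap between a candidate chain and a correct one."""
    a, b = set(candidate.split()), set(reference.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def select_plausible_negatives(samples, reference, gold_answer, threshold=0.5):
    """Keep responses that are wrong (answer != gold) yet highly plausible."""
    negatives = []
    for text, answer in samples:
        if answer != gold_answer and plausibility(text, reference) >= threshold:
            negatives.append(text)
    return negatives

reference = "add 3 and 4 to get 7 then multiply by 2 to get 14"
samples = [
    ("add 3 and 4 to get 7 then multiply by 2 to get 15", "15"),  # plausible, wrong
    ("banana", "15"),                                             # wrong, implausible
    ("add 3 and 4 to get 7 then multiply by 2 to get 14", "14"),  # correct answer
]
negatives = select_plausible_negatives(samples, reference, gold_answer="14")
print(negatives)  # only the plausible-but-wrong chain survives
```

In practice such negatives could then serve as rejected responses in preference-style training, with the matching correct chains as the chosen responses.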