Hongye Cao, Zhixin Bai, Ziyue Peng, Boyan Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao
This paper presents a novel reinforcement learning framework that uses semantic and token entropy to improve reasoning in large language models, outperforming existing methods across multiple benchmarks.
The research introduces a new reinforcement learning approach to enhance the reasoning abilities of large language models (LLMs), AI systems that can understand and generate human-like text. Traditional methods often suffer from 'entropy collapse,' a problem in which the model stops exploring diverse reasoning paths. This study addresses the issue by incorporating entropy at both the semantic level (the meaning of a reasoning step) and the token level (the individual subword units of text) to encourage broader exploration during learning. The method orders training tasks from simple to complex and applies targeted entropy constraints to critical parts of the text, yielding improved reasoning performance in the resulting LLMs.
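To make the token-level half of this idea concrete, the sketch below computes per-token entropy from a model's output logits and adds it as a bonus to a policy loss, which is the standard way an entropy term counteracts collapse. The paper's exact objective is not reproduced here; the bonus weight `beta` and the scalar `policy_loss` are illustrative assumptions.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of each token position's predictive distribution.

    logits: array of shape (seq_len, vocab_size).
    Returns: array of shape (seq_len,), entropy in nats.
    """
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# Toy example: 3 token positions over a 4-token vocabulary.
logits = np.array([
    [4.0, 0.1, 0.1, 0.1],  # confident prediction -> low entropy
    [1.0, 1.0, 1.0, 1.0],  # uniform prediction  -> maximum entropy (ln 4)
    [3.0, 2.0, 0.5, 0.5],  # somewhere in between
])
ent = token_entropy(logits)

# Entropy bonus: subtracting beta * mean entropy from the loss rewards
# the policy for keeping its token distributions spread out, which is
# the basic mechanism for preventing entropy collapse.
beta = 0.01                # hypothetical bonus weight
policy_loss = 1.5          # placeholder scalar RL loss
regularized_loss = policy_loss - beta * ent.mean()
```

Applying the bonus only at selected positions (e.g. tokens judged critical for the reasoning chain) rather than uniformly is one way to realize the paper's idea of constraining critical parts of the text, while leaving routine tokens unaffected.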