Bowen Ping, Zijun Chen, Yiyao Yu, Tingfeng Hui, Junchi Yan, Baobao Chang
LongR is a reinforcement-learning framework that improves long-context reasoning through a dynamic 'Think-and-Read' mechanism and dense utility rewards, achieving significant gains on benchmarks such as LongBench v2.
The paper introduces LongR, a new approach for improving how artificial intelligence systems understand and reason over long inputs, such as extended conversations or large, complex documents. Conventional methods typically guide learning with simple, outcome-level rewards, which are poorly suited to complex multi-step reasoning. LongR addresses this by interleaving reasoning with document reading, and by introducing a new kind of reward that better measures how useful retrieved information is. This approach yields significant performance improvements across several benchmarks.
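To make the contrast concrete, the sketch below compares a sparse, outcome-only reward with a dense per-step utility reward. All function names and the word-overlap utility measure are illustrative assumptions for exposition; they are not LongR's actual reward definition.

```python
# Illustrative sketch only: the names and the overlap-based utility
# formula are assumptions, not LongR's actual implementation.

def sparse_outcome_reward(final_answer: str, gold_answer: str) -> float:
    """Conventional RL reward: a single signal at the end of an episode."""
    return 1.0 if final_answer == gold_answer else 0.0

def dense_utility_reward(evidence_chunks: list, gold_evidence: str) -> list:
    """Hypothetical dense reward: score each reading step by how useful
    the retrieved chunk is (here, word overlap with known gold evidence)."""
    gold_words = set(gold_evidence.split())
    rewards = []
    for chunk in evidence_chunks:
        overlap = len(set(chunk.split()) & gold_words)
        rewards.append(overlap / max(len(gold_words), 1))
    return rewards

# A 'Think-and-Read' rollout alternates reasoning and reading steps;
# a dense reward gives credit at every read, not only at the end.
chunks = ["the treaty was signed in 1648", "unrelated text"]
gold = "signed in 1648"
print(sparse_outcome_reward("1648", "1648"))   # 1.0
print(dense_utility_reward(chunks, gold))      # [1.0, 0.0]
```

The point of the contrast: with only the sparse signal, every intermediate reading step gets zero feedback, whereas the dense utility reward tells the policy which reads actually contributed evidence.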