
LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards

Source: arXiv

Bowen Ping, Zijun Chen, Yiyao Yu, Tingfeng Hui, Junchi Yan, Baobao Chang

cs.CL | Feb 5, 2026

One-line Summary

LongR is a reinforcement-learning framework that improves long-context reasoning by combining a dynamic 'Think-and-Read' mechanism with dense utility rewards, achieving significant gains on benchmarks such as LongBench v2.

Plain-language Overview

The paper introduces LongR, a new approach to improving how AI systems understand and reason over long inputs, such as extended conversations or large document collections. Traditional methods guide learning with simple rewards tied only to the final outcome, which are poorly suited to complex reasoning tasks. LongR instead interleaves reasoning with document reading, and uses a new kind of reward that measures how useful each piece of information is to the final answer. This approach yields significant improvements on several benchmarks.
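To make the contrast concrete, here is a minimal sketch of the difference between an outcome-only reward and dense per-step utility rewards. Everything in this snippet (the function names, the step strings, and the stand-in `utility` scorer) is illustrative and not taken from the paper, which defines its own reward formulation.

```python
# Hypothetical sketch: sparse outcome reward vs. dense per-step utility
# rewards. Names and the toy utility function are illustrative only.

def sparse_reward(steps, answer_correct):
    # Traditional setup: intermediate steps get 0; only the final
    # outcome is rewarded.
    return [0.0] * (len(steps) - 1) + [1.0 if answer_correct else 0.0]

def dense_utility_rewards(steps, utility):
    # Dense setup: every reasoning/reading step is scored by how much
    # it contributes toward the answer, via a stand-in utility function.
    return [utility(step) for step in steps]

steps = ["read section 2", "note key claim", "derive answer"]

sparse = sparse_reward(steps, answer_correct=True)
dense = dense_utility_rewards(
    steps, utility=lambda s: 0.5 if s.startswith("read") else 0.9
)
```

The point of the dense variant is that every step receives a learning signal, rather than credit arriving only at the end of a long trajectory.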

Technical Details