
Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing

Source: arXiv

Xin Sheng, Jiaxin Li, Yujuan Pang, Ran Peng, Yong Ma

cs.LG | cs.AI | Feb 3, 2026

One-line Summary

The paper introduces positive-negative prompt pairing for prompt selection in reinforcement learning with verifiable rewards (RLVR), improving performance on deterministic reasoning tasks by amplifying rare-event signals.

Plain-language Overview

This study explores how to better train large language models with reinforcement learning with verifiable rewards (RLVR), a setup in which the model is rewarded only when its answer to a task with a clear, checkable outcome is correct. The authors propose a new way of selecting prompts: pairing a challenging but solvable prompt with an easier one that the model still occasionally fails. This pairing amplifies the learning signal from rare successes on the hard prompt and rare failures on the easy one, helping the model learn more effectively. The results show that this method outperforms traditional prompt-selection approaches, even when using fewer prompts.
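To make the pairing idea concrete, here is a minimal sketch of how such a selection step might look, assuming each prompt's success rate is estimated from sampled rollouts beforehand. The function name, the `Prompt` class, and the threshold values are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of positive-negative prompt pairing for RLVR prompt selection.
# Assumes each prompt's empirical success rate has been estimated by sampling and
# grading rollouts; thresholds below are invented for illustration.

import random
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    success_rate: float  # fraction of sampled rollouts graded correct


def pair_prompts(prompts, hard_max=0.2, easy_min=0.8, easy_max=0.99):
    """Pair hard-but-solvable prompts (rare successes) with
    easy-but-imperfect prompts (rare failures)."""
    # Hard prompts: rarely solved, but solved at least once (so a success signal exists).
    hard = [p for p in prompts if 0.0 < p.success_rate <= hard_max]
    # Easy prompts: usually solved, but still failing occasionally (so a failure signal exists).
    easy = [p for p in prompts if easy_min <= p.success_rate < easy_max]
    random.shuffle(hard)
    random.shuffle(easy)
    # Each pair contributes both a rare-success and a rare-failure signal to training.
    return list(zip(hard, easy))


# Toy usage with made-up success-rate estimates:
pool = [
    Prompt("prove the identity ...", 0.05),  # hard, occasionally solved
    Prompt("simplify 3x + 2x", 0.95),        # easy, occasionally missed
    Prompt("never-solved prompt", 0.0),      # excluded: no success signal at all
]
for hard_p, easy_p in pair_prompts(pool):
    print(hard_p.text, "<->", easy_p.text)
```

In this sketch, prompts that are never solved or always solved are excluded, since they produce no informative reward variation; the pairing then puts the two kinds of rare events side by side in each training batch.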

Technical Details