Xuan-Phi Nguyen, Shrey Pandit, Revanth Gangi Reddy, Austin Xu, Silvio Savarese, Caiming Xiong, Shafiq Joty
The paper introduces SFR-DeepResearch, a reinforcement learning (RL) approach for strengthening the autonomous reasoning and tool use of single-agent language models on deep research tasks. The approach achieves notable results on a challenging benchmark.
This research focuses on improving how large language models (LLMs) reason and use tools autonomously. This capability matters for applications such as deep research, which combine extensive searching with multi-step reasoning. Unlike multi-agent systems, in which each agent plays a predefined role, this study explores a single-agent model that decides its next action dynamically based on the current context. The researchers developed an RL method that trains these models on synthetic data to improve their reasoning capabilities. Their best model, SFR-DR-20B, showed promising results on a challenging benchmark.