Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye
RewardAnything introduces a novel reward model that follows natural language principles, enabling it to adapt to diverse tasks without retraining.
The paper addresses a key limitation of the reward models currently used to optimize large language models: because they are trained on fixed datasets that encode a narrow set of preferences, they are rigid and must be retrained for each new task. The authors propose RewardAnything, a reward model that reads and follows evaluation principles stated in natural language, so it can adapt to new tasks and preference criteria without retraining. The approach achieves state-of-the-art results and integrates well with existing methods for aligning language models with human values.
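To make the idea concrete, below is a minimal sketch (not the paper's actual interface) of how a principle-conditioned reward query might be structured: the principle is supplied at evaluation time as plain text alongside the prompt and candidate responses, so switching tasks means changing the principle string rather than retraining the model. The class names, prompt template, and stub scorer are illustrative assumptions; the real format is defined by the paper and its released code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PrincipleRewardRequest:
    # Natural-language principle the reward model should follow,
    # e.g. "Prefer concise answers that cite sources."
    principle: str
    prompt: str           # the user query being evaluated
    responses: List[str]  # candidate responses to score or rank


def build_judge_prompt(req: PrincipleRewardRequest) -> str:
    """Assemble one evaluation prompt: principle + query + candidates.

    The template here is an assumption for illustration only.
    """
    candidates = "\n\n".join(
        f"[Response {i + 1}]\n{r}" for i, r in enumerate(req.responses)
    )
    return (
        f"Evaluation principle:\n{req.principle}\n\n"
        f"User prompt:\n{req.prompt}\n\n"
        f"Candidate responses:\n{candidates}\n\n"
        "Score each response from 1 to 10 according to the principle."
    )


def score_responses(req: PrincipleRewardRequest) -> Dict[int, float]:
    """Placeholder scorer: in practice this would send the prompt above to
    a principle-following reward model and parse its scores. Dummy scores
    are returned here so the sketch runs without a model."""
    _ = build_judge_prompt(req)  # what would be sent to the model
    return {i: 0.0 for i in range(len(req.responses))}


if __name__ == "__main__":
    req = PrincipleRewardRequest(
        principle="Prefer responses that are factually cautious and cite evidence.",
        prompt="What are the health effects of intermittent fasting?",
        responses=["Answer A ...", "Answer B ..."],
    )
    print(build_judge_prompt(req))
    print(score_responses(req))
```

Because the principle lives in the request rather than in the training data, the same reward model can be reused across tasks by editing a sentence, which is the flexibility the summary describes.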