Shengtian Yang, Yu Li, Shuo He, Yewen Li, Qingpeng Cai, Peng Jiang, Lei Feng
The paper introduces Phase-Aware Mixture of Experts (PA-MoE), which enhances reinforcement learning by letting experts specialize in complex tasks rather than having the network's capacity dominated by simpler ones.
Reinforcement learning is a method used to train AI agents, such as large language models (LLMs), to solve varied tasks. However, a single policy network often exhibits a 'simplicity bias': simpler tasks consume most of the network's capacity, leaving little room for more complex ones. To address this, the authors propose Phase-Aware Mixture of Experts (PA-MoE), which uses multiple specialized networks, or 'experts,' each focusing on different tasks. A distinguishing feature of PA-MoE is its 'phase router,' which assigns tasks to the appropriate expert so that complex tasks receive the capacity they need. Experiments show that PA-MoE improves the performance of reinforcement learning agents.
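To make the routing idea concrete, here is a minimal, hypothetical sketch of a mixture-of-experts policy with a hard top-1 router. The summary does not specify PA-MoE's actual architecture or how phase features are computed, so every class name, scoring rule, and the toy experts below are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class PhaseRouter:
    """Hypothetical router: scores each expert against a task's
    phase features and picks the top-scoring expert (hard top-1)."""
    def __init__(self, num_experts, feat_dim, seed=0):
        rng = random.Random(seed)
        # one linear scoring vector per expert (randomly initialized here;
        # in practice this would be learned alongside the policy)
        self.w = [[rng.uniform(-1, 1) for _ in range(feat_dim)]
                  for _ in range(num_experts)]

    def route(self, features):
        scores = [sum(wi * f for wi, f in zip(w, features)) for w in self.w]
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs

class PAMoE:
    """Mixture of expert policies; each expert is a callable policy head."""
    def __init__(self, experts, router):
        self.experts = experts
        self.router = router

    def act(self, features, state):
        idx, _ = self.router.route(features)
        return self.experts[idx](state)

# Toy experts, each standing in for a policy specialized to a task type.
experts = [lambda s: ("simple_task_action", s),
           lambda s: ("complex_task_action", s)]
moe = PAMoE(experts, PhaseRouter(num_experts=2, feat_dim=3))
action = moe.act(features=[0.2, -0.5, 1.0], state="observation")
```

Because the router dispatches each input to a dedicated expert, a hard task's gradients update only its own expert's parameters, which is the mechanism the paper uses to keep complex tasks from being crowded out.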