Corinna Cortes, Mehryar Mohri, Yutao Zhong
The paper proposes a theoretical framework for training generative models modularly, using domain-specific experts combined through a robust gating mechanism, and shows that this approach can outperform traditional monolithic models.
Training large generative models is expensive and typically requires careful tuning of how the training data is weighted. This study explores whether such models can instead be trained in a modular fashion, using smaller, specialized components that work together rather than a single large model. The authors develop a theoretical framework that combines these smaller expert models through a 'gate' that selects the best one for a given input, aiming to eliminate the need for manual tuning. Their results suggest that this modular approach can not only match but sometimes exceed the performance of traditional monolithic models.
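To make the gating idea concrete, here is a minimal, hypothetical sketch of how separately trained domain experts might be combined by a learned gate. This is not the authors' construction; the names (`expert_score`, `gate`, `gated_model_score`) and the toy linear scorers are illustrative assumptions only.

```python
# Hypothetical sketch: a gate routes each input to domain-specific
# generative "experts" and combines their scores. All components here
# are toy stand-ins, not the paper's actual framework.
import numpy as np

rng = np.random.default_rng(0)

DIM, N_EXPERTS = 8, 3

# Stand-ins for pre-trained domain experts: each maps an input vector
# to a score under that expert's generative model (here, random linear
# scorers purely for illustration).
expert_weights = [rng.normal(size=DIM) for _ in range(N_EXPERTS)]

def expert_score(k: int, x: np.ndarray) -> float:
    """Toy stand-in for expert k's log-likelihood of input x."""
    return float(expert_weights[k] @ x)

# A toy gate: a linear classifier over experts (assumed trained
# elsewhere) that outputs a distribution over experts per input.
gate_weights = rng.normal(size=(N_EXPERTS, DIM))

def gate(x: np.ndarray) -> np.ndarray:
    logits = gate_weights @ x
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def gated_model_score(x: np.ndarray) -> float:
    """Combined model: gate-weighted mixture of expert scores."""
    probs = gate(x)
    return float(sum(p * expert_score(k, x) for k, p in enumerate(probs)))

x = rng.normal(size=DIM)
print("gate distribution:", np.round(gate(x), 3))
print("combined score:   ", round(gated_model_score(x), 3))
```

In this sketch the gate produces a soft weighting, so the combination is a mixture; a hard gate (routing each input to the single top-scoring expert) is the other common design choice, and which the paper analyzes is not specified in this summary.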