Jingyi Zhang, Tianyi Lin, Huanjin Yao, Xiang Lan, Shunyu Liu, Jiaxing Huang
The study introduces CADS, a method for generating high-quality synthetic multimodal data to improve multimodal large language models, yielding the MMSynthetic-20K dataset and the high-performing R1-SyntheticVL model.
This research explores how to create synthetic data that can improve multimodal large language models, systems that understand and generate content across different formats such as text and images. The authors introduce a new method, Collective Adversarial Data Synthesis (CADS), which uses collective intelligence and adversarial learning to produce diverse and challenging training data, with the aim of strengthening the models' ability to tackle complex tasks. The outcome is a new dataset, MMSynthetic-20K, and a model, R1-SyntheticVL, which has been shown to perform well on a variety of benchmarks.