Yubo Huang, Weiqiang Wang, Sirui Zhao, Tong Xu, Lin Liu, Enhong Chen
Bind-Your-Avatar introduces a novel framework for generating videos with multiple talking characters in the same scene, using a dynamic 3D-mask embedding router to control audio-to-character correspondence, together with a new dataset for training and benchmarking.
This research introduces a new approach to creating videos in which multiple characters talk together in the same scene. Traditional methods typically handle only one talking character at a time, or two characters shown in separate scenes rather than interacting in one frame. The authors developed a system called Bind-Your-Avatar, whose embedding router ensures that each character's lip movements are driven by the correct audio stream. They also created a new dataset specifically for training multi-character talking videos, allowing the model to outperform existing methods.
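The core mechanism behind the 3D-mask embedding router can be pictured as masked mixing of per-character audio embeddings over space and time: each spatio-temporal location in the video receives the audio conditioning of whichever character's mask dominates there. The sketch below is a minimal illustration of that idea only; the function name, tensor shapes, and softmax normalization are assumptions for the example, not the authors' implementation, where the router is a learned module inside the generation backbone.

    import torch

    def route_audio_embeddings(audio_embs, masks):
        """Illustrative mask-based routing of audio embeddings.

        audio_embs: (N, T, C)    one embedding sequence per character
        masks:      (N, T, H, W) soft 3D masks, roughly one-hot across the
                    N characters at each spatio-temporal location
        returns:    (T, C, H, W) conditioning field in which each location
                    carries the audio embedding of the character whose
                    mask dominates there
        """
        # Broadcast (N, T, C, 1, 1) * (N, T, 1, H, W) and sum over characters.
        field = (audio_embs[..., None, None] * masks[:, :, None]).sum(dim=0)
        return field

    # Hypothetical usage: two characters, soft masks normalized so the
    # characters compete for each spatio-temporal location.
    N, T, C, H, W = 2, 16, 64, 32, 32
    audio_embs = torch.randn(N, T, C)
    mask_logits = torch.randn(N, T, H, W)   # in the paper, predicted by the router
    masks = torch.softmax(mask_logits, dim=0)
    cond = route_audio_embeddings(audio_embs, masks)  # (T, C, H, W)

Because the masks are soft and vary per frame, this kind of routing can follow characters as they move and hand off conditioning smoothly, which is the property the dynamic 3D-mask design is meant to provide.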