Gongyu Chen, Xiaoyu Zhang, Zhenqiang Weng, Junjie Zheng, Da Shen, Chaofan Ding, Wei-Qiang Zhang, Zihao Chen
YingMusic-SVC is a robust zero-shot singing voice conversion system that improves timbre similarity and naturalness under real-world conditions by combining Flow-GRPO with singing-specific inductive biases.
The paper presents YingMusic-SVC, a new system that converts a singing voice from one singer to another while preserving the original melody and lyrics. Compared with previous systems, YingMusic-SVC is more resilient to background harmonies and pitch errors, which often disrupt voice conversion. The system uses machine learning techniques to separate and adapt different aspects of the voice, so that the converted voice sounds natural and resembles the target singer. Tests show that YingMusic-SVC outperforms existing methods, especially in complex musical environments.