Junjie Zheng, Chunbo Hao, Guobin Ma, Xiaoyu Zhang, Gongyu Chen, Chaofan Ding, Zihao Chen, Lei Xie
YingMusic-Singer introduces a novel melody-driven singing voice synthesis framework that operates without manual phoneme alignment or melody annotation, improving scalability and performance in zero-shot settings.
Singing Voice Synthesis (SVS) generates singing from text (lyrics) and a melody, but existing systems often depend on detailed manual annotations, such as phoneme alignment and melody annotation, which makes them hard to scale. YingMusic-Singer removes this dependency: it is a melody-driven framework that synthesizes singing from given lyrics and a melody without needing such annotations. It does so by combining machine learning techniques that let the system learn directly from audio examples. The result is a more flexible and scalable way to synthesize singing voices, one that works well even in zero-shot settings, with melodies and lyrics that are new to the system.