Seymanur Akti, Tuan Nam Nguyen, Alexander Waibel
The study improves expressive voice conversion by enhancing style transfer and reducing source timbre leakage using a non-autoregressive framework with a conditional variational autoencoder.
This research focuses on improving technology that changes the voice in a recording to sound like someone else while also adopting that speaker's emotional tone. The authors developed a method that better separates the original voice's characteristics from the desired new voice and style. They achieved this by representing the content and style of speech separately, which makes the voice conversion more accurate and expressive. The results show that their method transfers emotion and speaker identity better than previous approaches.