![](http://i1.daumcdn.net/thumb/C148x148/?fname=https://blog.kakaocdn.net/dn/cutchS/btsL3QxtA42/Jka0HvFL8g9D0zhDvh4Q5K/img.png)
ProsodyFlow: High-Fidelity Text-to-Speech through Conditional Flow Matching and Prosody Modeling with Large Speech Language Models

Text-to-Speech still falls short of capturing diverse, natural prosody.

ProsodyFlow

- Combines a large self-supervised speech model with conditional flow matching to model prosodic features.
- Extracts acoustic features with a speech LLM, maps those features into a prosody latent space, and then conditional flow ..
Paper/TTS
2025. 2. 2. 10:31