[Paper Review] ProsodyFlow: High-Fidelity Text-to-Speech through Conditional Flow Matching and Prosody Modeling with Large Speech Language Models
Text-to-Speech still falls short in reflecting diverse, natural prosody. ProsodyFlow combines a large self-supervised speech model with conditional flow matching to model prosodic features: a speech LLM extracts acoustic features, those features are mapped to a prosody latent space, and then conditional flow ..
Paper/TTS
2025. 2. 2. 10:31