
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
Emotional Text-to-Speech (TTS) relies on oversimplified emotional labels or single-modality inputs, so it fails to reflect human emotion effectively.
UMETTS
Uses an Emotion Prompt Alignment module and an Emotion Embedding-Induced TTS module to reflect emotional cues from multiple modalities.
The Emotion Prompt Alignment module, via contrastive learning, text, audi..
Paper/TTS
2025. 4. 3. 19:51