[Paper Review] ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
Existing emotional speech synthesis relies on utterance-level style embeddings extracted from reference audio, so it often neglects the multi-scale property of speech prosody.
ED-TTS
Models emotion at multiple scales using Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER).
The SER-extracted utterance-level emotion..
Paper/TTS
2024. 5. 14. 09:57