DMOSpeech2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis

- Optimizing the components of diffusion-based Text-to-Speech for perceptual metrics is difficult
- DMOSpeech2
  - Applies Group Relative Preference Optimization with speaker similarity and Word Error Rate as rewards
  - Additionally improves output diversity through teacher-guided sampling
- Paper (AAAI 2026): Paper Link

1. Introduction

NaturalSpeech, StyleTTS2, and..
Paper/TTS
2026. 4. 3. 13:27
