반응형
[Paper 리뷰] ZET-Speech: Zero-Shot Adaptive Emotion-Controllable Text-to-Speech with Diffusion and Style-based Models
ZET-Speech: Zero-Shot Adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models Emotional Text-to-Speech는 natural 하고 emotional한 음성을 합성할 수 있음 BUT, 기존 방식들은 unseen speaker에 대한 generalization 없이 seen speaker만을 대상으로 함 ZET-Speech 짧은 speech segment와 target emotion label을 사용하여 any-speaker zero-shot adaptive text-to-speech 수행 Zero-shot adaptive model이 emotional speech를 ..
Paper/TTS
2024. 3. 25. 10:13
반응형