F5E-TTS: Enhancing Speech Synthesis by Aligning Text with Rich Semantic Representations
Text-to-Speech is limited in the semantic alignment it achieves between text and speech.
F5E-TTS
Trains a Diffusion Transformer backbone conditioned on bottleneck features from Phonetic PosteriorGrams.
Introduces explicit cross-modal regularization using a shared Vector-Quantized codebook.
Paper (ICASSP 2026): Paper Link
1. Introduction
Text-to-Speech (TTS) content co..
Paper/TTS
2026. 5. 12. 13:46
