MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
State-Space Models can be introduced into diffusion-based Text-to-Speech.
MambaVoiceCloning
Combines a gated bidirectional Mamba text encoder, a temporal Bi-Mamba, and an expressive Mamba to provide linear-time $\mathcal{O}(T)$ conditioning.
At inference, under a fixed mel-diffusion-vocoder backbone, attention-based duration, style modu..
Paper/TTS
2026. 4. 16. 13:07
