MELA-TTS: Joint Transformer-Diffusion Model with Representation Alignment for Speech Synthesis
A joint Transformer-Diffusion framework can be used for end-to-end text-to-speech. MELA-TTS autoregressively generates continuous mel-spectrograms from linguistic and speaker conditions. It introduces a representation alignment module that aligns the Transformer decoder's output representations with the semantic embeddings of a pre-trained ASR encoder ..
Paper/TTS
2026. 3. 27. 11:10
