SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer
Text-to-Speech models are still limited by latency.
SyncSpeech:
  Built on a Temporal Masked Transformer, it unifies the temporally ordered generation of autoregressive models with the parallel decoding of non-autoregressive models.
  It additionally improves training efficiency through High-Probability Masking.
Paper (ICASSP 2026): Paper Link
1. Introduction
Text-to-Speech (TTS)..
Paper/TTS
2026. 5. 19. 15:18