MB-iSTFT-VITS: Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

A lightweight end-to-end text-to-speech model is needed.

MB-iSTFT-VITS
Replaces the computationally expensive component with a simple inverse Short-Time Fourier Transform.
Generates the waveform through multi-band generation with fixed or trainable synthesis filters.

Paper (ICASSP 2023): Paper Link

1. I..
Paper/TTS
2025. 5. 27. 17:49