ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Existing large-scale text-to-speech models are slow at inference because of their massive parameter counts.

ZipVoice
Introduces a Zipformer-based vector field estimator and text encoder, and uses average upsampling-based initial speech-text alignment.
Additionally introduces a flow distillation method to reduce the number of sampling steps.

Paper (ASRU 2025): Paper Link

1. Introduction
VALL-E, VoiceBox, MaskGCT, and similar z..
Paper/TTS
2025. 12. 11. 13:17
