Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Flow-matching-based Text-to-Speech models are difficult to apply to cross-lingual tasks.
Cross-Lingual F5-TTS
Uses forced alignment to pre-process the audio prompt and obtain word boundaries, then performs direct synthesis from the audio prompt.
Introduces speaking rate predictors at several linguistic granularities for duration modeling.
Paper (ICASSP 2026): Paper Link
1. Introduc..
Paper/TTS
2026. 3. 25. 12:54
