반응형
PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model FusionFlow-matching Text-to-Speech model은 stability-naturalness trade-off, cross-lingual voice cloning의 어려움, low-rate mel-feature에 대한 합성 품질의 한계가 존재함PFluxTTSInference-time vector-field fusion을 통해 duration-guided, alignment-free model을 combine 하는 dual-decoder design을 도입FLUX-based decoder의 speech pro..
Paper/TTS
2026. 2. 26. 13:52
반응형
