VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
A real-time, zero-shot streaming text-to-speech model is needed.
VoXtream
Maps incoming phonemes directly to audio tokens using a limited look-ahead.
Architecturally, it uses an incremental phoneme transformer, a temporal transformer, and a depth transformer.
Paper (ICASSP 2026): Paper Link
1. Introduction
For low-latency streaming Text-to-Speech (TTS), the first-packet latency must be m..
Paper/TTS
2026. 3. 23. 10:23
