
InstantSpeech: Instant Synchronous Text-to-Speech Synthesis for LLM-driven Voice Chatbots
A text-to-speech model paired with a Large Language Model does not start synthesis until an entire sentence has been generated, which increases response latency.
InstantSpeech
Uses a fully-parallel architecture that combines a causal Transformer-based acoustic model with a causal convolution-based vocoder.
To improve speech quality within a limited lookahead, knowledge distil..
Paper/TTS
2025. 5. 20. 17:49