[Paper Review] Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
A multi-speaker text-to-speech model that can be trained with minimal supervision is needed. SPEAR-TTS casts text-to-speech as a sequence-to-sequence task by combining two discrete speech representations: text to high-level semantic tokens (Reading), and semantic tokens to low-level acoustic tokens (Speaking). In particular, using abundant audio-only data, Speak..
Paper/Language Model
2025. 1. 8. 16:31