Posts from 2025/05/03

[Paper Review] NaturalSpeech2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Existing large-scale text-to-speech systems quantize speech into discrete tokens and process those tokens with a language model.
- As a result, they suffer from problems such as unstable prosody and word skipping/repeating.
NaturalSpeech2
A neural audio codec based on a residual vector quantizer is used to obtain quantized latent vectors.
Then, using a diffusion model, text input..

Paper/TTS 2025. 5. 3. 09:37

Let IT Begin
