
NaturalSpeech3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Large-scale text-to-speech systems still fall short in prosody and similarity.
NaturalSpeech3
- Employs a neural codec based on Factorized Vector Quantization that disentangles the speech waveform into content, prosody, timbre, and acoustic-detail subspaces
- Introduces a factorized diffusion model that generates the attributes of each subspace according to the given prompt
Paper (ICML 2024): Paper Link
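A minimal sketch of the factorized vector quantization idea behind NaturalSpeech3's codec: each encoder frame is projected into separate attribute subspaces (content, prosody, timbre, acoustic detail), and every subspace is quantized with its own codebook. The dimensions, codebook sizes, and the straight-through trick below are illustrative assumptions in PyTorch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SubspaceVQ(nn.Module):
    """Nearest-neighbour vector quantizer for one attribute subspace."""
    def __init__(self, dim: int, codebook_size: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, frames, dim) -> index of the nearest codebook entry per frame
        flat = x.reshape(-1, x.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        idx = idx.reshape(x.shape[:-1])                   # (B, T)
        quantized = self.codebook(idx)                    # (B, T, dim)
        # straight-through estimator so gradients reach the encoder
        quantized = x + (quantized - x).detach()
        return quantized, idx

class FactorizedCodec(nn.Module):
    def __init__(self, frame_dim=256, sub_dim=64,
                 subspaces=("content", "prosody", "timbre", "detail")):
        super().__init__()
        # one projection + one codebook per disentangled attribute subspace
        self.project = nn.ModuleDict(
            {name: nn.Linear(frame_dim, sub_dim) for name in subspaces})
        self.quantize = nn.ModuleDict(
            {name: SubspaceVQ(sub_dim, codebook_size=1024) for name in subspaces})
        self.merge = nn.Linear(sub_dim * len(subspaces), frame_dim)

    def forward(self, frames: torch.Tensor):
        # frames: (B, T, frame_dim) encoder output for a speech utterance
        parts, codes = [], {}
        for name in self.project:
            z = self.project[name](frames)
            q, idx = self.quantize[name](z)
            parts.append(q)
            codes[name] = idx
        recon = self.merge(torch.cat(parts, dim=-1))      # recombine subspaces
        return recon, codes

codec = FactorizedCodec()
recon, codes = codec(torch.randn(2, 100, 256))
print(recon.shape, {k: v.shape for k, v in codes.items()})
```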

NaturalSpeech2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Existing large-scale text-to-speech systems quantize speech into discrete tokens and process those tokens with a language model
- As a result, they suffer from problems such as unstable prosody and word skipping/repeating
NaturalSpeech2
- Uses a neural audio codec based on a residual vector quantizer to obtain quantized latent vectors
- Then applies a diffusion model to generate these latent vectors conditioned on the text input
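A minimal sketch of residual vector quantization, the codec component NaturalSpeech2 builds its latents on: each stage quantizes the residual left by the previous stage, so the sum of the selected codebook entries approximates the continuous latent. Codebook sizes and the number of stages are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    def __init__(self, dim=128, codebook_size=1024, num_quantizers=8):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_quantizers))

    def forward(self, z: torch.Tensor):
        # z: (B, T, dim) continuous latent from the codec encoder
        residual = z
        quantized = torch.zeros_like(z)
        codes = []
        for codebook in self.codebooks:
            flat = residual.reshape(-1, residual.size(-1))
            idx = torch.cdist(flat, codebook.weight).argmin(dim=-1)
            idx = idx.reshape(z.shape[:-1])               # (B, T)
            q = codebook(idx)
            quantized = quantized + q                     # running reconstruction
            residual = residual - q                       # pass the remainder on
            codes.append(idx)
        # straight-through estimator for encoder gradients
        quantized = z + (quantized - z).detach()
        return quantized, torch.stack(codes, dim=-1)      # (B, T, num_quantizers)

rvq = ResidualVQ()
quantized, codes = rvq(torch.randn(2, 100, 128))
print(quantized.shape, codes.shape)
```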

ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
Singing voice synthesis with diffusion models can produce high-quality samples but is limited by slow inference
ConSinger
- Adopts a Consistency Model to perform singing voice synthesis with only a minimal number of steps
- In particular, applies a consistency constraint during training
Paper (ICASSP 2025): Paper Link
1. Introduction
Singing Voice Synthesis (SVS) aims to generate emotionally realistic human audio..
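A minimal sketch of the consistency constraint used by consistency models: the network should map two neighbouring points on the same noising trajectory to the same output, which is what enables few-step sampling. The toy mel-spectrogram network, noise schedule, and plain MSE distance are assumptions for illustration, not ConSinger's exact recipe.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a mel-spectrogram consistency network f(x_t, t)."""
    def __init__(self, mel_bins=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(mel_bins + 1, 256), nn.SiLU(),
                                 nn.Linear(256, mel_bins))

    def forward(self, x_t, t):
        # x_t: (B, T, mel_bins); the noise level t is appended as an extra feature
        t_feat = t.view(-1, 1, 1).expand(-1, x_t.size(1), 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def consistency_loss(model, ema_model, mel, sigmas):
    """One training step of the consistency constraint."""
    B = mel.size(0)
    n = torch.randint(0, len(sigmas) - 1, (B,))
    s_next, s_cur = sigmas[n + 1], sigmas[n]              # adjacent noise levels
    noise = torch.randn_like(mel)
    x_next = mel + s_next.view(-1, 1, 1) * noise          # more-noised point
    x_cur = mel + s_cur.view(-1, 1, 1) * noise            # less-noised point, same trajectory
    pred = model(x_next, s_next)
    with torch.no_grad():
        target = ema_model(x_cur, s_cur)                  # EMA teacher output
    return torch.mean((pred - target) ** 2)

model, ema_model = Denoiser(), Denoiser()
ema_model.load_state_dict(model.state_dict())
sigmas = torch.linspace(0.002, 80.0, steps=18)            # discretized noise schedule
loss = consistency_loss(model, ema_model, torch.randn(4, 120, 80), sigmas)
print(loss.item())
```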

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners
Voice Large Language Models are mostly restricted to a single task and a single language
Make-A-Voice
- Builds a scalable learner using an end-to-end local/global multiscale transformer
- Improves in-context learning by sharing common knowledge and generalizing to unseen tasks
- Supports a multilingual learner that addresses the data-scarcity problem of low-resource languages
Paper..
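A minimal sketch of a local/global multiscale transformer of the kind Make-A-Voice builds on: a global transformer models the sequence of frames (each frame being a stack of codec tokens), while a small local transformer models the tokens inside each frame conditioned on the global frame state. Vocabulary size, depths, and dimensions are assumptions, and causal masking / autoregressive training are omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiscaleTransformer(nn.Module):
    def __init__(self, vocab=1024, num_codebooks=8, dim=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab, dim)
        # global model over frames, local model over tokens within a frame
        self.global_tf = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
        self.local_tf = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, vocab)
        self.num_codebooks = num_codebooks

    def forward(self, codes):
        # codes: (B, frames, num_codebooks) acoustic codec tokens
        B, T, Q = codes.shape
        emb = self.token_emb(codes)                        # (B, T, Q, dim)
        frame_repr = emb.sum(dim=2)                        # pool tokens into one frame vector
        global_out = self.global_tf(frame_repr)            # (B, T, dim) coarse context
        # the local transformer refines each frame's tokens, conditioned on the
        # global state prepended as an extra slot
        local_in = torch.cat([global_out.unsqueeze(2), emb], dim=2)   # (B, T, Q+1, dim)
        local_out = self.local_tf(local_in.view(B * T, Q + 1, -1))[:, 1:]
        return self.head(local_out).reshape(B, T, Q, -1)   # per-token logits

model = MultiscaleTransformer()
logits = model(torch.randint(0, 1024, (2, 50, 8)))
print(logits.shape)   # (2, 50, 8, 1024)
```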

ATP-TTS: Adaptive Thresholding Pseudo-Labeling for Low-Resource Multi-Speaker Text-to-Speech
Text-to-Speech is hard to apply in low-resource scenarios
ATP-TTS
- Selects suitable pseudo-labels through Adaptive Thresholding
- Then predicts latent representations with an Automatic Speech Recognition model enhanced by contrastive-learning perturbation
Paper (ICASSP 2025): Paper Link
1. Introduction
Supervised models such as Glow-TTS, VITS, and NaturalSpeech..
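A minimal sketch of adaptive-threshold pseudo-label selection: unlabeled predictions are kept only when their confidence exceeds a threshold that adapts with the model's running confidence instead of being fixed in advance. The EMA update rule and base threshold here are illustrative assumptions, not ATP-TTS's exact formulation.

```python
import torch

class AdaptiveThreshold:
    def __init__(self, base=0.9, momentum=0.99):
        self.base = base
        self.momentum = momentum
        self.running_conf = 0.5          # running mean of model confidence

    def update(self, confidences: torch.Tensor) -> float:
        # track how confident the model currently is on unlabeled data
        self.running_conf = (self.momentum * self.running_conf
                             + (1 - self.momentum) * confidences.mean().item())
        # loose threshold early in training, tighter as confidence grows
        return self.base * self.running_conf

def select_pseudo_labels(logits: torch.Tensor, thresholder: AdaptiveThreshold):
    """Return (labels, mask) keeping only sufficiently confident predictions."""
    probs = logits.softmax(dim=-1)
    conf, labels = probs.max(dim=-1)
    threshold = thresholder.update(conf)
    mask = conf >= threshold
    return labels, mask

thresholder = AdaptiveThreshold()
labels, mask = select_pseudo_labels(torch.randn(16, 40), thresholder)
print(labels[mask])                      # pseudo-labels that pass the threshold
```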

SSR-Speech: Towards Stable, Safe and Robust Zero-Shot Text-based Speech Editing and Synthesis
A stable, safe, and robust zero-shot text-to-speech model is needed
SSR-Speech
- Incorporates classifier-free guidance on top of a Transformer decoder
- Embeds a frame-level watermark into the edited region through Watermark EnCodec
Paper (ICASSP 2025): Paper Link
1. Introduction
Zero-shot text-based speech generation models such as YourTTS..
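A minimal sketch of classifier-free guidance on an autoregressive token decoder, the mechanism SSR-Speech incorporates: at inference the model is run with and without the conditioning prompt, and the two logit streams are mixed with a guidance weight. The tiny GRU decoder below stands in for the Transformer decoder, and the null-condition convention and guidance scale are assumptions.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    def __init__(self, vocab=1024, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.cond_proj = nn.Linear(dim, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)    # stand-in for a Transformer decoder
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, cond):
        # tokens: (B, T) codec tokens generated so far, cond: (B, dim) prompt embedding
        x = self.emb(tokens) + self.cond_proj(cond).unsqueeze(1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])                     # next-token logits

@torch.no_grad()
def guided_next_token(model, tokens, cond, guidance_scale=2.0):
    null_cond = torch.zeros_like(cond)                   # "condition dropped" case
    logits_cond = model(tokens, cond)
    logits_uncond = model(tokens, null_cond)
    # classifier-free guidance: push logits toward the conditional prediction
    logits = logits_uncond + guidance_scale * (logits_cond - logits_uncond)
    return logits.softmax(-1).multinomial(1)

model = TinyDecoder()
tokens = torch.randint(0, 1024, (2, 10))
cond = torch.randn(2, 256)
print(guided_next_token(model, tokens, cond).shape)      # (2, 1) sampled next tokens
```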