Posts in the 'Paper/TTS' category

[Paper Review] DetailTTS: Learning Residual Detail Information for Zero-Shot Text-to-Speech

Existing text-to-speech systems often omit linguistic and acoustic details. DetailTTS is a zero-shot text-to-speech model based on a Conditional Variational AutoEncoder. It introduces a Prior Detail module and a Duration Detail module to capture the residual detail information missed during alignment. Paper (ICASSP 2025): Paper Link. 1. Introduction: Zero-shot Te..

Paper/TTS 2025. 4. 9. 17:42

[Paper Review] UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts

Emotional Text-to-Speech (TTS) relies on oversimplified emotional labels or single-modality inputs, so it fails to reflect human emotion effectively. UMETTS uses an Emotion Prompt Alignment module and an Emotion Embedding-Induced TTS module to reflect emotional cues from multiple modalities. Emotion Prompt Alignment module: through contrastive learning, text, audi..

Paper/TTS 2025. 4. 3. 19:51

[Paper Review] VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance

When parameter-efficient fine-tuning is applied to a speaker-adaptive text-to-speech model, adaptation performance on out-of-domain speakers is limited. VoiceGuider is a speaker-adaptive text-to-speech model reinforced with autoguidance. Through an autoguidance strengthening strategy, out-of-domain data robus..

Paper/TTS 2025. 4. 2. 20:24

[Paper Review] NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers

A personalized text-to-speech model can be built by using adapters for multiple speakers. NanoVoice uses batch-wise speaker adaptation, which can fine-tune on multiple references in parallel. It additionally introduces parameter sharing to reduce the number of speaker adaptation parameters and incorporates a trainable scale matrix. Paper (ICASSP 2025): Paper Link. 1. Introduction: VALL-E, V..

Paper/TTS 2025. 3. 26. 20:31

[Paper Review] SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow

Flow matching-based speech synthesis models can improve speech quality while reducing the number of inference steps. SlimSpeech starts from a rectified flow model, reduces its parameter count, and uses it as a teacher model. It refines the reflow operation to directly derive a smaller model with a straight sampling trajectory, and improves performance through a distillation method. Paper (ICASSP 2025): Paper Link. 1. Int..

Paper/TTS 2025. 3. 25. 20:49

[Paper Review] Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

Emotional text-to-speech mainly uses supervised training to convert text and a desired emotion into emotional speech. However, because it learns only the correct emotional output, it fails to capture the nuances between emotions. Emo-DPO uses Direct Preference Optimization, which differentiates emotional nuances by optimizing toward the preferred emotion. Emotion-aware Large Languag..

Paper/TTS 2025. 3. 17. 08:43


