Posts in the 'Paper/TTS' category

[Paper Review] RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching

Text-to-Speech based on Ordinary Differential Equations faces a trade-off between quality and inference speed. RapFlow-TTS: Enforces consistency of the velocity field along the Flow Matching-straightened Ordinary Differential Equation trajectory to achieve consistent quality. To improve the quality of few-step synthesis, time interval scheduling, adversa..

Paper/TTS 2025. 7. 15. 17:01

[Paper Review] MPE-TTS: Customized Emotion Zero-Shot Text-to-Speech Using Multi-Modal Prompt

Multi-modal prompts can be used for zero-shot Text-to-Speech. MPE-TTS: Introduces a Multi-Modal Prompt Emotion Encoder to extract emotion information from diverse prompts. Additionally applies a prosody predictor and an emotion consistency loss. Paper (INTERSPEECH 2025): Paper Link. 1. Introduction: Zero-Shot Text-to-Speech (ZS-TTS) aims to generate speech in unseen styles. Speech-b..

Paper/TTS 2025. 7. 10. 17:02

[Paper Review] Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech

Expressive Text-to-Speech still has limitations. Spotlight-TTS: Introduces Voiced-Aware Style Extraction, which maintains continuity across different speech regions. Additionally adjusts the direction of the extracted style to improve speech quality. Paper (INTERSPEECH 2025): Paper Link. 1. Introduction: For input text, Text-to-Speech (TTS)..

Paper/TTS 2025. 7. 8. 17:00

[Paper Review] DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech

Existing emotional Text-to-Speech models fail to fully separate speaker and emotion characteristics. DiEmo-TTS: Introduces emotion clustering that uses emotional attribute prediction and speaker embeddings. Uses a dual conditioning Transformer that integrates style features. Paper (INTERSPEECH 2025): Paper ..

Paper/TTS 2025. 7. 6. 07:38

[Paper Review] EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

Emotional Text-to-Speech is still limited in terms of intensity control. EmoMix: Uses a pre-trained Speech Emotion Recognition model to extract emotion embeddings. Performs mixed emotion synthesis at run-time based on a diffusion model. Paper (INTERSPEECH 2023): Paper Link. 1. Introduction: Emotional Text-to-Speech (TTS) models such as GenerSpeech, reference-based style..

Paper/TTS 2025. 7. 4. 17:00

[Paper Review] OZSpeech: One-Step Zero-Shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Conventional speech representations such as waveforms and spectrograms overlook speech attributes and incur high computational cost. OZSpeech: Reduces the number of sampling steps by using one-step sampling with a learned prior as the condition. Models speech attributes using disentangled, factorized components in token format. Paper (ACL 2025): Paper Link. 1. In..

Paper/TTS 2025. 6. 30. 17:03


Let IT Begin
