'Paper/TTS' 카테고리의 글 목록 (12 Page)

[Paper 리뷰] Mels-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens

Mels-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens효과적인 emotion transfer를 위해 disentangled style token을 활용할 수 있음Mels-TTSGlobal style token에서 영감을 받아 emotion, language, speaker, residual information을 disentangle 하는 개별적인 style token을 활용Attention mechanism을 적용하여 각 style token에서 target speech에 대한 speech attribute를 학습논문 (ICASSP 2024) : ..

Paper/TTS 2024. 4. 24. 10:21

[Paper 리뷰] MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech SynthesisText-to-Speech에서 style transfer는 style information을 text context에 반영하여 특정 style을 가진 음성을 생성하는 것을 목표로 함BUT, 기존의 style transfer 방식들은 fixed emotional label이나 reference clip에 의존하므로 flexible 한 style transfer의 한계가 있음MM-TTS생성되는 음성의 style을 control 하기 위해 reference speech, emotional facial image, text description 등을 포함하는..

Paper/TTS 2024. 4. 23. 09:45

[Paper 리뷰] DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation

DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation Text-to-Speech에서 latent diffusion model을 우수한 성능을 보이고 있지만, resource consumption이 크고 추론 속도가 느림 DCTTS Discrete diffusion model과 contrastive learning을 결합한 text-to-speech 모델 간단한 text encoder와 VQ model을 사용하여 raw data를 discrete space로 compress 한 다음, discrete space에서 diffusion model을 training 함 이때 diffusion step 수를 줄..

Paper/TTS 2024. 4. 19. 09:29

[Paper 리뷰] VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching Text-to-Speech에서 diffusion model은 우수한 성능을 보이고 있지만 sampling complexity로 인해 비효율적임 VoiceFlow 제한된 sampling step으로도 고품질의 합성을 수행할 수 있는 rectified flow matching을 활용 Text input을 condition으로 하여 mel-spectrogram을 ordinary differential equation을 통해 추정 Rectified flow는 효율적인 합성을 위해 sampling trajectory를 straighten 함 논문 (ICASSP 2024) : Paper Link 1...

Paper/TTS 2024. 4. 18. 11:10

[Paper 리뷰] InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt Expressive Text-to-Speech는 다양한 speech pattern을 반영하는 것을 목표로 하고, 이때 style을 control 하는 style prompt로 natural language를 활용할 수 있음 InstructTTS Self-supervised learning과 cross-modal metric learning을 활용하고 robust sentence embedding model을 얻기 위해 3-stage training을 제시 일반적인 mel-spectrogram 대신 vector-quantized ac..

Paper/TTS 2024. 4. 13. 14:21

[Paper 리뷰] PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions Text-to-Speech에서 style control을 위해서는 개별적인 style category가 있는 expressive speech recording이 필요함 BUT, 실적용에서는 target style에 대한 referecne speech 없이 desired style에 대한 text description을 활용하는 것이 더 적합하다고 볼 수 있음 PromptStyle Text prompt-guided cross-speaker style transfer를 목표로 VITS와 cross-modal style encoder를 활용 ..

Paper/TTS 2024. 4. 12. 10:32

이전 1 ··· 9 10 11 12 13 14 15 ··· 20 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/01 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Total

Today

Yesterday

Let IT Begin

티스토리툴바