Posts in the 'Paper/TTS' category (Page 16)

[Paper Review] StyleTTS2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Large speech language models (SLMs) can be leveraged to reach human-level text-to-speech. StyleTTS2: Models style as a latent random variable through a diffusion model, generating a style suited to the text without reference speech. For end-to-end training, introduces a discriminator that supports differentiable duration modeling, and large pre..

Paper/TTS 2024. 3. 17. 13:45

[Paper Review] P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

Neural codec language models have greatly improved zero-shot text-to-speech performance by training on large-scale data. However, they lack robustness, sample very slowly, and depend on pre-trained neural codec representations. P-Flow: A fast, data-efficient zero-shot text-to-speech model that uses speech prompts for speaker adaptation. Speech-prompted text encoder and flow matching generative dec..

Paper/TTS 2024. 3. 16. 13:00

[Paper Review] TriniTTS: Pitch-Controllable End-to-End TTS without External Aligner

A text-to-speech model is needed that satisfies all three of end-to-end architecture, prosody control, and on-the-fly duration alignment, because most existing models rely on a two-stage pipeline and lack controllability. TriniTTS: An end-to-end text-to-speech model that allows pitch control without an external aligner. Performs alignment search, pitch estimation, and waveform generation simultaneously, with the data distribution of speech represented by a latent vecto..

Paper/TTS 2024. 3. 14. 10:27

[Paper Review] AdaSpeech: Adaptive Text to Speech for Custom Voice

Using custom voices in TTS adaptation poses two challenges. First, the adaptation model must handle diverse acoustic conditions that differ substantially from the source speech data. Second, the adaptation parameters for each target speaker must be small, so that memory usage stays low while speech quality is maintained. AdaSpeech: An adaptive TTS model that supports high-quality synthesis and efficient voice customization. To handle diverse acoustic conditions, at both the utterance and phoneme levels, aco..

Paper/TTS 2024. 3. 12. 10:22

[Paper Review] nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-Speaker Text-to-Speech

Applying multi-speaker text-to-speech in practice involves many difficulties. nnSpeech: A zero-shot multi-speaker model that can synthesize a new speaker's voice from a single adaptation utterance, without fine-tuning. Uses a speaker-guided conditional variational autoencoder to generate a variable $Z$ that contains both speaker and content information. The distribution of the latent variable $Z$..

Paper/TTS 2024. 3. 8. 11:01

[Paper Review] SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-to-Speech Model

A zero-shot text-to-speech model is needed that improves similarity for unseen speakers. SC-GlowTTS: Introduces a speaker-conditional architecture built on a flow-based decoder. Compares a dilated residual convolutional-based encoder, a gated convolutional-based encoder, and a transformer-based encoder as the text encoder. Additionally, for spectrograms predicted by the text-to-speech model, a GAN-based v..

Paper/TTS 2024. 3. 6. 09:22
