'Paper/TTS' 카테고리의 글 목록 (15 Page)

[Paper 리뷰] TriniTTS: Pitch-Controllable End-to-End TTS without External Aligner

TriniTTS: Pitch-Controllable End-to-End TTS without External Aligner End-to-End architecture, prosody control, on-the-fly duration alignment를 모두 만족하는 text-to-speech 모델이 필요함 - 대부분 two-stage pipeline에 의존적이고 controllability가 부족하기 때문 TriniTTS External aligner 없이 pitch control이 가능한 end-to-end text-to-speech 모델 Alignment search, pitch estimation, waveform generation을 동시에 수행하여 음성의 data 분포를 나타내는 latent ..

Paper/TTS 2024. 3. 14. 10:27

[Paper 리뷰] AdaSpeech: Adaptive Text to Speech for Custom Voice

AdaSpeech: Adaptive Text to Speech for Custom Voice TTS adaptation에서 custom voice를 활용하기 위해서는 2가지 과제가 있음 - Adaptation 모델은 source speech data와 상당히 다른 다양한 acoustic condition을 처리할 수 있어야 함 - 음성 품질을 유지하면서 적은 memory 사용량을 가지도록 각 target speaker에 대한 adaptation parameter가 작아야 함 AdaSpeech 고품질 합성과 효율적인 voice customization을 지원하는 adaptive TTS 모델 다양한 acoustic condition을 처리하기 위해 utterance, phoneme level 모두에서 aco..

Paper/TTS 2024. 3. 12. 10:22

[Paper 리뷰] nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-Speaker Text-to-Speech

nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-Speaker Text-to-Speech Multi-speaker text-to-speech를 활용하기 위해서는 어려움이 많음 nnSpeech Fine-tuning 없이 하나의 adpatation utterance만을 사용하여 새로운 speaker voice를 합성할 수 있는 zero-shot multi-speaker 모델 Speaker-guided conditional vairational autoencoder를 활용하여 speaker, content information을 모두 포함하는 variable $Z$를 생성 Latent variable $Z$의 분포..

Paper/TTS 2024. 3. 8. 11:01

[Paper 리뷰] SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-to-Speech Model

SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-to-Speech Model Unseen speaker에 대한 similarity를 향상하는 zero-shot text-to-speech 모델이 필요함 SC-GlowTTS Flow-based decoder를 기반으로 speaker-conditional architecture를 도입 Text encoder로써 dilated residual convolutional-based encoder, gated convolutional-based encoder, transformer-based enocoder를 비교 추가적으로 text-to-speech 모델을 통해 예측된 spectrogram에 대해 GAN-based v..

Paper/TTS 2024. 3. 6. 09:22

[Paper 리뷰] PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech: Portable and High-Quality Generative Text-to-Speech Non-autoregressive Text-to-Speech 모델은 고품질의 음성 합성이 가능하지만 몇 가지 한계가 있음 - VAE는 작은 모델 size로도 long-range semantic feature를 capture 할 수 있지만, 종종 부자연스러운 결과를 생성함 - Normalizing Flow는 frequency bin-wise detail을 reconstruct 하는데 좋지만, 많은 parameter 수를 필요로 함 PortaSpeech Lightweight architecture를 사용하여 고품질의 음성 합성을 지원하는 TTS 모델 Enhanced prior를 포함한 ligh..

Paper/TTS 2024. 3. 2. 12:13

[Paper 리뷰] Mixer-TTS: Non-autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings

Mixer-TTS: Non-autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings Mel-spectrogram generation에서는 non-autoregressive 모델이 유용함 Mixer-TTS MLP-Mixer architecture를 기반으로 pitch/duration predictor를 활용 Pre-trained language model의 token embedding을 추가적으로 도입하여 Mixer-TTS를 extend 논문 (ICASSP 2022) : Paper Link 1. Introduction Text-to-Speech (TTS)에서는 속도 향상을 위해서는 non-aut..

Paper/TTS 2024. 2. 26. 10:08

이전 1 ··· 12 13 14 15 16 17 18 ··· 20 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/01 »
일	월	화	수	목	금	토
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Total

Today

Yesterday

Let IT Begin

티스토리툴바