'TTS' 태그의 글 목록 (22 Page)

[Paper 리뷰] CyFi-TTS: Cyclic Normalizing Flow with Fine-Grained Representation for End-to-End Text-to-Speech

CyFi-TTS: Cyclic Normalizing Flow with Fine-Grained Representation for End-to-End Text-to-Speech End-to-End Text-to-Speech는 unseen data에 대해 적용하는 것은 어려움 One-to-many 문제로 인해 text와 음성 사이에 information gap이 발생하여 mispronunciation 되기 쉽기 때문 CyFi-TTS Cyclic normalizing flow를 도입하여 information gap을 해소해 자연스러운 음성을 합성 Temporal multi-resolution upsampler를 도입하여 fine-grained representation을 점진적으로 생성 논문 (ICASSP 20..

Paper/TTS 2024. 1. 18. 18:19

[Paper 리뷰] SpeedySpeech: Efficient Neural Speech Synthesis

SpeedySpeech: Efficient Neural Speech Syntheis Neural Text-to-Speech는 음성 합성의 품질을 크게 향상했지만, 여전히 추론 및 학습 속도가 느림 SpeedySpeech 계산 resource 요구사항이 적고, 빠른 spectrogram 합성이 가능한 student-teacher network 고품질 audio 생성에 self-attention layer가 필요하지 않다는 점을 이용 Residual connection이 있는 간단한 convolution을 활용하고 teacher model에 대해서만 attention layer를 적용 논문 (INTERSPEECH 2020) : Paper Link 1. Introduction 최신 Neural Text-to-..

Paper/TTS 2024. 1. 17. 12:33

[Paper 리뷰] Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Personalized Lightweight Text-to-Speech: Voice Cloning with Adpative Structured PruningPersonalized Text-to-Speech를 위해서는 많은 양의 recording과 큰 규모의 모델을 필요로 하므로 mobile device 배포에 적합하지 않음이를 해결하기 위해 일반적으로 pre-train 된 Text-to-Speech 모델을 fine-tuning 하는 voice cloning을 활용함- 여전히 pre-train된 대규모 모델에 기반을 두고 있어 한계가 있음Adaptive Structured PruningTrainable structured pruning을 voice cloning에 적용Voice-cloning data로 s..

Paper/TTS 2024. 1. 10. 13:22

[Paper 리뷰] LiteTTS: A Lightweight Mel-spectrogram-free Text-to-wave Synthesizer Based on Generative Adversarial Networks

LiteTTS: A Lightweight Mel-spectrogram-free Text-to-wave Synthesizer Based on Generative Adversarial Networks 빠른 속도로 고품질의 음성을 합성할 수 있는 lightweight end-to-end text-to-speech 모델이 필요 LiteTTS Feature prediction module과 waveform generation module을 결합한 single framework Feature prediction module은 input text 및 prosodic information에 대한 latent space embedding을 추정 Waveform generation module은 추정된 latent emb..

Paper/TTS 2024. 1. 8. 16:46

[Paper 리뷰] UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

UniSyn: And End-to-End Unified Model for Text-to-Speech and Sining Voice Synthesis Text-to-Speech와 Singing Voice Synthesis를 단일 시스템으로 통합하는 기존의 방법들은, 동일한 화자로 제한되거나 cascaded model에 의존하는 한계가 있음 UniSyn 음성 합성과 가창 합성을 통합한 end-to-end 모델 Speaker와 style을 condition으로 사용하는 Multi-Conditional Variational AutoEncoder 구조 Timbre와 style의 disentangle을 위한 supervised guided-VAE와 Wasserstein distance 기반 timbre pertur..

Paper/SVS 2024. 1. 4. 09:58

[Paper 리뷰] FastPitch: Parallel Text-to-Speech with Pitch Prediction

FastPitch: Parallel Text-to-Speech with Pitch PredictionPitch contour를 예측하면 utterance의 semantic을 일치시키고 풍부한 음성 표현력을 얻을 수 있음FastPitchFastSpeech 기반의 fully-parallel text-to-speech 모델 Pitch 조절을 통한 자연스러운 음성 변조와 frequency contour를 condition으로 한 합성 품질의 향상논문 (ICASSP 2021) : Paper Link1. IntroductionNeural Text-to-Speech (TTS)는 합성 품질 향상을 위해 다양한 방법들을 꾸준히 제시하고 있음TTS 모델은 linguistic feature나 fundamental frequ..

Paper/TTS 2023. 12. 14. 10:44

이전 1 ··· 19 20 21 22 23 24 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/05 »
일	월	화	수	목	금	토
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Total

Today

Yesterday

Let IT Begin

티스토리툴바