'Paper' 카테고리의 글 목록 (36 Page)

[Paper 리뷰] Lightweight Zero-Shot Text-to-Speech with Mixture of Adapters

Lightweight Zero-Shot Text-to-Speech with Mixture of AdaptersLarge-scale model을 기반으로 한 zero-shot text-to-speech는 speaker characteristic reproducing에서 우수한 성능을 보이고 있지만, 실제로 활용하기에는 너무 큼Zero-Shot TTS with MoAMixture of Adapters (MoA) module을 non-autoregressive TTS 모델의 decoder와 variance adaptor에 결합Speaker embedding을 기반으로 speaker characteristics와 관련된 적절한 adapter를 선택하여 adatation ability를 향상논문 (INTERSPE..

Paper/TTS 2024. 7. 9. 09:17

[Paper 리뷰] FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech SynthesisFast, Lightweight Text-to-Speech 모델에 대한 요구사항이 커지고 있음FLY-TTSDecoder를 Fourier spectral coefficient를 생성하는 ConvNeXt block으로 대체하고, inverse STFT를 적용하여 waveform을 합성Model size를 compress 하기 위해 text encoder와 flow-based model에 grouped parameter-sharing을 도입추가적으로 합성 품질 향상을 위해 large pre-trained WavLM을 통해 adversarial training 함논문 (I..

Paper/TTS 2024. 7. 8. 09:37

[Paper 리뷰] DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation

DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform GenerationHigh-fidelity의 waveform generation을 위한 vocoder가 필요함DFlow고품질 생성을 위해 Normalizing Flow와 Denoising AutoEncoder를 결합추가적으로 model size와 training set을 확장하여 DFlow를 large-scale universal vocoder로 scaling up논문 (ICML 2024) : Paper Link1. IntroductionDeep Generative Model (DGM)은 waveform generat..

Paper/Vocoder 2024. 7. 7. 13:27

[Paper 리뷰] Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Voicebox: Text-Guided Multilingual Universal Speech Generation at ScaleLarge-scale generative model은 고품질의 output을 생성할 수 있지만, scale과 task generalization 측면에서 한계가 있음Voicebox주어진 audio context와 text를 기반으로 speech를 infill 하도록 train 된 non-autoregressive flow-matching modelIn-context learning을 통해 cross-lingual zero-shot synthesis, noise removal, content editing, style conversion 등의 다양한 task를 지원논문 (NeurI..

Paper/Language Model 2024. 7. 6. 13:20

[Paper 리뷰] MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-SpeechZero-shot Text-to-Speech는 few-second unseen speaker voice prompt로 강력한 voice cloning capability를 달성할 수 있음BUT, 대부분의 기존 방식들은 우수한 합성 품질에 비해 추론 속도, model size 측면의 한계가 있음MobileSpeechDiscrete codec를 기반으로 speech codec의 hierarchical information과 weight mechanism을 incorporate 하는 Speech Mask Decoder module을 도입- 특히 text와 spe..

Paper/TTS 2024. 7. 5. 11:22

[Paper 리뷰] DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning대부분의 text-to-speech system은 well-designed 환경에서 수집된 고품질 corpus를 활용하므로 데이터 수집 비용이 높음DRSpeechNoisy speech corpora를 training data로 활용할 수 있는 noise-robust text-to-speech 모델Frame-level encoder를 통해 time-variant additive noise를 represent 하고 utterance-level encoder를 사용하여 time-invarian..

Paper/TTS 2024. 7. 4. 09:09

이전 1 ··· 33 34 35 36 37 38 39 ··· 70 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/07 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Total

Today

Yesterday

Let IT Begin

티스토리툴바