Posts in the 'Paper' category (Page 41)

[Paper Review] DiffVoice: Text-to-Speech with Latent Diffusion

Latent diffusion can be used to improve the performance of text-to-speech models. DiffVoice: a variational autoencoder trained with adversarial training encodes the speech signal into a phoneme-rate representation, and a diffusion model jointly models the latent representation and the durations. Paper (ICASSP 2023): Paper Link. 1. Introduction: Diffusion models show strong performance on synthesis tasks. In text-to-speech (TTS)..

Paper/TTS 2024. 1. 25. 13:41

[Paper Review] DSPGAN: A GAN-based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP

Vocoders based on generative adversarial networks offer fast inference and effective raw waveform synthesis, but they struggle to synthesize high-fidelity speech for unseen speakers. DSPGAN: introduces time-frequency domain supervision from digital signal processing (DSP) to support high-quality synthesis. To resolve the mismatch between the ground-truth and predicted mel-spectrograms, the DSP module ..

Paper/Vocoder 2024. 1. 23. 17:05

[Paper Review] MixPath: A Unified Approach for One-shot Neural Architecture Search

Typical two-stage neural architecture search methods are limited to single-path search spaces, and searching multi-path structures efficiently remains difficult. MixPath: trains a one-shot multi-path supernet to evaluate candidate architectures accurately, and introduces Shadow Batch Normalization to regularize the differing feature statistics. As a result, Shadow Batch Normalization is used to stabilize optimization..

Paper/NAS 2024. 1. 22. 15:21

[Paper Review] Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Denoising diffusion probabilistic models and generative score matching excel at modeling complex data distributions. Grad-TTS: generates mel-spectrograms by gradually transforming the noise predicted by the encoder, aligned with the text input via Monotonic Alignment Search, and reconstructs data from noise through a stochastic differential equation. Paper (ICML 2021): Paper Link. 1. Introduction: Text-to-speech (TTS) models ..

Paper/TTS 2024. 1. 21. 14:31

[Paper Review] Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables

The physical characteristics of the human voice can be exploited for singing voice synthesis. Glottal-Flow LPC Filter (GOLF): uses a glottal model as the harmonic source and an IIR filter to simulate the vocal tract. GOLF uses fewer parameters and less memory, which allows fast inference, and it can model the phase components that diversify the singing voice. Paper (ISMIR 20..

Paper/SVS 2024. 1. 20. 13:54

[Paper Review] Fre-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis

Large TTS models are hard to deploy on resource-constrained devices, so a neural vocoder must be efficient while still producing high-quality synthesis. Fre-GAN 2: synthesizes the low- and high-frequency parts of the audio and reproduces the target-resolution audio through an inverse discrete wavelet transform. It introduces adversarial periodic feature distillation so that high-quality audio can be synthesized with only a small number of parameters. Paper (ICASSP 2022): Paper Link. 1. In..

Paper/Vocoder 2024. 1. 19. 13:54

