Posts tagged 'Vocoder' (Page 6)

[Paper Review] VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-nested Adversarial Network

- GAN-based vocoders can synthesize in real time, but they often generate waveforms that are inconsistent with the acoustic characteristics of the input mel-spectrogram
- VocGAN
  - Improves the quality and consistency of the output waveform while keeping the synthesis speed of GAN-based vocoders
  - Uses a multi-scale waveform generator and a hierarchically-nested discriminator to learn acoustic properties at multiple levels
  - Jo..

Paper/Vocoder 2024. 5. 6. 10:27

[Paper Review] StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

- Lightweight neural vocoders still lag behind in perceptual quality
- StyleMelGAN
  - A lightweight neural vocoder that synthesizes high-fidelity speech at low complexity
  - Uses Temporal Adaptive Normalization to style a low-dimensional noise vector with the acoustic features of the target speech
  - Random Window Discriminator: multi-scale sp..

Paper/Vocoder 2024. 5. 1. 10:21

[Paper Review] Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

- GAN-based vocoders are widely used to synthesize high-quality waveforms
- However, most architectures generate the waveform sample-wise and therefore require substantial GFLOPS
  - As a result, they are hard to run on an ordinary CPU without an accelerator or a parallel computer
- Framewise WaveGAN
  - A GAN-based vocoder that uses recurrent and fully-connected networks to generate the time-domain signal framewise
  - As a result, c..

Paper/Vocoder 2024. 4. 29. 10:14

[Paper Review] FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

- Denoising Diffusion Probabilistic Models achieve excellent synthesis quality, but their iterative sampling process limits their speed
- FastDiff
  - A fast conditional diffusion model for high-quality speech synthesis
  - Uses a stack of time-aware location-variable convolutions with diverse receptive field patterns to model long-term dependencies with adaptive conditions
  - To reduce the number of sampling steps while maintaining quality, noise ..

Paper/Vocoder 2024. 4. 27. 10:41

[Paper Review] LangWave: Realistic Voice Generation based on High-Order Langevin Dynamics

- Diffusion models perform well in speech generation, but most rely on first-order stochastic differential equations or equivalent diffusion models
- LangWave
  - Departs from existing first-order methods and generates waveforms with a third-order Langevin dynamical system
  - Jointly models voice wave diffusion, position, velocity, and acceleration in the ambient Euclidean space, and from white noise wa..

Paper/Vocoder 2024. 4. 22. 10:51

[Paper Review] Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

- A vocoder that can synthesize high-fidelity speech across multiple domains is needed
- Universal MelGAN
  - Adds a multi-resolution spectrogram discriminator to a MelGAN-based structure to improve the spectral resolution of the generated waveform
  - This prevents the over-smoothing problem in the high-frequency band of large-footprint models
- Paper (ICASSP 2021): Paper Link
- 1. Introduction ..

Paper/Vocoder 2024. 4. 17. 10:05

Let IT Begin