
Quad-Net: Melspectrogram Vocoder with Convolutional Layers Restricted by the Quadrature Mirror Filter for Perfect Reconstruction
Existing neural vocoders rely on fixed signal processing filters, so they lack hyperparameter flexibility.
Quad-Net
- Uses restricted convolutional layers shaped by a quadrature mirror synthesis filter bank
- Through a perfect reconstruction loss derived from the perfect reconstruction filter bank, the model is optim..
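As a rough illustration of the perfect reconstruction property that a quadrature mirror filter bank provides, the sketch below uses the two-tap Haar pair, the simplest QMF pair, in pure Python. It is not the Quad-Net architecture or its loss. The function names (`qmf_analysis`, `qmf_synthesis`, `reconstruction_error`) are hypothetical, and the input is assumed to have even length.

```python
import math

# Haar pair: h0 = [s, s] (low-pass), h1 = [s, -s] (high-pass), s = 1/sqrt(2).
# h1[n] = (-1)^n * h0[n] is the quadrature mirror relation.
S = 1.0 / math.sqrt(2.0)


def qmf_analysis(x):
    """Split an even-length signal into low and high subbands (downsampled by 2)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two subbands into a full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append(S * (l + h))  # even-indexed sample
        out.append(S * (l - h))  # odd-indexed sample
    return out


def reconstruction_error(x):
    """Largest absolute difference between x and its analysis/synthesis round trip."""
    low, high = qmf_analysis(x)
    y = qmf_synthesis(low, high)
    return max(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    signal = [0.5, -1.0, 2.0, 3.0, -0.25, 0.75]
    print("max reconstruction error:", reconstruction_error(signal))
```

For this filter pair the round trip recovers the input up to floating-point rounding. A training loss built on the same idea would penalize the gap between the input and its round trip, though the exact form used by Quad-Net is not given in this excerpt.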

Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis
Existing denoising diffusion probabilistic model-based vocoders have difficulty reflecting prosody diversity.
Cauchy Diffusion
- Achieves better resilience to imbalanced speech data through a heavy-tailed Cauchy distribution
- As a result, improves the prosody modeling of diffusion vocoders
Paper (AAAI 2025) : Paper Link
1. Introduction
Recently, Denoising Diffusion Probabil..
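To make "heavy-tailed" concrete, the sketch below compares the tail mass of a standard Cauchy distribution with that of a standard Gaussian and draws Cauchy samples by inverse-CDF sampling. It only illustrates the distribution itself and makes no claim about how Cauchy Diffusion defines its forward or reverse process. All function names are hypothetical.

```python
import math
import random


def cauchy_from_uniform(u, loc=0.0, scale=1.0):
    """Inverse-CDF transform: map u in (0, 1) to a Cauchy(loc, scale) sample."""
    return loc + scale * math.tan(math.pi * (u - 0.5))


def cauchy_tail(t, scale=1.0):
    """P(|X| > t) for a zero-centered Cauchy with the given scale."""
    return 1.0 - (2.0 / math.pi) * math.atan(t / scale)


def gaussian_tail(t, sigma=1.0):
    """P(|X| > t) for a zero-mean Gaussian with standard deviation sigma."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [cauchy_from_uniform(rng.random()) for _ in range(5)]
    print("a few Cauchy samples:", samples)
    for t in (2.0, 5.0, 10.0):
        print(t, cauchy_tail(t), gaussian_tail(t))
```

The Cauchy tail decays only polynomially while the Gaussian tail decays exponentially, so extreme values are far more likely under Cauchy noise.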

WaveFM: A High-Fidelity and Efficient Vocoder based on Flow Matching
Flow Matching provides robust training for diffusion models, but applying it directly to neural vocoders degrades audio quality.
WaveFM
- Adopts a mel-conditioned prior distribution instead of a standard Gaussian prior to minimize the transportation cost
- Combines a refined multi-resolution STFT loss to improve audio quality
- Additionally, to improve inference speed, consistency distillation me..

RFWave: Multi-Band Rectified Flow for Audio Waveform Reconstruction
Diffusion models are effective for waveform reconstruction, but they require a considerable number of sampling steps, which causes latency problems.
RFWave
- Generates complex spectrograms and processes all subbands simultaneously at the frame level
- Introduces Rectified Flow for a straight transport trajectory
Paper (ICLR 2025) : Paper Link
1. Introduction
For audio waveform reconstruction, the raw-audio-derived low-dimen..
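The straight transport trajectory behind Rectified Flow can be sketched in a few lines: a point moves linearly from a noise sample to a data sample, so the target velocity is constant along the path. The pure-Python code below is a toy illustration on short lists, not RFWave's multi-band model. The function names are hypothetical, and the "velocity model" in the usage part is the exact target velocity rather than a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field: constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.0, 1.0, -1.0]
    data = [2.0, 0.0, 0.5]
    v = target_velocity(noise, data)
    # Stand-in for a trained model: always returns the exact straight-line velocity.
    print(euler_sample(noise, lambda x, t: v, num_steps=1))
    print(euler_sample(noise, lambda x, t: v, num_steps=4))
```

When the velocity field is exactly straight, a single Euler step already lands on the data sample. This is why straighter trajectories are associated with fewer sampling steps. A learned field is only approximately straight, so more steps may still help in practice.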

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
A generator is needed that can explicitly disentangle the natural periodic features of high-resolution waveform signals.
PeriodWave
- Introduces a period-aware flow matching estimator that captures the periodic features of the waveform signal when estimating the vector field
- Uses a multi-period estimator that captures the periodic features of the waveform signal
- Additionally, in waveform generation, hig..

FA-GAN: Artifacts-Free and Phase-Aware High-Fidelity GAN-based Vocoder
Generative Adversarial Network-based vocoders suffer from noticeable spectral artifacts.
FA-GAN
- Introduces an anti-aliased twin deconvolution module in the generator to suppress aliasing artifacts caused by non-ideal upsampling layers
- To mitigate blurring artifacts and enrich spectral detail reconstruction, with support for phase information modeling, a fine-grained mu..