
BridgeVoC: Neural Vocoder with Schrödinger Bridge
Diffusion-based neural vocoders neglect the linear degradation inherent in the mel-spectrogram
BridgeVoC
Connects a time-frequency domain-based neural vocoder with the Schrödinger Bridge
Projects the mel-spectrogram into the target linear-scale domain and treats it as a degraded spectral representation
Paper (IJCAI 2025): Paper Link
1. Introduction
A neural vocoder generates high-quality waveforms from acoustic features..
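The projection step can be illustrated with a small NumPy sketch. It assumes a toy HTK-style triangular mel filterbank `A` (a hypothetical stand-in for the real mel basis) and uses the Moore-Penrose pseudo-inverse as the projection back to the linear-scale domain. The bridge part is simplified to a plain Brownian bridge between the target spectrogram and the projected (degraded) one. BridgeVoC's actual noise schedule, time convention, and network parameterization are not given in this preview.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Toy triangular mel filterbank, shape (n_mels, n_fft // 2 + 1); hypothetical stand-in."""
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, ce, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (ce - lo)
        falling = (hi - freqs) / (hi - ce)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def project_to_linear(mel: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Map a mel-spectrogram back onto the linear-frequency grid with the pseudo-inverse."""
    return np.linalg.pinv(fb) @ mel


def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t on a Brownian bridge pinned at x0 (t=0) and x1 (t=1)."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
fb = mel_filterbank(sr=22050, n_fft=1024, n_mels=80)
target = np.abs(rng.standard_normal((513, 40)))  # stand-in linear magnitude spectrogram
mel = fb @ target                                  # linear degradation: y = A x
degraded = project_to_linear(mel, fb)              # A^+ y, same shape as the target
x_mid = brownian_bridge_sample(target, degraded, t=0.5, sigma=0.5, rng=rng)
```

Because A A⁺ A = A, the projected spectrogram maps back to exactly the same mel-spectrogram. What it loses is the part of the target that the mel basis cannot see, which is why it can be treated as a degraded version of the target.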

RNDVoC: Learning Neural Vocoder from Range-Null Space Decomposition
Neural vocoders face a parameter-performance trade-off
RNDVoC
Bridges Range-Null Decomposition and the vocoder task, decomposing target spectrogram reconstruction into a superimposition of range-space and null-space components
Additionally builds a dual-path framework with cross-/narrow-band modules for sub-band and sequential modeling
Paper (IJCAI 2025): Paper Link
1. Introduct..
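The range-null space decomposition fits in a few lines of NumPy. This sketch uses a random nonnegative matrix `A` as a hypothetical stand-in for the mel basis and a random array `null_guess` as a stand-in for a network's prediction. It shows only the generic identity x = A⁺Ax + (I − A⁺A)x and the resulting consistency guarantee, not RNDVoC's dual-path network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_mels, n_frames = 257, 64, 50

A = np.abs(rng.standard_normal((n_mels, n_freq)))    # hypothetical stand-in for a mel basis
A_pinv = np.linalg.pinv(A)
x = np.abs(rng.standard_normal((n_freq, n_frames)))  # ground-truth linear spectrogram
y = A @ x                                            # observed mel-spectrogram

# Decompose x into range-space and null-space parts.
P_range = A_pinv @ A                 # projector onto the row space of A
range_part = P_range @ x             # equals A^+ y, fully determined by the mel input
null_part = x - range_part           # (I - A^+ A) x, invisible to A

# Reconstruction: keep the range part fixed and only fill in the null space.
null_guess = rng.standard_normal((n_freq, n_frames))  # stand-in for a network output
x_hat = A_pinv @ y + (np.eye(n_freq) - P_range) @ null_guess
```

Whatever `null_guess` is, `A @ x_hat` reproduces `y`. A model built this way only has to learn the null-space content, because consistency with the mel input is guaranteed by construction.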

AF-Vocoder: Artifact-Free Neural Vocoder with Global Artifact Filter
Generative Adversarial Network-based vocoders are limited in synthesis quality by audible artifacts
AF-Vocoder
Introduces GAFilter, a frequency-domain artifact filter, for artifact removal
GAFilter enforces the desired inductive bias for frequency control
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
A vocoder aims to convert acoustic features into a speech waveform
In particular, ..
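This preview does not describe how GAFilter works internally. As a rough illustration of a frequency-domain filter applied to feature maps, the NumPy sketch below implements a GFNet-style global filter: an FFT along time, an elementwise multiply by per-channel frequency weights, and an inverse FFT. The weights are set by hand as a hypothetical low-pass mask rather than learned. Do not read this as GAFilter's actual design or as the inductive bias it enforces.

```python
import numpy as np


def global_filter(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Filter a (channels, time) feature map in the frequency domain.

    `weights` has shape (channels, time // 2 + 1) and scales each rFFT bin.
    """
    spectrum = np.fft.rfft(features, axis=-1)
    return np.fft.irfft(spectrum * weights, n=features.shape[-1], axis=-1)


n_channels, n_time = 4, 256
t = np.arange(n_time)
low = np.sin(2 * np.pi * 4 * t / n_time)     # slow component (rFFT bin 4)
high = np.sin(2 * np.pi * 100 * t / n_time)  # fast "artifact-like" component (bin 100)
features = np.tile(low + high, (n_channels, 1))

n_bins = n_time // 2 + 1
identity_w = np.ones((n_channels, n_bins))
lowpass_w = np.zeros((n_channels, n_bins))
lowpass_w[:, :32] = 1.0                      # hypothetical mask: keep bins 0..31 only

passthrough = global_filter(features, identity_w)  # unchanged features
smoothed = global_filter(features, lowpass_w)      # high-frequency component removed
```

With all-ones weights the filter is an identity. With the mask, only the slow component survives, because each frequency bin can be suppressed independently.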

Quad-Net: Melspectrogram Vocoder with Convolutional Layers Restricted by the Quadrature Mirror Filter for Perfect Reconstruction
Existing neural vocoders rely on fixed signal processing filters and therefore lack hyperparameter flexibility
Quad-Net
Uses restricted convolutional layers shaped by a quadrature mirror synthesis filter bank
Optimizes the model with a perfect reconstruction loss derived from the perfect reconstruction filter bank..
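The perfect-reconstruction property Quad-Net builds on can be checked with a classic two-channel QMF bank. The sketch builds the other filters from a lowpass prototype h0 using H1(z) = H0(−z), G0(z) = H0(z), G1(z) = −H1(z), which cancels aliasing. It assumes the 2-tap Haar prototype, for which the remaining distortion term is a pure one-sample delay. This is a plain signal-processing illustration, not Quad-Net's restricted convolutional layers or its loss.

```python
import numpy as np


def qmf_bank(h0: np.ndarray):
    """Build the classic two-channel QMF bank from a lowpass prototype h0."""
    n = np.arange(h0.size)
    h1 = ((-1.0) ** n) * h0          # H1(z) = H0(-z): magnitude response mirrored about pi/2
    g0 = h0.copy()                    # G0(z) = H0(z)
    g1 = -h1                          # G1(z) = -H1(z) cancels the aliasing term
    return (h0, h1), (g0, g1)


def analysis(x, filters):
    """Filter each band and keep every second sample (decimate by 2)."""
    return [np.convolve(x, h)[::2] for h in filters]


def synthesis(subbands, filters):
    """Upsample each band by 2 (zero insertion), filter, and sum the bands."""
    out = None
    for u, g in zip(subbands, filters):
        up = np.zeros(2 * u.size)
        up[::2] = u
        y = np.convolve(up, g)
        out = y if out is None else out + y
    return out


h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar prototype
ana, syn = qmf_bank(h0)
x = np.random.default_rng(0).standard_normal(64)
y = synthesis(analysis(x, ana), syn)
# The output is the input delayed by one sample: y[n] = x[n - 1].
reconstructed = y[1:1 + x.size]
```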

Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis
Existing denoising diffusion probabilistic model-based vocoders struggle to capture prosody diversity
Cauchy Diffusion
Achieves better resilience to imbalanced speech data through the heavy-tailed Cauchy distribution
As a result, improves the prosody modeling of diffusion vocoders
Paper (AAAI 2025): Paper Link
1. Introduction
Recently, Denoising Diffusion Probabil..
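To make "heavy-tailed" concrete, the sketch below compares standard Gaussian and standard Cauchy noise. It plugs each into a single DDPM-style forward step x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε. The schedule value `alpha_bar` and the clean signal are hypothetical, and the direct noise substitution is an assumption made for illustration. This preview does not say how Cauchy Diffusion actually parameterizes its forward and reverse processes. In theory, P(|ε| > 4) is about 6.3e-5 for a standard Gaussian but about 0.156 for a standard Cauchy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

gauss = rng.standard_normal(n)
cauchy = rng.standard_cauchy(n)


def tail_fraction(samples, threshold=4.0):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return float(np.mean(np.abs(samples) > threshold))


# One DDPM-style forward corruption step, with the Gaussian swapped for Cauchy noise.
alpha_bar = 0.5                              # hypothetical cumulative schedule value
x0 = np.sin(np.linspace(0, 8 * np.pi, n))    # stand-in clean signal
x_t_gauss = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * gauss
x_t_cauchy = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * cauchy

print(tail_fraction(gauss), tail_fraction(cauchy))
```

The Cauchy samples produce large excursions far more often. This is the property the paper relies on for resilience to rare, extreme patterns in imbalanced data.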

WaveFM: A High-Fidelity and Efficient Vocoder based on Flow Matching
Flow Matching offers robust training for diffusion-style generative models, but applying it directly to neural vocoders degrades audio quality
WaveFM
Adopts a mel-conditioned prior distribution instead of the standard Gaussian prior to minimize transportation cost
Combines a refined multi-resolution STFT loss to improve audio quality
Additionally, to improve inference speed, consistency distillation me..
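A minimal NumPy sketch of conditional flow matching with a mel-conditioned prior follows. It assumes a PriorGrad-style prior: a zero-mean Gaussian whose per-sample scale follows the mel frame energy. The function `prior_std_from_mel` and the hop size are hypothetical. It also assumes the straight-line path x_t = (1 − t)·x_0 + t·x_1, with velocity target x_1 − x_0, where t = 0 is the prior and t = 1 is the data. WaveFM's exact prior, its refined multi-resolution STFT loss, and the consistency distillation step are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
hop = 256                                       # hypothetical hop size (samples per mel frame)
n_frames, n_mels = 20, 80

mel = rng.standard_normal((n_mels, n_frames))          # stand-in log-mel-spectrogram
waveform = 0.1 * rng.standard_normal(n_frames * hop)   # stand-in target audio x_1


def prior_std_from_mel(mel: np.ndarray, hop: int, floor: float = 0.1) -> np.ndarray:
    """Hypothetical mel-conditioned prior scale: normalized frame energy, repeated per sample."""
    energy = np.sqrt(np.mean(np.exp(mel), axis=0))      # per-frame energy from log-mel
    energy = np.maximum(energy / energy.max(), floor)   # scale into [floor, 1]
    return np.repeat(energy, hop)


def flow_matching_pair(x1, std, t, rng):
    """Sample x_t on the straight path from a prior sample x0 to the data x1."""
    x0 = std * rng.standard_normal(x1.shape)   # x0 ~ N(0, diag(std^2)) instead of N(0, I)
    x_t = (1.0 - t) * x0 + t * x1
    velocity = x1 - x0                          # regression target for v_theta(x_t, t, mel)
    return x0, x_t, velocity


std = prior_std_from_mel(mel, hop)
x0, x_t, velocity = flow_matching_pair(waveform, std, t=0.3, rng=rng)
# A model would be trained with mean((v_theta(x_t, t, mel) - velocity) ** 2).
```

The intuition is that a prior whose scale already tracks the signal envelope sits closer to the data, so the paths the model must learn are shorter.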