
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Diffusion can be used to synthesize audio waveforms conditioned on highly compressed representations.
MBD
Generates any type of audio modality from low-bitrate discrete representations.
To this end, it uses a multi-band diffusion-based framework.
Paper (NeurIPS 2023): Paper Link
1. Introduction
Neural-based vocoders such as MelGAN can synthesize high-quality samples. In particular, HuBERT and similar Self..
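
The multi-band idea is to handle separate frequency bands of the waveform independently and then recombine them. Below is a minimal sketch of only the band split, assuming ideal (brick-wall) DFT masks and hypothetical band edges. It is not the filter bank or the per-band diffusion models from the paper; it only shows that the bands sum back to the original signal.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; keeps only the real part (inputs here are Hermitian-symmetric)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def split_bands(x, edges):
    """Split a real signal into len(edges) - 1 bands with ideal DFT masks.

    edges are hypothetical bin boundaries over 0..n//2, e.g. [0, 2, 4, n//2 + 1].
    """
    n = len(x)
    X = dft(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = [0j] * n
        for k in range(n):
            f = min(k, n - k)  # bins k and n - k carry the same frequency
            if lo <= f < hi:
                masked[k] = X[k]
        bands.append(idft(masked))
    return bands

# In a multi-band generator each band would come from its own model;
# here the bands are simply summed to show the split is lossless.
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
bands = split_bands(signal, [0, 2, 4, len(signal) // 2 + 1])
recombined = [sum(samples) for samples in zip(*bands)]
```

Because the masks partition the frequency bins, an error in one band's generator stays confined to that band's frequency range.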

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
A high-quality general audio compression model that operates at low bitrates is needed.
FlowMAC
Supports scalable, memory-efficient training based on Conditional Flow Matching.
At inference, it integrates a continuous normalizing flow with an ODE solver to generate high-quality mel-spectrograms.
Paper (ICASSP 2025): Paper Link
1. Introduction
Recent neural codecs at bitrates lower than 12 kbps hig..
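
Sampling from a conditional flow matching model means integrating an ODE dx/dt = v(x, t) from Gaussian noise at t = 0 to data at t = 1. The sketch below uses a plain Euler solver and, as a loudly labeled stand-in for the trained network, the exact conditional velocity of the straight (optimal-transport) path toward a known target. The function names and the "mel frame" are hypothetical; in a real codec the network only sees the quantized latent, not the target itself.

```python
import random

def oracle_velocity(x, t, x1):
    """Stand-in for a learned network v(x, t | condition).

    For the straight path x_t = (1 - t) * x0 + t * x1, the exact conditional
    velocity at point x and time t < 1 is (x1 - x) / (1 - t).
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

def euler_sample(x1, steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in x1]  # x0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = oracle_velocity(x, t, x1)
        x = [a + dt * b for a, b in zip(x, v)]
    return x

# Hypothetical target frame standing in for one mel-spectrogram column.
mel_frame = [0.5, -1.2, 2.0, 0.0]
generated = euler_sample(mel_frame, steps=10)
```

With this oracle field the trajectory is a straight line, so even one Euler step lands on the target; a trained network only approximates the field, which is why the number of solver steps matters in practice.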

FlowDec: A Flow-Based Full-Band General Audio Codec with High Perceptual Quality
A general full-band audio codec that also works at lower bitrates is needed.
FlowDec
Uses non-adversarial codec training and a stochastic postfilter based on conditional flow matching.
Reduces the required number of postfilter evaluations without fine-tuning or distillation.
Paper (ICLR 2025): Paper Link
1. Introduction
Audio codecs encode audio waveforms into compact, quantized representatio..

FunCodec: A Fundamental, Reproducible and Integrable Open-Source Toolkit for Neural Speech Codec
An open-source toolkit for neural codecs such as SoundStream and EnCodec is needed.
FunCodec
An open-source codec that can be easily integrated into downstream tasks.
Supports frequency-domain codecs with lower computational and parameter complexity.
Paper (ICASSP 2024): Paper Link
1. Introduction
Speech codecs encode speech into a compact representation..

ComplexDec: A Domain-Robust High-Fidelity Neural Audio Codec with Complex Spectrum Modeling
Existing neural audio codecs struggle to model out-of-domain audio.
ComplexDec
The lack of out-of-domain robustness stems from the information loss caused by codec compression.
To mitigate this information loss at a bitrate of 24 kbps, it uses complex spectral input/output.
Paper (ICASSP 2025): Paper Link
1. Introduction
Digital Signal Processing (DSP)-based..
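
One way to see why complex spectral input/output can reduce information loss: a complex spectrum split into real and imaginary channels keeps the phase, while a magnitude-only representation discards it. The sketch below assumes a single DFT frame instead of an STFT and uses hypothetical helper names; it illustrates the representation only, not the ComplexDec architecture.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; keeps only the real part (inputs here are Hermitian-symmetric)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def to_complex_channels(x):
    """Real and imaginary parts of the spectrum as two real-valued channels."""
    X = dft(x)
    return [z.real for z in X], [z.imag for z in X]

def from_complex_channels(re, im):
    """Rebuild the complex spectrum from the two channels and invert it."""
    return idft([complex(a, b) for a, b in zip(re, im)])

def magnitude_only_roundtrip(x):
    """Keep |X|, set every phase to zero, then invert."""
    return idft([complex(abs(z), 0.0) for z in dft(x)])

frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
re, im = to_complex_channels(frame)
complex_recon = from_complex_channels(re, im)
magnitude_recon = magnitude_only_roundtrip(frame)
```

The complex round trip recovers the frame exactly, while the zero-phase reconstruction comes back time-symmetric and cannot match this asymmetric frame.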

RepCodec: A Speech Representation Codec for Speech Tokenization
Discrete speech tokenization is useful for large language models, but discretization causes information loss.
RepCodec
Learns a vector quantization codebook by reconstructing the speech representations produced by a speech encoder.
Converts speech waveforms into semantic tokens through a pipeline consisting of a speech encoder, a codec encoder, and a vector quantization codebook.
Paper (ACL 2024): P..
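
The final step of such a pipeline turns each continuous frame into a discrete token: the index of the nearest codebook vector. Below is a minimal sketch of that lookup only, assuming squared Euclidean distance and a tiny hand-written codebook and frames (all hypothetical; in RepCodec the codebook is learned and the frames come from the codec encoder).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Map each continuous frame to the index of its nearest codebook vector."""
    return [min(range(len(codebook)), key=lambda i: squared_distance(f, codebook[i]))
            for f in frames]

def dequantize(tokens, codebook):
    """Look tokens back up in the codebook (the lossy reconstruction)."""
    return [codebook[i] for i in tokens]

# Hypothetical 2-D codebook and encoder output frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.6]]
tokens = quantize(frames, codebook)
reconstructed = dequantize(tokens, codebook)
```

Here tokens is [0, 1, 2, 1]. Every frame is replaced by its codebook entry, so the gap between frames and reconstructed is the information loss that discretization introduces.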