
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
Existing speech representation discretization methods for Large Language Models rely on Euclidean distance-based quantization or pre-defined codebooks.
SECodec
- Models speech as a graph, clusters the speech feature nodes within the graph, and then extracts the codebook by minimizing 2D Structural Entropy
- The 2D SE minimization principle ..

WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Language models use a tokenizer that compresses high-dimensional natural signals into lower-dimensional discrete tokens.
WavTokenizer
- Compresses the quantizer layers and the temporal dimension of the discrete codec
- Through a broader VQ space, contextual window extension, and an inverse Fourier transform structure, better reconstruction quality and richer sema..

SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models
Speech language models are built on discrete speech representations such as semantic and acoustic tokens.
SpeechTokenizer
- Introduces SLMTokBench to evaluate whether speech tokens are suitable for speech language models
- Adopts an encoder-decoder architecture based on Residual Vector Quantization to build a unified speech tokenizer
Paper (ICLR 2024): Paper Link
1. Introduction
Speech Lan..
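As a rough illustration of the Residual Vector Quantization idea named above, the following is a minimal sketch and not SpeechTokenizer's implementation. The function names (`nearest`, `rvq_encode`, `rvq_decode`) and the tiny hand-written codebooks are assumptions made for this example. Each layer quantizes whatever the previous layers left unexplained.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Greedy RVQ: quantize vec, then quantize the remaining residual, layer by layer."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next layer only sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every layer."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical 2-layer codebooks: a coarse grid, then a finer grid for the residual.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]

codes = rvq_encode([1.2, 0.3], CODEBOOKS)
reconstruction = rvq_decode(codes, CODEBOOKS)
```

In a trained codec the codebooks are learned and the input is an encoder feature frame rather than a hand-picked 2D vector. The layer-by-layer residual structure is the part this sketch is meant to show.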

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Diffusion can be used to synthesize audio waveforms conditioned on highly compressed representations.
MBD
- Generates any type of audio modality from low-bitrate discrete representations
- Uses a multi-band diffusion-based framework to do so
Paper (NeurIPS 2023): Paper Link
1. Introduction
Neural-based vocoders such as MelGAN can synthesize high-quality samples. In particular, HuBERT and similar Self..

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
A high-quality general audio compression model that operates at low bit rates is needed.
FlowMAC
- Supports scalable, memory-efficient training based on Conditional Flow Matching
- At inference, integrates the continuous normalizing flow with an ODE solver to generate a high-quality mel-spectrogram
Paper (ICASSP 2025): Paper Link
1. Introduction
Recent neural codecs, at bitrates below 12 kbps, hig..
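The inference step mentioned above, integrating a flow with an ODE solver, can be sketched with a fixed-step Euler solver. This is a toy under stated assumptions, not FlowMAC's code: `euler_integrate` and `make_toy_field` are hypothetical names, and the closed-form field below stands in for the trained neural vector field that a real model would evaluate.

```python
def euler_integrate(vector_field, x0, num_steps=10):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_field(target):
    """Stand-in for a learned field: the straight-line conditional field
    u(x, t) = (target - x) / (1 - t), which carries any start point to target at t=1."""
    def field(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return field


# Start from an arbitrary "noise" point and flow toward the hypothetical target.
target = [1.0, -2.0]
sample = euler_integrate(make_toy_field(target), [0.3, 0.7], num_steps=10)
```

With this particular field the Euler update lands on the target at the final step, up to floating-point error. A learned field gives no such guarantee, which is why the number of solver steps becomes a trade-off between quality and compute.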

FlowDec: A Flow-Based Full-Band General Audio Codec with High Perceptual Quality
A general full-band audio codec that also operates at lower bitrates is needed.
FlowDec
- Uses non-adversarial codec training and a stochastic postfilter based on conditional flow matching
- Reduces the required postfilter evaluations without fine-tuning or distillation
Paper (ICLR 2025): Paper Link
1. Introduction
Audio codecs convert audio waveforms into compact, quantized representatio..