
FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens
Neural speech codecs degrade in performance when operating with fewer tokens.
FreeCodec: Uses distinct frame-level encoders to decompose intrinsic speech properties. Quantizes the different kinds of frame-level information with dedicated quantizers to improve encoding efficiency.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Neural speech codecs, while minimizing distortion, use a limited number of bits to com..

SPCodec: Split and Prediction for Neural Speech Codec
Existing neural codecs do not fully exploit the correlation between different frequency bands.
SPCodec: Introduces a group residual vector quantization module based on a latent split-and-prediction scheme. Disentangles low- and high-frequency representations to reduce feature redundancy.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
A speech codec generally consists of an encoder, a quantizer, and a decoder. In particular, ..

TS3-Codec: Transformer-based Simple Streaming Single Codec
Most neural audio codecs are built on convolutions.
TS3-Codec: A simple streaming single codec composed only of Transformers and linear layers. Fully eliminates convolution layers to improve simplicity and expressiveness.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
A Neural Audio Codec (NAC) aims to compress audio signals into discretized codes. In particular, NAC, for AudioLM, VALL-E, and other Spee..

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Existing audio codecs are designed for audio compression, so they struggle to deliver optimal performance in Large Language Models.
X-Codec: Incorporates a pre-trained semantic encoder before Residual Vector Quantization. Applies a semantic reconstruction loss after Residual Vector Quantization.
Paper (AAAI 2025): Paper Link
1. Introduction
As with AudioLM and VALL-E, audio generatio..

DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec
A high-quality speech tokenizer is needed.
DS-Codec: Introduces a dual-stage training framework based on Mirror-NonMirror architecture switching. The mirrored architecture improves the robustness of the learned codebook, and the Mirror-NonMirror structure balances training.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Recently, VALL-E, AudioLM, AudioG..

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Discrete speech tokens are limited by high bitrate and redundant timbre information.
LSCodec: Adopts a multi-stage unsupervised training framework based on speaker perturbation. It first establishes a continuous information bottleneck, then performs vector quantization to produce a discrete speaker-decoupled space, and finally refines acoustic detail with a discrete token vocoder.
Paper (INTERSPEECH 20..