TS3-Codec: Transformer-based Simple Streaming Single Codec
Most neural audio codecs are built on convolutions.
TS3-Codec is a simple streaming single codec composed only of Transformers and linear layers.
By fully eliminating convolution layers, it improves both simplicity and expressiveness.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
A Neural Audio Codec (NAC) aims to compress an audio signal into discretized codes. In particular, NACs are used in AudioLM, VALL-E, and other Spee..
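A streaming Transformer codec cannot look ahead in time, so its self-attention is normally restricted to past frames. The sketch below builds such a causal attention mask, with an optional bounded left context. It is a generic illustration that assumes a plain causal or sliding-window mask. TS3-Codec's actual attention configuration is not given here, and the names `causal_mask` and `left_context` are hypothetical.

```python
def causal_mask(num_frames, left_context=None):
    """Build a num_frames x num_frames boolean mask (True = attention allowed).

    Hypothetical helper: frame i may attend to frame j only if j <= i
    (no look-ahead, as required for streaming). If left_context is given,
    frame i also ignores frames more than left_context steps in the past,
    which keeps memory bounded during streaming inference.
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            allowed = j <= i  # never attend to future frames
            if left_context is not None:
                allowed = allowed and (i - j) <= left_context
            row.append(allowed)
        mask.append(row)
    return mask
```

For example, `causal_mask(4, left_context=1)[3]` is `[False, False, True, True]`: the last frame sees only itself and the frame immediately before it.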
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Existing audio codecs are designed for audio compression, so they struggle to deliver optimal performance when used with Large Language Models.
X-Codec incorporates a pre-trained semantic encoder before Residual Vector Quantization.
After Residual Vector Quantization, it applies a semantic reconstruction loss.
Paper (AAAI 2025): Paper Link
1. Introduction
Like AudioLM and VALL-E, audio generatio..
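Residual Vector Quantization (RVQ) quantizes a vector in stages. Each codebook quantizes the residual error left over by the previous stages, and the decoder sums the chosen codewords. The sketch below shows only this generic RVQ step on toy 2-D codebooks. It assumes nearest-neighbor assignment by squared Euclidean distance and does not include X-Codec's semantic encoder or semantic reconstruction loss. The functions `nearest`, `rvq_encode`, and `rvq_decode` are hypothetical names.

```python
def nearest(codebook, x):
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    best_idx, best_dist = 0, float("inf")
    for idx, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx


def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage quantizes the current residual."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        # Subtract the chosen codeword; the next stage sees what is left.
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruction is the sum of the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


# Toy codebooks (hypothetical): a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],
]
codes = rvq_encode([1.4, 1.1], codebooks)
```

Here the first stage picks `[1.0, 1.0]` and the second stage refines the leftover `[0.4, 0.1]` with `[0.5, 0.0]`. The result is `codes == [1, 1]`, which decodes to `[1.5, 1.0]`.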
DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec
High-quality speech tokenizers are needed.
DS-Codec introduces a dual-stage training framework that uses Mirror-NonMirror architecture switching.
The mirrored architecture improves the robustness of the learned codebook, and the Mirror-NonMirror structure balances training.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Recently, VALL-E, AudioLM, AudioG..
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Discrete speech tokens are limited by high bitrate and redundant timbre information.
LSCodec adopts a multi-stage unsupervised training framework that uses speaker perturbation.
It first establishes a continuous information bottleneck. It then performs vector quantization to produce a discrete, speaker-decoupled space, and finally refines acoustic detail with a discrete token vocoder.
Paper (INTERSPEECH 20..
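A token stream's bitrate is fixed by three quantities: how many frames per second are emitted, how many codebooks are used per frame, and how many bits each code carries (log2 of the codebook size). This is why lowering the frame rate or the number of codebooks lowers the bitrate. The helper name `token_bitrate` and the example configurations are hypothetical and are not LSCodec's reported settings.

```python
import math


def token_bitrate(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second = frames/s * codebooks per frame * bits per code."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)


# Hypothetical configurations for comparison.
multi_codebook = token_bitrate(50, 8, 1024)   # 50 * 8 * 10 bits
single_low_rate = token_bitrate(25, 1, 1024)  # 25 * 1 * 10 bits
```

With these numbers, an 8-codebook stream at 50 Hz costs 4000 bps. A single codebook at 25 Hz costs 250 bps, a 16x reduction that comes entirely from fewer frames and fewer codebooks.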
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
Neural audio codecs face a trade-off between frame rate and audio quality.
DualCodec integrates Self-Supervised Learning representations with waveform representations.
It enhances the semantic information in the codec's first layer and operates at a low frame rate.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Neural audio codecs turn an audio signal into discrete code..
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
An audio codec that supports audio signals from multiple domains is needed.
UniCodec uses a domain-adaptive codebook and a Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain.
To enrich the codec's semantic density without auxiliary modules, it applies a self-supervised mask prediction modeling approach.
Paper (ACL 2025): Paper Link
1. Introduction
Speech Langua..
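A Mixture-of-Experts layer uses a learned gate to choose which expert sub-network processes each input. In a multi-domain codec, the experts could, for example, specialize in different audio domains. The sketch below is a generic top-1 routing step with a linear softmax gate and is not UniCodec's actual design. The names `softmax` and `top1_moe`, the gate weights, and the toy experts are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_moe(x, gate_weights, experts):
    """Route x to the single expert with the highest gate probability.

    Hypothetical helper. gate_weights holds one weight vector per expert
    (a linear gate without bias); experts are callables mapping a vector to
    a vector. Returns (expert_index, output), with the chosen expert's
    output scaled by its gate probability.
    """
    logits = [sum(w * v for w, v in zip(wvec, x)) for wvec in gate_weights]
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    y = experts[k](x)
    return k, [probs[k] * v for v in y]


# Two toy experts (hypothetical), e.g. one per audio domain.
experts = [lambda v: [2 * a for a in v], lambda v: [-a for a in v]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
k, out = top1_moe([0.2, 3.0], gate_weights, experts)
```

Here the second gate logit (3.0) dominates, so `k == 1` and only the second expert runs. Top-1 routing keeps compute per input constant no matter how many experts exist.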
