
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Discrete speech tokens are limited by high bitrates and redundant timbre information.
LSCodec:
- Adopts a multi-stage unsupervised training framework that uses speaker perturbation
- Establishes a continuous information bottleneck, performs vector quantization to produce a discrete speaker-decoupled space, and refines acoustic detail through a discrete token vocoder
Paper (INTERSPEECH 20..
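The vector-quantization step that turns the continuous bottleneck into discrete tokens can be sketched as a nearest-codebook lookup. This is a minimal illustration, not LSCodec's actual implementation; the codebook size and dimensions are made up.

```python
import numpy as np

# Illustrative sizes only (not LSCodec's real configuration)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(300, 64))   # 300 codes, 64-dim embeddings
frames = rng.normal(size=(50, 64))      # 50 continuous frame embeddings

# Replace each frame with its nearest code (Euclidean distance)
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (50, 300)
tokens = dists.argmin(axis=1)           # discrete token id per frame, shape (50,)
quantized = codebook[tokens]            # quantized embeddings, shape (50, 64)

print(tokens.shape, quantized.shape)
```

In a trained codec the encoder, codebook, and decoder are learned jointly; here only the lookup itself is shown.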

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
Neural audio codecs face a trade-off between frame rate and audio quality.
DualCodec:
- Integrates Self-Supervised Learning representations with waveform representations
- Enhances the semantic information of the first-layer codec and operates at a low frame rate
Paper (INTERSPEECH 2025) : Paper Link
1. Introduction
Neural audio codecs compress audio signals into discrete code..

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
An audio codec that supports multi-domain audio signals is needed.
UniCodec:
- Employs a domain-adaptive codebook and a Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain
- Applies a self-supervised mask prediction modeling approach to enrich the codec's semantic density without auxiliary modules
Paper (ACL 2025) : Paper Link
1. Introduction
Speech Langua..
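The domain-adaptive codebook idea can be sketched as routing each input to a per-domain expert codebook before quantization. The domain names, codebook sizes, and `quantize` helper below are hypothetical illustrations, not UniCodec's real design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-domain expert codebooks (128 codes, 32-dim each)
experts = {d: rng.normal(size=(128, 32)) for d in ("speech", "music", "sound")}

def quantize(frames: np.ndarray, domain: str) -> np.ndarray:
    """Assign each frame to the nearest code of the selected domain expert."""
    cb = experts[domain]
    d = ((frames[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

ids = quantize(rng.normal(size=(10, 32)), "music")
print(ids.shape)
```

In the real system the routing decision and the experts are learned; here the domain label is simply passed in to show the mechanism.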

ALMTokenizer: A Low-Bitrate and Semantic-Rich Audio Codec Tokenizer for Audio Language Modeling
Audio tokens play a central role in audio language models.
ALMTokenizer:
- Introduces a Query-based Compression Strategy that explicitly models context information across frames and captures holistic information through a set of learnable query tokens
- To enhance semantic information, applies a Masked AutoEncoder, Semantic prior-based Vector Quantization, and Aut..
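Query-based compression can be sketched as cross-attention from a small set of learnable query tokens over all frame features, so that T frames are summarized into Q holistic tokens. This is a minimal single-head sketch with random weights, not ALMTokenizer's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
T, Q, D = 100, 8, 16                       # frames, query tokens, feature dim (illustrative)
frames = rng.normal(size=(T, D))           # per-frame features
queries = rng.normal(size=(Q, D))          # learnable query tokens (random here)

# Scaled dot-product attention: each query attends over all frames
scores = queries @ frames.T / np.sqrt(D)   # (Q, T) attention logits
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
compressed = weights @ frames              # (Q, D): holistic summary tokens

print(compressed.shape)
```

The bitrate saving comes from Q being much smaller than T; downstream quantization then operates on the Q summary tokens instead of every frame.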

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
Existing speech representation discretization methods for Large Language Models rely on Euclidean distance-based quantization or pre-defined codebooks.
SECodec:
- Models speech as a graph, clusters the speech feature nodes within the graph, and extracts a codebook by minimizing 2D Structural Entropy
- Applies the 2D SE minimization principle ..

WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Language models rely on tokenizers that compress high-dimensional natural signals into lower-dimensional discrete tokens.
WavTokenizer:
- Compresses the quantizer layers and the temporal dimension of the discrete codec
- Through a broader VQ space, contextual window extension, and an inverse Fourier transform structure, achieves better reconstruction quality and richer sema..