SiTok: Scaling Speech Tokenizers with Diffusion AutoEncoders
Speech tokenizers face a trade-off between semantic and acoustic encoding and are limited in how well they can exploit low bitrates.
SiTok: jointly learns semantic-rich representations through supervision and supports high-fidelity audio reconstruction through diffusion. In addition, the model is scaled to 1.6B parameters and trained on a speech dataset of 2M hours.
Paper (ICLR 2026): Paper Link
1. Introduction
Existing speech tokenizers e..

Gogo: Group-Wise Granularity-Ordered Codec for Stable and Efficient Speech Generation
Recent speech language models require both high-level cues for autoregressive modeling and acoustic detail for perceptual quality.
Gogo: introduces group-wise granularity-ordering, which quantizes each frame group from coarse to fine. In addition, it uses the granularity-ordering property to build GogoSpeech, a two-stage speech language model.
Paper (ICLR 2026): Paper Link..

FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Existing neural audio codecs suffer semantic information loss at low frame rates.
FlexiCodec: uses a dynamic frame rate to improve semantic preservation, and introduces ASR feature-assisted dual-stream encoding and a Transformer bottleneck.
Paper (ICLR 2026): Paper Link
1. Introduction
A neural audio codec compresses raw speech into compact discrete tokens. In particular, most neural audio codecs enc..

SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs
Neural speech codecs face a fundamental trade-off at low bitrates.
SACodec: introduces an asymmetric dual quantizer built on a Semantic Anchoring mechanism. It decouples the quantization of semantic and acoustic detail to ensure codebook utilization and the reconstruction of fine-grained information.
Paper (AAAI 2026): Paper Link
1. Introduction
Ne..

Speaking Clearly: A Simplified Whisper-based Codec for Low-Bitrate Speech Coding
Speech codecs have an inherent conflict between acoustic fidelity and semantic preservation.
SimWhisper-Codec: adapts a semantically capable model to high-fidelity acoustic reconstruction. In particular, it uses a frozen, simplified Whisper encoder to balance semantic and acoustic preservation without external supervision.
Paper (ICASSP 2026): Paper Link
1. Intro..

SUNAC: Source-Aware Unified Neural Audio Codec
Neural audio codecs encode mixtures of multiple sources in an entangled manner, so they can be unsuitable for downstream processing that accesses only a subset of the sources.
SUNAC: encodes individual sources from a mixture, conditioned on source-type prompts. This source-aware codec supports user-driven selection and separate encoding.
Paper (ICASSP 2026): Paper Link
1. Introduction
Neural Audio Codec (NAC) audio s..
