
PAST: Phonetic-Acoustic Speech Tokenizer
- Signal reconstruction and phonetic information can be modeled jointly.
- PAST
  - Integrates domain knowledge into the tokenization process through auxiliary tasks, using supervised phonetic data instead of a pre-trained self-supervised model.
  - Additionally provides a streamable architecture for real-time applications.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- Speech language models generally acoustic toke..

Factorized RVQ-GAN for Disentangled Speech Tokenization
- A neural codec with a factorized bottleneck can be constructed.
- HAC
  - Builds its knowledge distillation objectives from a pre-trained speech encoder for phoneme-level structure and a text-based encoder for lexical cues.
  - Produces disentangled token sets for phoneme alignment and word-level semantics through the factorized bottleneck.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- Neural Sp..

LSPNet: An Ultra-Low Bitrate Hybrid Neural Codec
- A neural codec that still works at ultra-low bitrates is needed.
- LSPNet
  - Builds on the LPCNet framework and combines it with a parametric encoder to incorporate Line Spectral Pairs.
  - Additionally applies a Joint Time-Frequency training strategy that uses an STFT loss and a cross-entropy loss.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- In ultra-low bitrate speech coding at 1.2 kbps, intelligible, natural-sounding speec..
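To put the 1.2 kbps figure from the LSPNet entry in perspective, here is a minimal arithmetic sketch of how many bits such a codec can spend per frame. The frame durations are hypothetical values chosen for illustration; the excerpt does not state the frame size LSPNet actually uses.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bit budget available for one frame at a given bitrate."""
    return bitrate_bps * frame_ms / 1000.0


BITRATE_BPS = 1200  # 1.2 kbps, the bitrate named in the entry

# Hypothetical frame durations (assumption, not taken from the paper).
for frame_ms in (10, 20, 40):
    budget = bits_per_frame(BITRATE_BPS, frame_ms)
    print(f"{frame_ms} ms frame: {budget:.0f} bits")
```

With a 20 ms frame, for example, every parameter sent for that frame has to fit into 24 bits in total, which is why a compact parametric representation is attractive at this rate.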

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
- A lightweight neural audio codec is needed.
- SpecTokenizer
  - A lightweight streaming codec that operates in the compressed spectral domain.
  - Alternates CNN and RNN layers to perform multi-scale modeling in the compressed spectrum domain.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- A Neural Audio Codec (NAC) compresses an audio signal into a discrete code sequence.
- But En..

FreeCodec: A Disentangled Neural Speech Codec with Fewer Tokens
- Neural speech codecs degrade in performance when fewer tokens are used.
- FreeCodec
  - Decomposes intrinsic speech properties using distinct frame-level encoders.
  - Improves encoding efficiency by quantizing the different frame-level information with dedicated quantizers.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- A neural speech codec takes a speech signal and, while minimizing distortion, uses a limited number of bits to com..

SPCodec: Split and Prediction for Neural Speech Codec
- Existing neural codecs do not fully exploit the correlation between different frequency bands.
- SPCodec
  - Introduces a group residual vector quantization module that uses a latent split-and-prediction scheme.
  - Disentangles low- and high-frequency representations to reduce feature redundancy.
- Paper (INTERSPEECH 2025): Paper Link
1. Introduction
- A speech codec generally consists of an encoder, a quantizer, and a decoder.
- In particular..
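As a rough illustration of the group residual vector quantization idea named in the SPCodec entry, the sketch below splits a latent vector into groups and runs plain residual VQ on each group. It is not the paper's module: the prediction step between groups is left out, the codebooks are random rather than trained, and all names and sizes (`group_rvq_encode`, `LATENT_DIM`, `N_GROUPS`, and so on) are hypothetical.

```python
import random


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes what the earlier stages left over."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def group_rvq_encode(latent, group_codebooks):
    """Split the latent into equal groups and quantize each one independently."""
    size = len(latent) // len(group_codebooks)
    return [
        rvq_encode(latent[g * size:(g + 1) * size], codebooks)
        for g, codebooks in enumerate(group_codebooks)
    ]


def group_rvq_decode(group_indices, group_codebooks):
    """Decode every group and concatenate the results."""
    out = []
    for indices, codebooks in zip(group_indices, group_codebooks):
        out.extend(rvq_decode(indices, codebooks))
    return out


def random_codebooks(rng, n_stages, n_codes, dim):
    """Untrained Gaussian codebooks, for illustration only."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_codes)]
        for _ in range(n_stages)
    ]


# Hypothetical sizes: 8-dim latent, 2 groups, 3 RVQ stages, 16 codes per stage.
LATENT_DIM, N_GROUPS, N_STAGES, N_CODES = 8, 2, 3, 16
rng = random.Random(0)
group_codebooks = [
    random_codebooks(rng, N_STAGES, N_CODES, LATENT_DIM // N_GROUPS)
    for _ in range(N_GROUPS)
]
latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

codes = group_rvq_encode(latent, group_codebooks)
recon = group_rvq_decode(codes, group_codebooks)
print(codes)
```

Each group yields one index per stage, so the frame above is described by `N_GROUPS * N_STAGES` indices. In a split-and-prediction design, one group would additionally be predicted from the other before quantization, which this sketch does not attempt.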