
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
- Automatic Speech Recognition models are computationally intensive due to their encoder-decoder architecture
- LiteASR
  - Applies low-rank compression to the encoder, reducing inference cost while maintaining transcription accuracy
  - Uses Principal Component Analysis on a small calibration dataset to approximate linear transformations as chains of low-rank matrix multiplications
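
The summary above amounts to replacing an encoder linear layer with a chain of two smaller matrix multiplications derived from PCA over calibration activations. Below is a minimal PyTorch sketch of that idea, assuming PCA is computed over the layer's output activations; `low_rank_factorize`, the chosen rank, and the calibration shapes are illustrative, not LiteASR's actual implementation.

```python
import torch

def low_rank_factorize(linear: torch.nn.Linear, calib_inputs: torch.Tensor, rank: int):
    """Approximate a Linear layer as two smaller Linears using PCA of its outputs
    on a small calibration set (hypothetical helper, illustrative only)."""
    with torch.no_grad():
        outputs = linear(calib_inputs)                 # (N, d_out) calibration activations
        mean = outputs.mean(dim=0)
        centered = outputs - mean
        # Principal directions of the output activations
        _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
        P = Vh[:rank].T                                # (d_out, rank) projection basis
        # Chain: project onto the top-k components, then lift back to d_out
        first = torch.nn.Linear(linear.in_features, rank)
        second = torch.nn.Linear(rank, linear.out_features)
        first.weight.copy_(P.T @ linear.weight)        # (rank, d_in)
        first.bias.copy_(P.T @ (linear.bias - mean))
        second.weight.copy_(P)                         # (d_out, rank)
        second.bias.copy_(mean)
    return torch.nn.Sequential(first, second)

# Usage: replace an encoder projection with its rank-64 approximation
layer = torch.nn.Linear(512, 512)
calib = torch.randn(256, 512)          # small calibration batch
compressed = low_rank_factorize(layer, calib, rank=64)
```

When the rank is much smaller than the layer width, the two factored layers require fewer multiply-accumulates than the original dense layer, which is where the inference savings come from.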

Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts
- Hard parameter sharing degrades model performance due to task interference
- S-MoE
  - Eliminates the gating function by using special guiding tokens that route each task to a designated expert
  - Applies S-MoE to a Speech-to-Text model to handle mixed-bandwidth input
- Paper (INTERSPEECH 2025): Paper Link
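
Since the routing decision is read off a supervised guiding token rather than learned, expert selection reduces to a plain lookup. Below is a minimal sketch under that assumption; `SupervisedMoE`, the task ids, and the bandwidth example are illustrative names, not the paper's code.

```python
import torch
import torch.nn as nn

class SupervisedMoE(nn.Module):
    """Minimal S-MoE-style feed-forward block: a guiding-token-derived task id
    picks the designated expert, so no learned gating network is needed."""

    def __init__(self, d_model: int, d_ff: int, num_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # task_id comes from the guiding token prepended to the input,
        # e.g. narrowband vs. wideband speech for mixed-bandwidth input
        return self.experts[task_id](x)

# Usage: route a batch to the expert designated for task 0
moe = SupervisedMoE(d_model=256, d_ff=1024, num_tasks=2)
hidden = torch.randn(4, 100, 256)      # (batch, frames, d_model)
out = moe(hidden, task_id=0)
```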

M2R-Whisper: Multi-Stage and Multi-Scale Retrieval Augmentation for Enhancing Whisper
- Whisper struggles to accurately recognize diverse sub-dialects
- M2R-Whisper
  - Introduces In-Context Learning and retrieval-augmented techniques into Whisper
  - Applies sentence-level in-context learning in the pre-processing stage and token-level $k$-Nearest Neighbor retrieval in the post-processing stage
- Paper (ICASSP 2025): Paper Link
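
The token-level $k$-Nearest Neighbor post-processing can be pictured as interpolating Whisper's next-token distribution with one built from retrieved (decoder state, token) pairs. Below is a minimal sketch of that interpolation; the datastore layout, `knn_augmented_probs`, and the hyperparameter values are illustrative assumptions.

```python
import torch

def knn_augmented_probs(decoder_state, logits, datastore_keys, datastore_values,
                        vocab_size, k=8, temperature=10.0, lam=0.3):
    """Interpolate the model's next-token distribution with a kNN distribution
    built from the k nearest (hidden state -> token) pairs in a datastore."""
    # Distances from the current decoder state to every stored key
    dists = torch.cdist(decoder_state.unsqueeze(0), datastore_keys).squeeze(0)
    knn_dists, knn_idx = dists.topk(k, largest=False)
    weights = torch.softmax(-knn_dists / temperature, dim=-1)
    # Scatter neighbor weights onto their target tokens
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, datastore_values[knn_idx], weights)
    p_model = torch.softmax(logits, dim=-1)
    return lam * p_knn + (1.0 - lam) * p_model

# Usage with a toy datastore of 1000 (key, token) pairs
keys = torch.randn(1000, 384)
values = torch.randint(0, 5000, (1000,))
state, logits = torch.randn(384), torch.randn(5000)
probs = knn_augmented_probs(state, logits, keys, values, vocab_size=5000)
```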

Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
- Large Transformer-based models are computationally intensive due to the self-attention mechanism
- Whisper-Medusa
  - Extends the Whisper architecture to predict multiple tokens per iteration
  - Reduces latency by 50% with minimal impact on Word Error Rate
- Paper (ICASSP 2025): Paper Link
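
Predicting several tokens per iteration is typically done by attaching extra prediction heads to the decoder's last hidden state, Medusa-style. The sketch below illustrates that mechanism; the head architecture and sizes are assumptions for illustration, not Whisper-Medusa's exact design, and the verification step that accepts or rejects drafted tokens is omitted.

```python
import torch
import torch.nn as nn

class MedusaHeads(nn.Module):
    """Extra prediction heads on the decoder's last hidden state: head k proposes
    the token k steps ahead, so several tokens are drafted in one forward pass."""

    def __init__(self, d_model: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.base_lm_head = nn.Linear(d_model, vocab_size)            # token t+1
        self.extra_heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                          nn.Linear(d_model, vocab_size))
            for _ in range(num_heads)                                  # tokens t+2 .. t+1+num_heads
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) last-position decoder state
        logits = [self.base_lm_head(hidden)] + [h(hidden) for h in self.extra_heads]
        return torch.stack(logits, dim=1)   # (batch, 1 + num_heads, vocab_size)

# Usage: draft 5 tokens (1 base + 4 extra heads) from a single decoder pass
heads = MedusaHeads(d_model=384, vocab_size=51865, num_heads=4)
draft_logits = heads(torch.randn(2, 384))
draft_tokens = draft_logits.argmax(dim=-1)   # accepted or rejected in a later verification step
```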

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
- Code-Switching Automatic Speech Recognition still falls short of seamless language switching
- CS-Whisper
  - Builds on Whisper, introducing an Encoder Refiner to improve intra-sentence switching in the encoder
  - Uses Language-Aware Adapters with different language prompts to obtain language-specific decoding information at each decoder layer
- Paper (ICASSP 2025): Paper Link
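
One way to picture the Language-Aware Adapter is as a per-language bottleneck module inside each decoder layer, chosen according to the active language prompt. The sketch below illustrates that idea; the class name, residual placement, and sizes are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LanguageAwareAdapter(nn.Module):
    """Per-language bottleneck adapters: the language prompt in use selects which
    adapter injects language-specific information into the decoder hidden states."""

    def __init__(self, d_model: int, bottleneck: int, languages=("zh", "en")):
        super().__init__()
        self.adapters = nn.ModuleDict({
            lang: nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                                nn.Linear(bottleneck, d_model))
            for lang in languages
        })

    def forward(self, hidden: torch.Tensor, language: str) -> torch.Tensor:
        # Residual connection keeps the original Whisper decoding path intact
        return hidden + self.adapters[language](hidden)

# Usage inside a decoder layer: Mandarin prompt -> Mandarin-specific adapter
adapter = LanguageAwareAdapter(d_model=384, bottleneck=64)
states = torch.randn(2, 50, 384)
refined = adapter(states, language="zh")
```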

Multilingual DistilWhisper: Efficient Distillation of Multi-Task Speech Models via Language-Specific Experts
- Whisper still performs poorly on under-represented languages
- Multilingual DistilWhisper
  - Applies knowledge distillation from Whisper-Large-V2
  - Performs lightweight modular ASR fine-tuning through language-specific experts
- Paper (ICASSP 2024): Paper Link
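
The distillation side of this recipe can be summarized as a standard student-teacher objective: cross-entropy on the transcript plus a temperature-scaled KL term toward the Whisper-Large-V2 teacher. The sketch below shows such a loss under those assumptions; the exact weighting used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Cross-entropy on the ground-truth transcript plus a KL term pulling the
    student's distribution toward the teacher's (weights are illustrative)."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl

# Usage on a toy batch of 8 token positions over a 5000-token vocabulary
student = torch.randn(8, 5000, requires_grad=True)
teacher = torch.randn(8, 5000)
labels = torch.randint(0, 5000, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```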