Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

- Speech self-supervised learning models such as HuBERT have a substantial number of parameters.
- ARMHuBERT
  - Compresses the model by reusing attention maps across Transformer layers (see the sketch below).
  - Introduces a masking distillation strategy to improve the student model's representation quality.
- Paper (INTERSPEECH 2023): Paper Link
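
A minimal sketch of the attention map reusing idea, not the authors' implementation: a self-attention layer that either computes its own attention map or reuses one handed down from an earlier layer, in which case it carries no Q/K projections at all. The module names, dimensions, and the `compute_attn` flag are illustrative assumptions rather than ARMHuBERT's actual code.

```python
# Sketch: attention map reuse across Transformer layers (assumed structure).
# A layer built with compute_attn=False omits its Q/K projections and instead
# reuses the attention map produced by an earlier layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReusableAttentionLayer(nn.Module):
    def __init__(self, dim: int, n_heads: int, compute_attn: bool = True):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.compute_attn = compute_attn
        if compute_attn:
            # Only layers that compute their own attention map need Q/K projections.
            self.q_proj = nn.Linear(dim, dim)
            self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x, reused_attn=None):
        B, T, _ = x.shape
        v = self.v_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        if self.compute_attn:
            q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            k = self.k_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        else:
            # Reuse the attention map from a previous layer; Q/K work is skipped.
            attn = reused_attn
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), attn


# Usage: the second layer reuses the attention map computed by the first,
# so it holds fewer parameters and does less work per forward pass.
layer0 = ReusableAttentionLayer(dim=64, n_heads=4, compute_attn=True)
layer1 = ReusableAttentionLayer(dim=64, n_heads=4, compute_attn=False)
x = torch.randn(2, 50, 64)
h, attn = layer0(x)
h, _ = layer1(h, reused_attn=attn)
```

Dropping the Q/K projections in the reusing layers is what yields the parameter savings; the value and output projections are kept so each layer still transforms its input.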
