
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

- A speech signal contains multi-faceted information such as speaker identity, paralinguistics, and spoken content
- WavLM
  - Jointly learns masked speech prediction and denoising during pre-training
  - Introduces a gated relative position bias into the Transformer structure to capture the sequence ordering of the input speech (see the sketch after this list)
- Paper (JSTSP 2022): Paper Link
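To make the gated relative position bias concrete, here is a minimal, hypothetical PyTorch sketch, not the paper's exact formulation: a learnable bias indexed by the clipped relative distance (i − j) is scaled by a sigmoid gate computed from each query, so how strongly position information contributes depends on the speech content. The module name, single-gate design, and `max_distance` bucketing are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GatedRelativePositionBias(nn.Module):
    """Content-dependent relative position bias (simplified sketch).

    Follows the spirit of WavLM's gated relative position bias, not its
    exact equations: a learnable bias table indexed by relative distance
    is modulated by a query-dependent sigmoid gate.
    """

    def __init__(self, head_dim: int, num_heads: int, max_distance: int = 128):
        super().__init__()
        self.max_distance = max_distance
        # One learnable bias per (relative distance, head).
        self.bias_table = nn.Parameter(torch.zeros(2 * max_distance + 1, num_heads))
        # One learnable gate vector per head, dotted with the query.
        self.gate_u = nn.Parameter(torch.zeros(num_heads, head_dim))

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # q: (batch, num_heads, seq_len, head_dim)
        seq_len = q.size(2)
        pos = torch.arange(seq_len, device=q.device)
        # Relative distance (i - j), clipped to the table range,
        # then shifted to [0, 2 * max_distance] for indexing.
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_distance, self.max_distance)
        bias = self.bias_table[rel + self.max_distance]   # (seq_len, seq_len, num_heads)
        bias = bias.permute(2, 0, 1)                      # (num_heads, seq_len, seq_len)
        # Content-dependent gate in (0, 1) per query position and head.
        gate = torch.sigmoid(torch.einsum("bhtd,hd->bht", q, self.gate_u))
        # Gated bias, to be added to the attention logits before softmax.
        return gate.unsqueeze(-1) * bias.unsqueeze(0)     # (batch, heads, seq, seq)


# Usage inside scaled dot-product attention (hypothetical shapes):
b, h, t, d = 2, 8, 100, 64
q, k = torch.randn(b, h, t, d), torch.randn(b, h, t, d)
bias = GatedRelativePositionBias(head_dim=d, num_heads=h)(q)
attn = ((q @ k.transpose(-2, -1)) / d ** 0.5 + bias).softmax(dim=-1)
```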
1. Introduction