
Robust Data2Vec: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Automatic Speech Recognition performance can be improved through a self-supervised pre-training method based on contrastive learning and a regression task
Robust Data2Vec
- Jointly optimizes contrastive learning and the regression task in the pre-training stage
- Additionally, patch-based non-semantic negative samples and positiv..

Data2Vec-AQC: Search for the Right Teaching Assistant in the Teacher-Student Training Setup
Self-Supervised Learning can be leveraged to obtain speech representations from unlabeled speech data
Data2Vec-AQC
- Introduces data augmentation, quantized representations, and clustering on top of Data2Vec
- Through the interaction of these modules, solves an additional self-supervised objective, the cross-contrastive loss
Paper (ICASSP 2023): Paper Link
1. Introduct..

Data2Vec 2.0: Efficient Self-Supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Self-supervised learning requires substantial computational resources
Data2Vec 2.0
- Builds on Data2Vec to obtain rich contextualized target representations
- Amortizes the effort required to build teacher representations through a fast convolutional decoder
Paper (ICML 2023): Paper Link
1. Introduction
Self-superv..

Data2Vec: A General Framework for Self-Supervised Learning in Speech, Vision and Language
Self-supervised learning has so far focused on a single modality
Data2Vec
- A self-supervised framework that applies the same learning method to speech, NLP, and vision
- Uses a standard Transformer architecture and, in a self-distillation setup, predicts latent representations of the full input data based on a masked view of the input
- Instead of modality-specific targets, entire i..
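The self-distillation setup above can be sketched in a few lines: a teacher whose weights are an exponential moving average (EMA) of the student sees the full input and produces regression targets, while the student predicts those targets from a masked view. This is a minimal toy sketch assuming a single linear map in place of the Transformer and plain MSE in place of the paper's Smooth L1; all shapes and the masking scheme are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": one linear map standing in for the Transformer.
student_w = rng.normal(size=(16, 16))
teacher_w = student_w.copy()          # teacher starts as a copy of the student

def ema_update(teacher, student, tau=0.999):
    """Teacher weights track the student as an exponential moving average."""
    return tau * teacher + (1 - tau) * student

x = rng.normal(size=(50, 16))         # 50 frames of 16-dim features (toy data)
mask = rng.random(50) < 0.5           # mask roughly half of the time steps

# Teacher sees the FULL input; its outputs are the regression targets.
targets = x @ teacher_w
# Student sees the masked view (masked frames simply zeroed here).
x_masked = np.where(mask[:, None], 0.0, x)
preds = x_masked @ student_w

# Regression loss computed only at masked positions.
loss = np.mean((preds[mask] - targets[mask]) ** 2)
teacher_w = ema_update(teacher_w, student_w)
```

The key design point this illustrates is that the targets are the teacher's contextualized latents rather than a modality-specific token or spectrogram target, which is what lets the same recipe apply across speech, NLP, and vision.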

XLSR: Unsupervised Cross-Lingual Representation Learning for Speech Recognition
Cross-lingual speech representations can be obtained by pre-training a single model on multiple languages
XLSR
- Builds on Wav2Vec 2.0, jointly learning a quantization of the latents shared across languages
- Additionally performs fine-tuning on labeled data
Paper (INTERSPEECH 2021): Paper Link
1. Introduction
Cross-lingual learning leverages other languages so that model perfor..

Wav2Vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Powerful representations can be learned from speech audio alone, and fine-tuning on transcribed speech can improve speech recognition performance
Wav2Vec 2.0
- Masks the speech input in the latent space
- Solves a contrastive task over a quantization of jointly learned latent representations
Paper (NeurIPS 2020): Paper Link
1. Introduction
In speech recognition, labeled data..
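The contrastive task above asks the model to pick the true quantized latent for a masked time step out of a set of distractors sampled from other masked steps. A minimal numpy sketch of that InfoNCE-style objective, with toy random vectors standing in for the context-network prediction and quantized latents (dimensions, temperature, and distractor count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(pred, positive, distractors, temperature=0.1):
    """InfoNCE-style loss: identify the true quantized latent among distractors."""
    candidates = [positive] + list(distractors)
    sims = np.array([cosine(pred, c) for c in candidates]) / temperature
    log_probs = sims - np.log(np.sum(np.exp(sims)))   # log-softmax over candidates
    return -log_probs[0]                              # positive sits at index 0

dim = 8
positive = rng.normal(size=dim)                       # true quantized latent
pred_good = positive + 0.01 * rng.normal(size=dim)    # context prediction near the target
pred_bad = rng.normal(size=dim)                       # unrelated prediction
distractors = rng.normal(size=(10, dim))              # sampled from other masked steps

loss_good = contrastive_loss(pred_good, positive, distractors)
loss_bad = contrastive_loss(pred_bad, positive, distractors)
```

A prediction close to the true quantized latent yields a much lower loss than an unrelated one, which is what pushes the context network to infer the masked speech content.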