
Data2Vec 2.0: Efficient Self-Supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Self-supervised learning requires substantial computational resources
Data2Vec 2.0
- Builds on Data2Vec to obtain rich contextualized target representations
- Amortizes the effort needed to build teacher representations via a fast convolutional decoder (a minimal sketch follows below)
Paper (ICML 2023): Paper Link
1. Introduction
Self-superv..
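Since the preview above only names the two efficiency ideas, here is a minimal PyTorch sketch of how they could fit together: an EMA teacher encodes each sample once, its contextualized targets are reused across several masked versions of the same sample (amortizing the teacher cost), and a small convolutional decoder predicts the targets at masked positions. The module sizes, mask ratio, number of masked views, and the smooth-L1 loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: teacher targets built once, reused across several masked views,
# with a lightweight convolutional decoder (all sizes are illustrative assumptions).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy transformer encoder over pre-extracted feature frames."""
    def __init__(self, dim=256, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):
        return self.blocks(x)

class ConvDecoder(nn.Module):
    """Lightweight 1-D convolutional decoder (cheaper than a transformer decoder)."""
    def __init__(self, dim=256, layers=2):
        super().__init__()
        self.net = nn.Sequential(*[
            nn.Sequential(nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.GELU())
            for _ in range(layers)
        ])

    def forward(self, x):                      # x: (B, T, D)
        return self.net(x.transpose(1, 2)).transpose(1, 2)

student, decoder = Encoder(), ConvDecoder()
teacher = copy.deepcopy(student).requires_grad_(False)  # EMA copy of the student

def ema_update(teacher, student, tau=0.999):
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.data.mul_(tau).add_(s.data, alpha=1 - tau)

def multi_mask_loss(x, num_masks=4, mask_ratio=0.5):
    """Compute the teacher target once, reuse it for `num_masks` masked views."""
    B, T, D = x.shape
    with torch.no_grad():
        target = teacher(x)                    # contextualized targets, built once
    loss = 0.0
    for _ in range(num_masks):
        mask = torch.rand(B, T, device=x.device) < mask_ratio
        x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
        pred = decoder(student(x_masked))
        loss = loss + F.smooth_l1_loss(pred[mask], target[mask])
    return loss / num_masks

x = torch.randn(2, 100, 256)                   # (batch, frames, feature dim)
loss = multi_mask_loss(x)
loss.backward()
ema_update(teacher, student)
```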

Data2Vec: A General Framework for Self-Supervised Learning in Speech, Vision and Language
Self-supervised learning has so far focused on single modalities
Data2Vec
- A self-supervised framework that applies the same learning method to speech, NLP, and vision
- Uses a standard transformer architecture and, in a self-distillation setup, predicts latent representations of the full input data from a masked view of the input (sketched below)
- Instead of modality-specific targets, the entire i..
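As a rough illustration of the self-distillation setup described above, the sketch below lets an EMA teacher encode the full input, builds targets as the normalized average of its top-K block outputs, and has the student regress those targets from a masked view. The layer count, K, masking scheme, and loss are assumptions for illustration.

```python
# Minimal sketch of data2vec-style self-distillation with top-K averaged targets
# (layer sizes, K, and the masking scheme are illustrative assumptions).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Blocks(nn.Module):
    """Transformer blocks that also return every intermediate layer output."""
    def __init__(self, dim=256, layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(layers)
        )

    def forward(self, x):
        outs = []
        for layer in self.layers:
            x = layer(x)
            outs.append(x)
        return x, outs

student = Blocks()
teacher = copy.deepcopy(student).requires_grad_(False)   # EMA copy of the student

def build_targets(x, top_k=4):
    """Average of the teacher's top-K block outputs on the *unmasked* input."""
    with torch.no_grad():
        _, outs = teacher(x)
        # normalize each block output before averaging (stabilizes the targets)
        outs = [F.layer_norm(o, o.shape[-1:]) for o in outs[-top_k:]]
        return sum(outs) / top_k

def data2vec_loss(x, mask_ratio=0.5):
    B, T, D = x.shape
    target = build_targets(x)
    mask = torch.rand(B, T, device=x.device) < mask_ratio
    pred, _ = student(x.masked_fill(mask.unsqueeze(-1), 0.0))  # masked view
    return F.smooth_l1_loss(pred[mask], target[mask])

x = torch.randn(2, 80, 256)        # (batch, timesteps, feature dim)
loss = data2vec_loss(x)
loss.backward()
```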

XLSR: Unsupervised Cross-Lingual Representation Learning for Speech Recognition
Cross-lingual speech representations can be obtained by pre-training a single model on multiple languages
XLSR
- Based on Wav2Vec 2.0, jointly learns a quantization of latents shared across languages (a minimal sketch of the multilingual batching follows below)
- Additionally fine-tunes on labeled data
Paper (INTERSPEECH 2021): Paper Link
1. Introduction
Cross-lingual learning leverages other languages to improve model perfor..
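The preview only mentions pooling multiple languages into one model; the sketch below shows one common way such multilingual pre-training batches can be formed, upsampling lower-resource languages with an exponent alpha. The language names, corpus sizes, and alpha value are hypothetical, and the shared quantizer itself is not shown.

```python
# Minimal sketch of multilingual batch sampling with low-resource upsampling
# (languages, corpus sizes, and alpha are hypothetical).
import random

def language_sampling_probs(hours_per_language, alpha=0.5):
    """p_l proportional to (n_l / N) ** alpha: alpha < 1 upsamples rare languages."""
    total = sum(hours_per_language.values())
    weights = {lang: (n / total) ** alpha for lang, n in hours_per_language.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

def sample_batch_languages(probs, batch_size=8):
    langs, p = zip(*probs.items())
    return random.choices(langs, weights=p, k=batch_size)

hours = {"en": 50_000, "fr": 5_000, "sw": 100}   # hypothetical corpus sizes in hours
probs = language_sampling_probs(hours)
print(probs)
print(sample_batch_languages(probs))
```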

Wav2Vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Powerful representations can be learned from speech audio alone, and speech recognition performance can be improved by fine-tuning on transcribed speech
Wav2Vec 2.0
- Masks the speech input in latent space
- Solves a contrastive task over a quantization of jointly learned latent representations (sketched below)
Paper (NeurIPS 2020): Paper Link
1. Introduction
In speech recognition, labeled data..
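To make the contrastive task above concrete, here is a minimal sketch: for each masked timestep, the context representation must identify the true quantized latent among K distractors drawn from other timesteps, using temperature-scaled cosine similarity. Shapes, K, and the temperature are illustrative; the Gumbel-softmax quantizer and the diversity loss are omitted.

```python
# Minimal sketch of an InfoNCE-style contrastive loss over quantized latents
# (distractor count and temperature are illustrative assumptions).
import torch
import torch.nn.functional as F

def contrastive_loss(context, quantized, num_distractors=10, temperature=0.1):
    """context, quantized: (T, D) representations at the masked timesteps."""
    T, D = context.shape
    # sample distractor indices from other timesteps
    # (a full implementation would avoid sampling the true index as a distractor)
    distractor_idx = torch.randint(0, T, (T, num_distractors))
    candidates = torch.cat(
        [quantized.unsqueeze(1), quantized[distractor_idx]], dim=1
    )                                                   # (T, 1 + K, D)
    sims = F.cosine_similarity(
        context.unsqueeze(1), candidates, dim=-1
    ) / temperature                                     # (T, 1 + K)
    targets = torch.zeros(T, dtype=torch.long)          # true latent is at index 0
    return F.cross_entropy(sims, targets)

c = torch.randn(50, 256, requires_grad=True)            # context outputs (masked steps)
q = torch.randn(50, 256)                                # quantized targets
loss = contrastive_loss(c, q)
loss.backward()
```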

Wav2Vec: Unsupervised Pre-Training for Speech Recognition
Unsupervised pre-training can be introduced to speech recognition by learning representations of raw audio
Wav2Vec
- Trains on unlabeled audio data and uses the resulting representations to improve acoustic model training
- Optimizes a simple multi-layer convolutional neural network with a noise contrastive binary classification objective (sketched below)
Paper (INTERSPEECH 2019): Paper Link
1. Introductio..
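As a sketch of the noise contrastive objective mentioned above, the code below encodes raw audio with a small 1-D convolutional stack, builds context representations with a second stack, and trains step-specific projections to separate true future latents from randomly sampled negatives with a binary classification loss. The layer sizes, number of prediction steps, and negative sampling are simplifying assumptions.

```python
# Minimal sketch of CNN encoder + context network trained with noise contrastive
# binary classification over future latents (all sizes are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvStack(nn.Module):
    def __init__(self, in_ch, dim=128, layers=3):
        super().__init__()
        mods = []
        for i in range(layers):
            mods += [nn.Conv1d(in_ch if i == 0 else dim, dim, kernel_size=3, padding=1),
                     nn.ReLU()]
        self.net = nn.Sequential(*mods)

    def forward(self, x):                       # (B, C, T) -> (B, dim, T)
        return self.net(x)

encoder = ConvStack(in_ch=1)                    # raw waveform -> latents z
context = ConvStack(in_ch=128)                  # latents z -> context c
steps = 3
step_proj = nn.ModuleList(nn.Linear(128, 128) for _ in range(steps))

def wav2vec_loss(wave, num_negatives=10):
    z = encoder(wave).transpose(1, 2)           # (B, T, D)
    c = context(z.transpose(1, 2)).transpose(1, 2)
    B, T, D = z.shape
    loss = 0.0
    for k in range(1, steps + 1):
        pred = step_proj[k - 1](c[:, :T - k])               # predict k steps ahead
        pos = (pred * z[:, k:]).sum(-1)                     # positive logits
        neg_idx = torch.randint(0, T, (B, T - k, num_negatives))
        negs = z[torch.arange(B)[:, None, None], neg_idx]   # (B, T-k, N, D)
        neg = (pred.unsqueeze(2) * negs).sum(-1)            # negative logits
        loss = loss + F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
        loss = loss + F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg))
    return loss / steps

wave = torch.randn(2, 1, 400)                   # (batch, channels, samples)
loss = wav2vec_loss(wave)
loss.backward()
```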

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
Semantic information can be captured by learning fixed-length vector representations of audio segments taken from a speech corpus
Speech2Vec
- Obtains semantically similar embeddings with an RNN encoder-decoder framework (sketched below)
- Uses Skipgrams and Continuous Bag-of-Words for training
Paper (INTERSPEECH 2018): Paper Link
1. Introduction
Natural Language Process..
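The sketch below illustrates the skipgram-style variant described above: an RNN encoder compresses a variable-length segment (e.g., MFCC frames of one spoken word) into a fixed-length embedding, and an RNN decoder reconstructs the frames of a neighboring segment from it. Feature and hidden sizes are illustrative, and the CBOW variant is omitted.

```python
# Minimal sketch of skipgram-style RNN encoder-decoder training on audio segments
# (feature and hidden sizes are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentEncoder(nn.Module):
    def __init__(self, feat_dim=13, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, frames):                      # (B, T, feat_dim)
        _, h = self.rnn(frames)
        return h[-1]                                # fixed-length embedding (B, hidden)

class SegmentDecoder(nn.Module):
    def __init__(self, feat_dim=13, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, embedding, target_frames):
        # condition the decoder on the segment embedding via its initial hidden state
        h0 = embedding.unsqueeze(0)                 # (1, B, hidden)
        # teacher forcing: shift target frames right by one step
        inputs = F.pad(target_frames[:, :-1], (0, 0, 1, 0))
        out, _ = self.rnn(inputs, h0)
        return self.out(out)

encoder, decoder = SegmentEncoder(), SegmentDecoder()

# hypothetical MFCC sequences for a center word and one of its neighbors
center = torch.randn(4, 30, 13)                     # (batch, frames, MFCC dim)
neighbor = torch.randn(4, 25, 13)

embedding = encoder(center)
pred = decoder(embedding, neighbor)                 # skipgram: predict the neighbor
loss = F.mse_loss(pred, neighbor)
loss.backward()
```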