SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Encoder-decoder pre-training can be used for self-supervised speech/text representation learning.

SpeechT5
Uses a shared encoder-decoder network and six modal-specific pre/post-nets.
Pre-trains the model on large-scale unlabeled speech and text data and, to align textual and speech information in a unified semantic space, cross-modal vec..
Paper/Representation
2025. 7. 12. 08:10