반응형

Generative Pre-trained Speech Language Model with Efficient Hierarchical TransformerSpeech language model은 여전히 neural audio codec의 long acoustic sequence를 modeling 하는데 한계가 있음Generative Pre-trained Speech Transformer (GPST)Audio waveform을 2가지의 discrete speech representation으로 quantize 하고 hierarchical transformer architecture에 integrate 함End-to-End unsupervised manner로 train 됨으로써 다양한 speaker ident..
Paper/Language Model
2025. 1. 26. 12:51
반응형