반응형
[Paper 리뷰] HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-Supervised Representations for Speech Synthesis
HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-Supervised Representations for Speech Synthesis Text로부터 raw waveform을 직접 생성하는 end-to-end pipeline은 고품질의 합성이 가능하지만, mispronunciation이나 over-smoothing 문제가 종종 발생함 HierSpeech 고품질의 end-to-end text-to-speech를 위해 self-supervised speech representation을 활용하는 hierarchical variational autoencoder 구조 Self-s..
Paper/TTS
2024. 3. 29. 11:29
반응형