[Paper Review] Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder for High Fidelity Flow-based Speech Synthesis
Text-to-Speech models typically rely on low-resolution intermediate representations such as mel-spectrograms, which leads to a mismatch between the vocoder and the acoustic model. Glow-WaveGAN does not rely on a pre-designed intermediate representation; instead, it learns a latent representation directly from speech using a VAE combined with a GAN. Afterwards, flow-based aco..
Paper/TTS
2024. 6. 20. 11:27