반응형
[Paper 리뷰] IST-TTS: Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
IST-TTS: Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion BridgeText-to-Speech에서 style transfer는 중요해지고 있음IST-TTSVariational autoencoder (VAE)와 diffusion refiner를 결합하여 refined mel-spectrogram을 얻음- 이때 audio 품질과 style transfer 성능을 향상하기 위해 two-stage, one-stage system을 각각 설계함Quantized VAE의 diffusion bridge를 통해 complex discrete style representation을 학습하고 transfer 성능을 향상더 나..
Paper/TTS
2024. 5. 5. 09:21
반응형