[Paper Review] DualVC3: Leveraging Language Model Generated Pseudo Context for End-to-End Low Latency Streaming Voice Conversion
The recent DualVC2 achieves streaming voice conversion with a latency of 180 ms.
- BUT, its recognition-synthesis framework makes end-to-end optimization difficult, and instability increases when short chunks are used.
DualVC3
- Uses speaker-independent semantic tokens to guide content encoder training.
- Language model ... content encoder outpu..
Paper/Conversion
2024. 12. 25. 10:45