반응형
![](http://i1.daumcdn.net/thumb/C148x148/?fname=https://blog.kakaocdn.net/dn/IaIIp/btsLxh2VrLX/9n9a9dmx5iJ9YryimGcwY1/img.png)
DualVC3: Leveraging Language Model Generated Pseudo Context for End-to-End Low Latency Streaming Voice Conversion최근의 DualVC2는 180ms의 latency로 streaming voice conversion이 가능함- BUT, recognition-synthesis framework로 인해 end-to-end optimization이 어렵고 short chunk를 사용하는 경우 instability가 증가함DualVC3Speaker-independent semantic token을 사용하여 content encoder training을 guideLanguage model을 content encoder outpu..
Paper/Conversion
2024. 12. 25. 10:45
반응형