MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion with Increased Controllability via Multiple Guidances
Existing voice conversion models rely on a fixed conditioning scheme.
MaskVCT
Uses continuous/quantized linguistic features to improve intelligibility and speaker similarity, and adopts the pitch contour for prosody control.
In particular, it supports multi-factor control through multiple Classifier-Free Guidance.
Paper (ICASSP 2026) :..
Paper/Conversion
2026. 3. 13. 13:54
