
Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion
- Zero-shot Voice Conversion struggles to accurately replicate the source speaker's speaking style
- Discl-VC
  - Disentangles content and prosody information from self-supervised speech representations
  - Synthesizes the target speaker's voice through a Flow Matching Transformer and in-context learning (see the sketch below)
- Paper (INTERSPEECH 2025): Paper Link
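A minimal sketch of the conditional flow-matching objective such a model could be trained with, assuming a generic `model(x_t, t, content, prosody, prompt)` interface; the function name, arguments, and tensor shapes are hypothetical, not the authors' implementation. The target-speaker prompt plays the in-context-learning role.

```python
# Sketch of a conditional flow-matching loss; the speaker prompt supplies the
# in-context-learning conditioning. All names and shapes are hypothetical.
import torch

def flow_matching_loss(model, mel_target, content_tokens, prosody_tokens, speaker_prompt):
    x1 = mel_target                              # data sample (target mel, shape B x C x T)
    x0 = torch.randn_like(x1)                    # Gaussian sample
    t = torch.rand(x1.shape[0], 1, 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path
    v_target = x1 - x0                           # constant velocity along that path
    v_pred = model(xt, t.squeeze(), content_tokens, prosody_tokens, speaker_prompt)
    return torch.mean((v_pred - v_target) ** 2)
```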

DiffEmotionVC: A Dual-Granularity Disentangled Diffusion Framework for Any-to-Any Emotional Voice Conversion
- Emotional Voice Conversion is difficult due to the entanglement between content and speaker characteristics
- DiffEmotionVC
  - Introduces a dual-granularity emotion encoder that captures both utterance-level emotional context and frame-level acoustic detail
  - An orthogonality-constr… that disentangles emotion features through gated cross-attention (see the sketch below)
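A minimal sketch of a gated cross-attention block that injects emotion features into content features, assuming hypothetical dimensions and module names rather than the paper's exact architecture.

```python
# Sketch of gated cross-attention: content frames attend over emotion features,
# and a sigmoid gate controls how much of the attended emotion is injected.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(dim, dim)          # per-channel gate from the query

    def forward(self, content, emotion):
        # content: (B, T, D) query; emotion: (B, S, D) key/value
        attended, _ = self.attn(query=content, key=emotion, value=emotion)
        g = torch.sigmoid(self.gate(content))    # gate values in (0, 1)
        return content + g * attended            # gated residual injection
```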

Training-Free Voice Conversion with Factorized Optimal Transport
- $k$NN-VC can be modified into a training-free pipeline
- MKL-VC
  - Replaces $k$NN regression with a factorized optimal transport map in the WavLM embedding subspace, derived from the Monge-Kantorovich Linear solution (see the sketch below)
  - Handles non-uniform variance across dimensions to ensure an effective feature transformation
- Paper (INTERSPEECH 2025): Paper Link
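A minimal sketch of a per-dimension (diagonal) Monge-Kantorovich linear map between source and target WavLM features; this is an illustrative simplification of a factorized optimal transport map, not the paper's exact subspace factorization.

```python
# Sketch of a factorized (diagonal) Monge-Kantorovich linear map: each embedding
# dimension is shifted and rescaled from source to target statistics, which makes
# the handling of non-uniform variance across dimensions explicit.
import numpy as np

def factorized_mkl_map(src_feats, tgt_feats, eps=1e-8):
    """src_feats: (N, D) source frames; tgt_feats: (M, D) target frames."""
    mu_s, mu_t = src_feats.mean(0), tgt_feats.mean(0)
    std_s, std_t = src_feats.std(0) + eps, tgt_feats.std(0) + eps
    # Closed-form optimal transport between per-dimension Gaussians.
    return (src_feats - mu_s) * (std_t / std_s) + mu_t

# Usage: converted = factorized_mkl_map(wavlm_source, wavlm_target)
# The converted features are then vocoded, as in kNN-VC.
```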

FasterVoiceGrad: Faster One-Step Diffusion-based Voice Conversion with Adversarial Diffusion Conversion Distillation
- Diffusion-based Voice Conversion models are considerably slow due to iterative sampling
- FasterVoiceGrad
  - Distills the diffusion model and the content encoder through Adversarial Diffusion Conversion Distillation
  - In particular, leverages adversarial distillation and score distillation training for effective distillation (see the sketch below)
- Paper (INTERSPEECH 2025): Paper Link
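A minimal sketch of a distillation step that combines an adversarial term with a score-distillation-style surrogate against a frozen diffusion teacher; `student`, `teacher`, `discriminator`, and `teacher.alpha_bar` are assumed interfaces, and the loss weights and schedules are simplified, not the paper's exact formulation.

```python
# Sketch of adversarial + score distillation training for a one-step student.
import torch

def distillation_step(student, teacher, discriminator, x_src, cond, lam=1.0):
    # One-step student output (its exact inputs are an assumption here).
    x_fake = student(x_src, cond)

    # Adversarial term: the discriminator should accept the one-step output as real.
    adv_loss = -discriminator(x_fake, cond).mean()

    # Score-distillation term: noise the student output, query the frozen teacher,
    # and push x_fake along the teacher's estimated denoising direction.
    t = torch.randint(1, 1000, (x_fake.shape[0],), device=x_fake.device)
    noise = torch.randn_like(x_fake)
    alpha_bar = teacher.alpha_bar(t).view(-1, 1, 1)      # assumed teacher attribute
    x_noisy = alpha_bar.sqrt() * x_fake + (1 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        noise_pred = teacher(x_noisy, t, cond)
    # Surrogate whose gradient w.r.t. the student follows (noise_pred - noise).
    score_loss = ((noise_pred - noise) * x_fake).mean()

    return adv_loss + lam * score_loss
```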

FastVoiceGrad: One-Step Diffusion-based Voice Conversion with Adversarial Conditional Diffusion Distillation
- Diffusion-based Voice Conversion suffers from slow inference due to multi-step reverse diffusion
- FastVoiceGrad
  - Reduces the multi-step iteration to a single step while maintaining the performance of existing voice conversion models
  - To this end, introduces Adversarial Conditional Diffusion Distillation and reconsiders the initial state used at sampling time (see the sketch below)
- Paper (INTERSPEECH 2024): Paper Link
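A minimal sketch of one-step conversion where the initial state is a noised source mel-spectrogram rather than a pure Gaussian sample; `generator`, `spk_emb`, and the chosen noise level are assumptions, not the paper's exact formulation.

```python
# Sketch of one-step sampling with a "reconsidered" initial state.
import torch

@torch.no_grad()
def one_step_convert(generator, mel_src, spk_emb, alpha_bar=0.1):
    # Initial state: a DDPM-style perturbation of the source mel at a strong noise
    # level, instead of starting the reverse process from pure noise.
    x_init = alpha_bar**0.5 * mel_src + (1 - alpha_bar)**0.5 * torch.randn_like(mel_src)
    # A single generator call replaces the multi-step reverse diffusion loop.
    return generator(x_init, spk_emb)
```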

ReFlow-VC: Zero-Shot Voice Conversion based on Rectified Flow and Speaker Feature Optimization
- Diffusion-based Voice Conversion models require a substantial number of sampling steps
- ReFlow-VC
  - Uses Rectified Flow to transform the Gaussian distribution into the true mel-spectrogram distribution along a direct path (see the sketch below)
  - Additionally optimizes the speaker feature using content and pitch information
- Paper (INTERSPEECH 2025): Paper Link
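A minimal sketch of rectified-flow sampling that integrates a learned velocity field from a Gaussian sample toward a mel-spectrogram with Euler steps; `velocity_model` and the conditioning interface are hypothetical.

```python
# Sketch of rectified-flow (Euler) sampling along a near-straight path from noise
# to the mel-spectrogram distribution.
import torch

@torch.no_grad()
def reflow_sample(velocity_model, cond, shape, num_steps=10):
    x = torch.randn(shape)                 # start from the Gaussian distribution
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt)
        v = velocity_model(x, t, cond)     # predicted velocity dx/dt
        x = x + v * dt                     # Euler step along the direct path
    return x                               # approximate mel-spectrogram sample
```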