
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
Time-reversed speech preserves the tonal patterns used for speaker identification.
REWIND
Introduces an augmentation strategy that uses speaker representations learned from time-reversed speech.
Applied to a diffusion-based voice conversion model, it preserves the speaker's unique vocal traits while minimizing interference from linguistic content.
Paper (INTERSP..
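The preview only says that REWIND learns speaker representations from time-reversed speech. As a minimal sketch of the reversal step alone, the NumPy snippet below flips each waveform along time and reuses the original speaker label. The helper names are hypothetical, and how REWIND feeds these utterances into its speaker encoder or diffusion model is not shown. One reason reversal is a plausible speaker-preserving augmentation: for a real signal, time reversal leaves the magnitude of the whole-signal DFT unchanged, while the temporal order of the linguistic content is reversed.

```python
import numpy as np

def time_reverse(waveform: np.ndarray) -> np.ndarray:
    """Reverse a 1-D waveform along the time axis (hypothetical helper)."""
    return waveform[::-1].copy()

def augment_with_reversal(waveforms, speaker_ids):
    """Append a time-reversed copy of each utterance, keeping its speaker label.

    Assumption (not taken from the paper): a reversed utterance keeps the
    speaker identity, so it can share the original utterance's label.
    """
    aug_waves = list(waveforms) + [time_reverse(w) for w in waveforms]
    aug_ids = list(speaker_ids) + list(speaker_ids)
    return aug_waves, aug_ids

if __name__ == "__main__":
    x = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
    waves, ids = augment_with_reversal([x], ["spk1"])
    print(len(waves), ids)  # 2 ['spk1', 'spk1']
```

Doubling the batch this way is only one option; the reversed copies could equally be used in a separate training stage.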

ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism
Emotional voice conversion suffers from limited emotion accuracy and from speech distortion.
ZSDEVC
Uses a diffusion framework with a disentangled mechanism and expressive guidance.
Trains the model on a large emotional speech dataset.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
In Emotional Voice Conversion (EVC), linguistic content, speaker id..

Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion
Zero-shot voice conversion is limited in how accurately it can replicate the source speaker's speaking style.
Discl-VC
Disentangles content and prosody information from self-supervised speech representations.
Synthesizes the target speaker's voice using a Flow Matching Transformer with in-context learning.
Paper (INTERSPEECH 2025): Paper Link
1..

DiffEmotionVC: A Dual-Granularity Disentangled Diffusion Framework for Any-to-Any Emotional Voice Conversion
Emotional voice conversion is difficult because of the entanglement between content and speaker characteristics.
DiffEmotionVC
Introduces a dual-granularity emotion encoder that captures both utterance-level emotional context and frame-level acoustic detail.
Disentangles emotion features through gated cross-attention, with an orthogonality-constr..
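The preview is cut off before it says what the orthogonality constraint is applied to, so the exact form used in DiffEmotionVC is unknown here. As a hedged illustration of what such a constraint commonly looks like, the sketch below computes one typical soft orthogonality penalty: the mean squared cosine similarity between paired emotion embeddings and some other embedding (for example speaker or content). The function name and the pairing are hypothetical, and the gated cross-attention itself is not shown.

```python
import numpy as np

def orthogonality_penalty(emotion: np.ndarray, other: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared cosine similarity between paired embeddings (hypothetical helper).

    emotion, other: (batch, dim) arrays, e.g. emotion vs. speaker embeddings
    (assumed pairing). The value is 0 when each emotion vector is orthogonal
    to its partner and close to 1 when they are parallel or anti-parallel.
    """
    e = emotion / (np.linalg.norm(emotion, axis=1, keepdims=True) + eps)
    o = other / (np.linalg.norm(other, axis=1, keepdims=True) + eps)
    cos = np.sum(e * o, axis=1)  # per-sample cosine similarity
    return float(np.mean(cos ** 2))
```

In training, a term like this would typically be added to the main loss with a weighting coefficient, pushing the emotion branch to encode information the other branch does not.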

Training-Free Voice Conversion with Factorized Optimal Transport
$k$NN-VC can be adapted into a training-free pipeline.
MKL-VC
Replaces $k$NN regression with a factorized optimal transport map, derived from the Monge-Kantorovich Linear solution, within a WavLM embedding subspace.
Handles non-uniform variance across dimensions to ensure effective feature transformation.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Any-to-Any Voice Conversion ..
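For intuition about the Monge-Kantorovich Linear solution mentioned above, the sketch below fits the standard closed-form MKL map between Gaussian fits of a source and a target feature set: $T(x) = \mu_t + A(x - \mu_s)$ with $A = \Sigma_s^{-1/2}(\Sigma_s^{1/2}\Sigma_t\Sigma_s^{1/2})^{1/2}\Sigma_s^{-1/2}$. It does not reproduce MKL-VC's factorization or its choice of WavLM subspace; inputs are assumed to be (frames, dim) arrays such as WavLM features, and the function names and the `eps` ridge term are assumptions.

```python
import numpy as np

def sqrtm_psd(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T

def fit_mkl_map(src: np.ndarray, tgt: np.ndarray, eps: float = 1e-6):
    """Fit the Monge-Kantorovich linear map between Gaussian fits of two feature sets.

    src: (N_s, D) source frames, tgt: (N_t, D) target frames.
    eps: ridge added to both covariances (assumption) so they stay invertible.
    """
    d = src.shape[1]
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    cov_s = np.cov(src, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(tgt, rowvar=False) + eps * np.eye(d)
    s_half = sqrtm_psd(cov_s)
    s_inv_half = np.linalg.inv(s_half)
    # A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}; A is symmetric.
    a = s_inv_half @ sqrtm_psd(s_half @ cov_t @ s_half) @ s_inv_half

    def transport(x: np.ndarray) -> np.ndarray:
        # Rows are frames, so x @ a applies A to each frame (A is symmetric).
        return (x - mu_s) @ a + mu_t

    return transport
```

The mapped source frames take on the target mean and, up to the small ridge term, the target covariance, which is how a full-covariance linear map accounts for unequal variance across dimensions. In a $k$NN-VC-style pipeline the mapped frames would then go to a vocoder, which is not shown here.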

FasterVoiceGrad: Faster One-Step Diffusion-based Voice Conversion with Adversarial Diffusion Conversion Distillation
Diffusion-based voice conversion models are considerably slow because of iterative sampling.
FasterVoiceGrad
Distills both the diffusion model and the content encoder through Adversarial Diffusion Conversion Distillation.
For effective distillation in particular, it combines adversarial distillation with score distillation training.
Paper (INTERSPEECH 2025): ..