
ReFlow-VC: Zero-Shot Voice Conversion based on Rectified Flow and Speaker Feature Optimization
- Diffusion-based voice conversion models require a substantial number of sampling steps
- ReFlow-VC
  - Uses Rectified Flow to transform a Gaussian distribution into the true mel-spectrogram distribution along a direct path (see the sketch below)
  - Additionally optimizes the speaker feature using content and pitch information
- Paper (INTERSPEECH 2025): Paper Link
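As a rough illustration of the rectified-flow idea, the sketch below integrates a straight-path ODE from Gaussian noise toward a mel-spectrogram with plain Euler steps; `velocity_net` and `cond` are hypothetical stand-ins for ReFlow-VC's velocity predictor and its content/pitch/speaker conditioning, not the paper's actual interface.

```python
import torch

def sample_rectified_flow(velocity_net, cond, shape, n_steps=10):
    """Euler integration of a rectified flow from Gaussian noise to a mel-spectrogram.

    velocity_net(x, t, cond) is a hypothetical network predicting the velocity
    field along the (near-)straight path; cond stands for the content/pitch/
    speaker conditioning used in ReFlow-VC.
    """
    x = torch.randn(shape)                      # start from the Gaussian prior
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((shape[0],), i * dt)     # current time in [0, 1)
        x = x + velocity_net(x, t, cond) * dt   # straight-line Euler update
    return x                                    # approximate mel-spectrogram sample
```

Because the learned path is close to a straight line, a small `n_steps` already yields a usable sample, which is what makes this attractive compared to a many-step diffusion sampler.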

LinearVC: Linear Transformations of Self-Supervised Features through the Lens of Voice Conversion
- Voice conversion can be built on top of self-supervised representations
- LinearVC
  - Converts voices through a simple linear transformation of self-supervised features (a sketch follows below)
  - Constrains the set of allowed transformations and explicitly factorizes content and speaker information via singular value decomposition
- Paper (INTERSPEECH 2025)
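The core operation is small enough to sketch directly: fit a linear map between aligned self-supervised features with least squares, then constrain it through SVD. This is a simplified rendering of the idea rather than LinearVC's exact recipe; the rank `k` and the random stand-in features are assumptions for illustration.

```python
import numpy as np

def fit_linear_map(src_feats, tgt_feats):
    """Least-squares linear map W such that tgt_feats ~= src_feats @ W.

    src_feats, tgt_feats: aligned self-supervised features, shape (T, D).
    """
    W, *_ = np.linalg.lstsq(src_feats, tgt_feats, rcond=None)
    return W                                        # shape (D, D)

def rank_constrained_map(W, k):
    """Constrain the transformation via SVD, keeping the top-k singular values.

    Restricting the rank is the kind of constraint that lets the linear map be
    factorized into content- and speaker-related parts; k is a hypothetical
    hyperparameter here.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# toy usage with random stand-ins for aligned SSL features
src = np.random.randn(200, 768)
tgt = np.random.randn(200, 768)
W = fit_linear_map(src, tgt)
converted = src @ rank_constrained_map(W, k=100)
```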

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech
- Flexible and interpretable control remains a limitation in emotional voice conversion
- ClapFM-EVC
  - Introduces an emotional contrastive language-audio pre-training model guided by natural language prompts and categorical labels (a CLAP-style loss sketch follows below)
  - Seamlessly fuses the Phonetic PosteriorGram of a pre-trained Automatic Speech Recognition model ...
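For the contrastive pre-training part, a generic CLAP-style symmetric InfoNCE loss looks like the following; this is the common formulation, not necessarily ClapFM-EVC's exact objective, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss aligning emotion prompts with speech clips.

    audio_emb, text_emb: (B, D) embeddings from the audio / text encoders for
    B matched (speech, natural-language emotion prompt) pairs.
    """
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature               # (B, B) similarity matrix
    labels = torch.arange(a.size(0))             # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```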

LM-VC: Zero-Shot Voice Conversion via Speech Generation based on Language Models
- Language models can be leveraged for zero-shot voice conversion
- LM-VC
  - Uses coarse tokens that recover the source linguistic content and target speaker timbre, and fine tokens that reconstruct the acoustic details of the converted speech
  - Applies a masked prefix language model for content preservation and disentanglement (a prefix-mask sketch follows below)
  - Additionally alleviates sampling errors with a local a..
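A minimal sketch of the prefix-LM masking idea, under the assumption that the prefix (content tokens plus target-speaker tokens) is attended bidirectionally while generated coarse tokens stay causal, and that a fraction of prefix tokens is randomly masked during training; `mask_id` and `p` are hypothetical hyperparameters, not values from the paper.

```python
import torch

def prefix_lm_attention_mask(prefix_len, total_len):
    """Attention mask for a prefix LM: the prefix is fully visible, while the
    generated coarse acoustic tokens attend causally. Returns True where
    attention is allowed."""
    mask = torch.ones(total_len, total_len).tril().bool()  # causal base
    mask[:, :prefix_len] = True                  # every position can see the prefix
    return mask

def mask_prefix_tokens(tokens, prefix_len, mask_id, p=0.15):
    """Randomly replace a fraction of prefix tokens with a mask id during
    training, a "masked prefix" trick aimed at disentanglement."""
    tokens = tokens.clone()
    drop = torch.rand(prefix_len) < p            # which prefix positions to mask
    tokens[:prefix_len][drop] = mask_id
    return tokens
```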

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
- Existing voice conversion models neglect the explicit utilization of linguistic content
- StarVC
  - Integrates explicit text modeling into voice conversion
  - Uses an autoregressive framework that first predicts text tokens and then synthesizes acoustic features (see the decoding sketch below)
- Paper (INTERSPEECH 2025): Paper Link
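A hedged sketch of the text-first, acoustics-second autoregressive decoding order; `model`, `next_text_logits`, and `next_acoustic_logits` are hypothetical names used for illustration, not StarVC's actual interface.

```python
import torch

@torch.no_grad()
def two_stage_ar_decode(model, source_feats, speaker_emb,
                        text_eos_id, max_text=200, max_acoustic=1000):
    """Greedy two-stage decoding: predict text tokens first, then acoustic
    tokens conditioned on the completed text."""
    text_tokens = []
    while len(text_tokens) < max_text:
        logits = model.next_text_logits(source_feats, speaker_emb, text_tokens)
        tok = int(logits.argmax(-1))             # greedy choice of next text token
        if tok == text_eos_id:
            break
        text_tokens.append(tok)

    acoustic_tokens = []
    for _ in range(max_acoustic):                # acoustic stage sees the full text
        logits = model.next_acoustic_logits(source_feats, speaker_emb,
                                            text_tokens, acoustic_tokens)
        acoustic_tokens.append(int(logits.argmax(-1)))
    return text_tokens, acoustic_tokens
```

Decoding the text before the acoustics is what makes the linguistic content an explicit intermediate target rather than something recovered implicitly from the acoustic features.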

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
- Emotional voice conversion aims to convert the source emotion to a given target while preserving the linguistic content
- EmoReg
  - Leverages self-supervised learning-based feature representations to control emotion intensity
  - Additionally applies Unsupervised Directional Latent Vector Modeling in the emotional embedding space ... (a minimal sketch follows below)
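A minimal sketch of the directional-latent idea, assuming the direction is estimated unsupervised as a difference of class means and that intensity is controlled by the scale along that direction; the 64-dim embeddings and `alpha` here are illustrative only, not the paper's exact modeling.

```python
import numpy as np

def emotion_direction(neutral_embs, emotion_embs):
    """Unsupervised direction in the emotion embedding space, estimated here as
    the difference of class means (a simplification of directional latent
    vector modeling)."""
    d = emotion_embs.mean(axis=0) - neutral_embs.mean(axis=0)
    return d / np.linalg.norm(d)

def scale_intensity(emb, direction, alpha):
    """Move an utterance embedding along the emotion direction; alpha acts as
    the emotion-intensity control knob."""
    return emb + alpha * direction

# toy usage with random stand-ins for SSL-based emotion embeddings
neutral = np.random.randn(100, 64)
happy = np.random.randn(100, 64) + 1.0
direction = emotion_direction(neutral, happy)
stronger = scale_intensity(neutral[0], direction, alpha=2.0)
```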