MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows
In voice conversion, flow-matching models are limited by their iterative inference.
MeanVoiceFlow
Built on mean flows, it supports one-step non-parallel conversion without pre-training or distillation.
It additionally introduces a structural margin reconstruction loss and a zero-input constraint to regularize the model's input-output behavior.
Paper (ICASSP 2026): Paper Link
1. Introduction
Voice Co..
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion with Increased Controllability via Multiple Guidances
Existing voice conversion models rely on a fixed conditioning scheme.
MaskVCT
It uses continuous and quantized linguistic features to improve intelligibility and speaker similarity, and adopts the pitch contour for prosody control.
In particular, it supports multi-factor control through multiple Classifier-Free Guidances.
Paper (ICASSP 2026) :..
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
Existing zero-shot voice conversion models require a large parameter size.
MeanVC
It supports streaming processing with a diffusion Transformer based on chunk-wise autoregressive denoising.
Mean flows improve zero-shot voice conversion performance with only a single sampling step.
Paper (ICASSP 2026): Paper Link
1. Introduction
ACE-VC, SEF-VC, AdaptVC, and similar zero-shot Voice Co..
MF-Speech: Achieving Fine-Grained and Compositional Control in Speech Generation via Factor Disentanglement
Generating expressive, controllable speech requires resolving the entanglement of speech factors and the coarse granularity of control mechanisms.
MF-Speech
MF-SpeechEncoder, which serves as a factor purifier, performs multi-objective optimization to decompose the original speech signal into independent representations.
Serving as the conductor, MF-Spee..
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
Time-reversed speech retains tonal patterns that are useful for speaker identification.
REWIND
It introduces an augmentation strategy that uses speaker representations learned from time-reversed speech.
Applied to a diffusion-based voice conversion model, it preserves the speaker's unique vocal traits while minimizing interference from linguistic content.
Paper (INTERSP..
ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism
Emotional voice conversion suffers from problems with emotion accuracy and speech distortion.
ZSDEVC
It uses a diffusion framework with a disentangled mechanism and expressive guidance.
The model is trained on a large emotional speech dataset.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
For Emotional Voice Conversion (EVC), linguistic content, speaker id..
