Posts in the 'Paper/Conversion' category

[Paper Review] REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Time-reversed speech retains tonal patterns that are useful for speaker identification. REWIND introduces an augmentation strategy that uses speaker representations learned from time-reversed speech. Applied to a diffusion-based voice conversion model, it preserves the speaker's unique vocal traits while minimizing interference from linguistic content.
Paper (INTERSP..

Paper/Conversion 2025. 9. 23. 17:01

[Paper Review] ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism

Emotional Voice Conversion suffers from limited emotion accuracy and from speech distortion. ZSDEVC uses a diffusion framework with a disentangled mechanism and expressive guidance, and trains the model on a large emotional speech dataset.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
For Emotional Voice Conversion (EVC), linguistic content, speaker id..

Paper/Conversion 2025. 9. 17. 17:00

[Paper Review] Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion

Zero-shot Voice Conversion is limited in how accurately it can replicate the source speaker's speaking style. Discl-VC disentangles content and prosody information from self-supervised speech representations, then synthesizes the target speaker's voice with a Flow Matching Transformer and in-context learning.
Paper (INTERSPEECH 2025): Paper Link
1..

Paper/Conversion 2025. 9. 13. 07:50

[Paper Review] DiffEmotionVC: A Dual-Granularity Disentangled Diffusion Framework for Any-to-Any Emotional Voice Conversion

Emotional Voice Conversion is difficult because content and speaker characteristics are entangled. DiffEmotionVC introduces a dual-granularity emotion encoder that captures both utterance-level emotional context and frame-level acoustic detail. It disentangles emotion features through gated cross-attention with an orthogonality-constr..

Paper/Conversion 2025. 9. 8. 17:03

[Paper Review] Training-Free Voice Conversion with Factorized Optimal Transport

$k$NN-VC can be modified into a training-free pipeline. MKL-VC replaces $k$NN regression with a factorized optimal transport map within the WavLM embedding subspace, derived from the Monge-Kantorovich Linear solution. It handles non-uniform variance across dimensions to ensure effective feature transformation.
Paper (INTERSPEECH 2025): Paper Link
1. Introduction
Any-to-Any Voice Conversion ..

Paper/Conversion 2025. 9. 2. 17:02
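
The Monge-Kantorovich linear solution mentioned in the summary above has a well-known closed form when source and target features are modeled as Gaussians. The following is a minimal NumPy sketch of that closed-form map, not the MKL-VC implementation: it fits a single full-covariance map rather than the factorized, per-subspace map the paper describes, and the function names (`mkl_map`, `_sqrtm_psd`) and the `eps` regularizer are assumptions made for illustration.

```python
import numpy as np


def _sqrtm_psd(m):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def mkl_map(source, target, eps=1e-6):
    """Fit a linear transport map from source to target features.

    source, target: arrays of shape (num_frames, dim), e.g. frame-level
    embeddings. Returns a function that maps (n, dim) arrays.
    Hypothetical helper; eps is an assumed regularizer.
    """
    dim = source.shape[1]
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)

    # A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}, so that A S A = T
    s_half = _sqrtm_psd(cov_s)
    s_half_inv = np.linalg.inv(s_half)
    a = s_half_inv @ _sqrtm_psd(s_half @ cov_t @ s_half) @ s_half_inv

    def transport(x):
        # A is symmetric, so right-multiplying row vectors by A is valid
        return mu_t + (x - mu_s) @ a

    return transport
```

Calling `mkl_map(src_feats, tgt_feats)(src_feats)` on two feature matrices returns source frames whose mean and covariance match those of the target, which is the property that lets such a map stand in for nearest-neighbor regression in this simplified setting.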

[Paper Review] FasterVoiceGrad: Faster One-Step Diffusion-based Voice Conversion with Adversarial Diffusion Conversion Distillation

Diffusion-based Voice Conversion models are considerably slow because of iterative sampling. FasterVoiceGrad distills the diffusion model and the content encoder through Adversarial Diffusion Conversion Distillation. In particular, it uses adversarial distillation and score distillation training to make the distillation effective.
Paper (INTERSPEECH 2025): ..

Paper/Conversion 2025. 8. 24. 08:25
