Let IT Begin

[Paper 리뷰] MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames

MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in FramesNon-parallel voice conversion을 위한 Cycle-Consistent Adversarial Network-based 방식은 time-frequency structure를 capture 하는 능력이 부족함MaskCycleGAN-VCCycleGAN-VC2의 확장으로써 Filling in Frames를 통해 training 하여 얻어짐Filling in Frames를 사용하여 input mel-spectrogram에 temporal mask를 적용하고 converter가 surrounding frame을 기반으로 missing frame을 fillin..

Paper/Conversion 2024. 8. 22. 10:27

[Paper 리뷰] CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram ConversionNon-parallel voice conversion에서 CycleGAN-VC가 우수한 성능을 보임- BUT, mel-spectrogram conversion에 대한 ambiguity로 인해 time-frequency structure가 손상됨CycleGAN-VC3Time-Frequency Adaptive Normalization을 도입하여 time-frequency structure를 반영기존 CycleGAN의 mel-spectrogram conversion 성능을 향상논문 (INTERSPEECH 2020) : Paper Link1. IntroductionVo..

Paper/Conversion 2024. 8. 21. 09:15

[Paper 리뷰] AVQVC: One-Shot Voice Conversion by Vector Quantization with Applying Contrastive Learning

AVQVC: One-Shot Voice Conversion by Vector Quantization with Applying Contrastive LearningVoice conversion은 speech signal에서 timbre와 linguistic content를 disentangle 하여 수행될 수 있음AVQVCVQVC와 AutoVC를 결합한 one-shot voice conversion frameworkContent, timbre를 분리하기 위한 training method를 VQVC에 적용논문 (ICASSP 2022) : Paper Link1. IntroductionVoice Conversion (VC)는 original utterance의 content를 유지하면서 target speake..

Paper/Conversion 2024. 8. 20. 09:01

[Paper 리뷰] StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice ConversionUnsupervised non-parallel many-to-many voice conversion을 위해 generative adversarial network를 활용할 수 있음StarGANv2-VCAdversarial source classifier loss와 perceptual loss를 결합하여 사용Style encoder를 통해 plain reading speech를 stylistic speech로 변환논문 (INTERSPEECH 2021) : Paper Link1. IntroductionVoice Conversion (..

Paper/Conversion 2024. 8. 18. 09:33

[Paper 리뷰] Blow: A Single-Scale Hyperconditioned Flow for Non-Parallel Raw-Audio Voice Conversion

Blow: A Single-Scale Hyperconditioned Flow for Non-Parallel Raw-Audio Voice ConversionMany-to-Many voice conversion을 위해서는 non-parallel data를 활용할 수 있어야 함BlowHypernetwork conditioning과 single-scale normalizing flow를 활용Single speaker identifier를 사용하여 frame-by-frame으로 end-to-end training 됨논문 (NeurIPS 2019) : Paper Link1. IntroductionRaw audio는 intermediate representation을 사용하는 것보다 더 많은 model capacit..

Paper/Conversion 2024. 8. 17. 11:46

[Paper 리뷰] FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments with Attention

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments with AttentionAny-to-Any voice conversion은 unseen any speaker에 대해 voice conversion을 수행하는 것을 목표로 함FragmentVCWav2Vec 2.0을 통해 source speaker의 latent phonetic structure를 얻고 target speaker의 spectral feature를 log mel-spectrogram을 통해 얻음두 가지의 서로 다른 feature space를 two-stage training process를 통해 align ..

Paper/Conversion 2024. 8. 16. 09:07

[Paper 리뷰] SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross AttentionZero-shot voice conversion은 unseen target speaker로의 변환을 수행할 수 있지만, speaker similarity 측면에서 한계가 있음SEF-VCSpeaker embedding을 사용하지 않고 Position-Agnostic Cross-Attention을 도입하여 reference speech에서 speaker timbre를 학습이후 HuBERT semantic token으로부터 non-autoregressive 방식으로 waveform을 reconstruct논문 (ICASSP 2024) : Paper Link1. Introdu..

Paper/Conversion 2024. 8. 15. 09:26

[Paper 리뷰] DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning

DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised LearningAny-to-Any voice conversion은 training data에서 벗어난 source/target speaker에 대해 voice conversion을 수행하는 것을 목표로 함- BUT, 기존의 disentangle-based model은 speaker/content style information를 얻는 과정에서 untangle overlapping 문제가 발생함DRVC (Disentangled Representation Voice Conversion)Content encoder, timbre encoder, generator로 구성된 end-to-e..

Paper/Conversion 2024. 8. 14. 09:39

[Paper 리뷰] VQMIVC: Quantization and Mutual Information-based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion

VQMIVC: Quantization and Mutual Information-based Unsupervised Speech Representation Disentanglement for One-Shot Voice ConversionOne-shot voice conversion은 speech representation disentanglement를 통해 효과적으로 수행될 수 있음- BUT, 기존 방식은 speech representation 간의 correlation을 무시하므로 content information이 leakage 될 수 있음VQMIVCContent encoding 과정에서 vector quantization을 사용하고 training 중에 correlation metric으로써 mutu..

Paper/Conversion 2024. 8. 13. 09:41

이전 1 ··· 26 27 28 29 30 31 32 ··· 65 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/12 »
일	월	화	수	목	금	토
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Total

Today

Yesterday

티스토리툴바