Posts in the 'All Categories' category (Page 7)

[Paper Review] CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion

CycleGAN-VC performs well in non-parallel voice conversion, but ambiguity in mel-spectrogram conversion damages the time-frequency structure. CycleGAN-VC3: introduces Time-Frequency Adaptive Normalization to reflect the time-frequency structure, improving the mel-spectrogram conversion performance of the existing CycleGAN. Paper (INTERSPEECH 2020): Paper Link. 1. Introduction: Vo..

Paper/Conversion 2024. 8. 21. 09:15

[Paper Review] AVQVC: One-Shot Voice Conversion by Vector Quantization with Applying Contrastive Learning

Voice conversion can be performed by disentangling timbre and linguistic content in the speech signal. AVQVC: a one-shot voice conversion framework that combines VQVC and AutoVC, applying to VQVC a training method for separating content and timbre. Paper (ICASSP 2022): Paper Link. 1. Introduction: Voice Conversion (VC), while preserving the content of the original utterance, target speake..

Paper/Conversion 2024. 8. 20. 09:01

[Paper Review] StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

A generative adversarial network can be used for unsupervised, non-parallel, many-to-many voice conversion. StarGANv2-VC: combines an adversarial source classifier loss with a perceptual loss, and converts plain reading speech into stylistic speech through a style encoder. Paper (INTERSPEECH 2021): Paper Link. 1. Introduction: Voice Conversion (..

Paper/Conversion 2024. 8. 18. 09:33

[Paper Review] Blow: A Single-Scale Hyperconditioned Flow for Non-Parallel Raw-Audio Voice Conversion

Many-to-many voice conversion requires the ability to use non-parallel data. Blow: uses hypernetwork conditioning with a single-scale normalizing flow, and is trained end-to-end, frame by frame, using a single speaker identifier. Paper (NeurIPS 2019): Paper Link. 1. Introduction: Relative to using an intermediate representation, raw audio .. more model capacit..

Paper/Conversion 2024. 8. 17. 11:46

[Paper Review] FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments with Attention

Any-to-any voice conversion aims to perform voice conversion for any speaker, including unseen ones. FragmentVC: obtains the latent phonetic structure of the source speaker through Wav2Vec 2.0 and the spectral features of the target speaker through log mel-spectrograms. The two different feature spaces are aligned through a two-stage training process ..

Paper/Conversion 2024. 8. 16. 09:07

[Paper Review] SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

Zero-shot voice conversion can convert to unseen target speakers, but it is limited in terms of speaker similarity. SEF-VC: uses no speaker embedding and instead introduces Position-Agnostic Cross-Attention to learn speaker timbre from reference speech, then reconstructs the waveform from HuBERT semantic tokens in a non-autoregressive manner. Paper (ICASSP 2024): Paper Link. 1. Introdu..

Paper/Conversion 2024. 8. 15. 09:26


Let IT Begin
