Let IT Begin

[Paper Review] ZSVC: Zero-Shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

Style voice conversion aims to convert the speaking style of the source speech into a desired style while preserving the original speaker identity. ZSVC: uses a latent diffusion model that incorporates a speech codec and a speech prompting mechanism; introduces an information bottleneck to disentangle speaking style and timbre, and Uncertainty Modeling Adaptive I..

Paper/Conversion 2025. 3. 28. 15:15

[Paper Review] ComplexDec: A Domain-Robust High-Fidelity Neural Audio Codec with Complex Spectrum Modeling

Existing neural audio codecs struggle to model out-of-domain audio. ComplexDec: the out-of-domain robustness problem stems from information loss caused by codec compression; complex spectral input/output is used to mitigate this information loss at a 24 kbps bitrate. Paper (ICASSP 2025): Paper Link. 1. Introduction: Digital Signal Processing (DSP)-based..

Paper/Neural Codec 2025. 3. 27. 20:17

[Paper Review] NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers

A personalized text-to-speech model can be built by using adapters for multiple speakers. NanoVoice: uses batch-wise speaker adaptation, which can fine-tune on multiple references in parallel; additionally introduces parameter sharing to reduce the number of speaker adaptation parameters and incorporates a trainable scale matrix. Paper (ICASSP 2025): Paper Link. 1. Introduction: VALL-E, V..

Paper/TTS 2025. 3. 26. 20:31

[Paper Review] SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow

Flow matching-based speech synthesis models can improve speech quality while reducing the number of inference steps. SlimSpeech: reduces the parameter count of a rectified flow model and uses it as the teacher model; refines the reflow operation to directly derive a smaller model with a straight sampling trajectory, and improves performance through a distillation method. Paper (ICASSP 2025): Paper Link. 1. Int..

Paper/TTS 2025. 3. 25. 20:49

[Paper Review] kNN-VC: Voice Conversion with Just Nearest Neighbors

Recent any-to-any voice conversion systems have grown in complexity, which makes them hard to reproduce. kNN-VC: extracts self-supervised representations of the source and reference speech, then replaces each frame of the source representation with its nearest neighbor in the reference; finally, a pretrained vocoder converts the converted representation into audio. Paper (INTERSPEECH 2023): Paper Link. 1. Introduction: Voice Conversion (VC)..

Paper/Conversion 2025. 3. 24. 21:24
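
The frame-replacement step described in the kNN-VC excerpt above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for the self-supervised features, `knn_convert` is a hypothetical helper name, and the cosine similarity metric and the averaging over `k` neighbors are choices made here for the sketch. The feature extractor and the vocoder stage are omitted.

```python
import numpy as np

def knn_convert(source_feats, reference_feats, k=1):
    """Replace each source frame with the mean of its k nearest reference frames.

    source_feats:    (T_src, D) array of frame-level features (assumed input)
    reference_feats: (T_ref, D) array of frame-level features (assumed input)
    """
    # Normalize rows so that the dot product equals cosine similarity.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    ref = reference_feats / np.linalg.norm(reference_feats, axis=1, keepdims=True)
    sim = src @ ref.T  # shape (T_src, T_ref)

    # Indices of the k most similar reference frames for every source frame.
    top_k = np.argsort(-sim, axis=1)[:, :k]  # shape (T_src, k)

    # Gather the selected reference frames and average over the k neighbors.
    return reference_feats[top_k].mean(axis=1)  # shape (T_src, D)

# Stand-in features: 5 source frames and 20 reference frames of dimension 8.
rng = np.random.default_rng(0)
source = rng.normal(size=(5, 8))
reference = rng.normal(size=(20, 8))

converted = knn_convert(source, reference, k=1)
```

With `k=1` every output frame is copied directly from the reference, so the converted sequence keeps the length of the source but contains only reference frames. A larger `k` averages several reference frames per source frame.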

[Paper Review] Wav2Vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Powerful representations can be learned from speech audio alone, and fine-tuning on transcribed speech can then improve speech recognition performance. Wav2Vec 2.0: masks the speech input in the latent space; solves a contrastive task defined over a quantization of the jointly learned latent representations. Paper (NeurIPS 2020): Paper Link. 1. Introduction: In speech recognition, labeled data..

Paper/Representation 2025. 3. 23. 08:52
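
The contrastive task mentioned in the Wav2Vec 2.0 excerpt above can be sketched as an InfoNCE-style loss: given the context vector at a masked position, the model should pick out the true quantized target among a set of distractors. The code below is a hedged sketch rather than the paper's training code. The function name `contrastive_loss`, the temperature value, the vector sizes, and the random stand-in vectors are all assumptions made for illustration.

```python
import numpy as np

def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of the true target among all candidates.

    context:     (D,) context vector at a masked time step (assumed input)
    target:      (D,) true quantized latent for that step (assumed input)
    distractors: (K, D) quantized latents sampled from other steps (assumed input)
    """
    # Row 0 holds the true target, the remaining rows hold the distractors.
    candidates = np.vstack([target[None, :], distractors])

    # Cosine similarity between the context vector and every candidate.
    cos = (candidates @ context) / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context)
    )

    # Temperature-scaled logits, shifted by the max for numerical stability.
    logits = cos / temperature
    logits = logits - logits.max()

    # Softmax cross-entropy with the true target as the correct class.
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Stand-in vectors: one target and 10 distractors of dimension 16.
rng = np.random.default_rng(0)
target = rng.normal(size=16)
distractors = rng.normal(size=(10, 16))

loss_match = contrastive_loss(target, target, distractors)      # context aligned with target
loss_mismatch = contrastive_loss(-target, target, distractors)  # context opposed to target
```

The loss is small when the context vector points in the same direction as the true target and grows when it does not, which is the signal that pushes context representations toward the correct quantized latents.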

[Paper Review] Wav2Vec: Unsupervised Pre-Training for Speech Recognition

Unsupervised pre-training can be brought to speech recognition by learning representations of raw audio. Wav2Vec: trains on unlabeled audio data and uses the resulting representations to improve acoustic model training; optimizes a simple multi-layer convolutional neural network through noise contrastive binary classification. Paper (INTERSPEECH 2019): Paper Link. 1. Introductio..

Paper/Representation 2025. 3. 22. 09:07

[Paper Review] PriorSinger: Singing Voice Synthesis Model with Prior Condition Cross Attention

Singing voice synthesis aims to generate expressive, realistic singing from a given musical score. PriorSinger: guides the diffusion denoiser with a prior cross-attention transformer during the denoising process; introduces attention mechanisms over the time and frequency domains within the diffusion denoiser to improve the resolution of the generated acoustic features; additionally, ro..

Paper/SVS 2025. 3. 21. 17:56

[Paper Review] Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

Semantic information can be captured by learning fixed-length vector representations of audio segments taken from a speech corpus. Speech2Vec: obtains semantically similar embeddings based on an RNN encoder-decoder framework; uses Skipgrams and Continuous Bag-of-Words for training. Paper (INTERSPEECH 2018): Paper Link. 1. Introduction: Natural Language Process..

Paper/Representation 2025. 3. 20. 21:44
