Let IT Begin

[Paper Review] StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Language models can improve zero-shot voice conversion performance. However, existing methods perform offline conversion and therefore require the complete source speech, which makes them hard to use in real-time applications. StreamVoice: for streaming capability, it introduces a fully causal context-aware language model with a temporal-independent acoustic predictor. Through this, comple..

Paper/Conversion 2024. 10. 13. 12:19

[Paper Review] PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model

Text style descriptions can be used for style-controlled text-to-speech. PL-TTS: it combines prompts embedded by a large language model with a diffusion-based text-to-speech model. It additionally fine-tunes the large language model and the diffusion framework to improve synthesis quality and style controllability. Paper: INTERSPEECH 2024. 1. Introduction: Controllable ex..

Paper/TTS 2024. 10. 12. 11:32

[Paper Review] ClariTTS: Feature-ratio Normalization and Duration Stabilization for Code-Mixed Multi-Speaker Speech Synthesis

In text-to-speech models, code-mixed text can produce unnatural accents because speaker-related features may contain linguistic features of the source language. ClariTTS: it applies a Feature-ratio Normalized Affine Coupling Layer to a flow-based text-to-speech model. By disentangling speaker and linguistic features, target sp..

Paper/TTS 2024. 10. 9. 10:30

[Paper Review] DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Diffusion-based models struggle with attribute-specific style control because the data distribution contains many attributes and model parameter sharing in the generation process is limiting. DDDM-VC: it introduces Decoupled Denoising Diffusion Models to support style transfer for each attribute. In particular, voice conversion ta..

Paper/Conversion 2024. 10. 6. 11:13

[Paper Review] DiffVC: Diffusion-based Voice Conversion with Fast Maximum Likelihood Sampling Scheme

One-shot many-to-many voice conversion aims to copy the target voice from a single reference utterance, even when neither the source nor the target speaker belongs to the training dataset. DiffVC: it performs scalable one-shot voice conversion based on diffusion probabilistic modeling. Additionally, a Stochastic Differential Equation solver that can accelerate diffusion models ..

Paper/Conversion 2024. 10. 5. 11:49

[Paper Review] VoiceTailor: Lightweight Plug-In Adapter for Diffusion-based Personalized Text-to-Speech

Combining a personalized adapter with a pre-trained diffusion-based model enables parameter-efficient speaker-adaptive text-to-speech. VoiceTailor: it uses Low-Rank Adaptation for parameter-efficient adaptation and integrates the adapter into the pivotal modules of the pre-trained diffusion decoder. To achieve strong adaptation with only a few parameters, guidance techni..

Paper/TTS 2024. 10. 3. 10:02

[Paper Review] UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data

A diffusion-based text-to-speech model can be fine-tuned using minimal untranscribed data. UnitSpeech: it uses self-supervised unit representations as pseudo transcripts and integrates a unit encoder into a pre-trained text-to-speech model. The unit encoder is trained to provide speech content to the diffusion-based decoder, and then a single $\langle \text{unit},\text{s..

Paper/TTS 2024. 10. 1. 09:50

[Paper Review] NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification

ConvNet structures can be used to improve ECAPA-TDNN for speaker verification. NeXt-TDNN: it replaces the SE-Res2Net block of ECAPA-TDNN with a TS-ConvNeXt block, which consists of temporal multi-scale convolution and a frame-wise feed-forward network. Global response normalization is introduced into the frame-wise feed-forward network for selective feature p..

Paper/Verification 2024. 9. 29. 11:03

[Paper Review] DualVC: Dual-mode Voice Conversion Using Intra-model Knowledge Distillation and Hybrid Predictive Coding

Conventional non-streaming voice conversion can use the whole utterance as full context, whereas streaming voice conversion has no access to future information, so its quality degrades considerably. DualVC: it uses dual-mode conversion that supports both streaming and non-streaming modes with jointly trained separate network parameters. To improve the performance of streaming conversion, i..

Paper/Conversion 2024. 9. 28. 09:35
