Posts in the 'All Categories' category (Page 9)

[Paper Review] Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

Voice conversion remains limited in speaker similarity, intelligibility, and expressiveness. Expressive-VC is an end-to-end voice conversion model that combines the neural bottleneck feature approach with the information perturbation approach. It uses a bottleneck feature encoder and a perturbed-wav encoder for linguistic, para-linguistic fe..

Paper/Conversion 2024. 12. 28. 09:59

[Paper Review] DualVC3: Leveraging Language Model Generated Pseudo Context for End-to-End Low Latency Streaming Voice Conversion

The recent DualVC2 supports streaming voice conversion with a latency of 180 ms. However, its recognition-synthesis framework makes end-to-end optimization difficult, and instability increases when short chunks are used. DualVC3 uses speaker-independent semantic tokens to guide content encoder training. It also uses a language model on the content encoder outpu..

Paper/Conversion 2024. 12. 25. 10:45

[Paper Review] TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech

Expressive text-to-speech aims to synthesize speech that reflects diverse speech styles and emotions. TSP-TTS is an expressive text-to-speech model conditioned on a style representation extracted from the text itself. It introduces Residual Vector Quantization for the text-based style predictor and adds Style-Text Alignment and Style Hierarchical Layer Normali.. to the mel-decoder

Paper/TTS 2024. 12. 22. 09:06

[Paper Review] FastPitchFormant: Source-Filter based Decomposed Modeling for Speech Synthesis

In text-to-speech, a large pitch-shift scale causes quality degradation and speaker characteristic deformation. FastPitchFormant is a Feed-Forward Transformer model designed on the basis of source-filter theory. It models text and acoustic features separately, which prevents the model from learning the relationship between the two features. Paper (INTERSPEECH 2021): Paper Link. 1. Introduction: Text-to-Speech (TTS)..

Paper/TTS 2024. 12. 21. 09:55

[Paper Review] DPP-TTS: Diversifying Prosodic Features of Speech via Determinantal Point Processes

A text-to-speech model should be able to synthesize diverse prosody. However, existing models rely on a scaled sampling temperature to improve prosody diversity. Because the sampling procedure focuses on a single speech sample, diversity across samples is neglected. DPP-TTS is a text-to-speech model based on a prosody diversifying module and a Determinantal Point Process. Across multiple samples, perceptu..

Paper/TTS 2024. 12. 15. 12:04

[Paper Review] DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance

Achieving an optimal balance between speaker-fidelity and text-intelligibility under diverse control demands is difficult. DualSpeech introduces phoneme-level latent diffusion and dual classifier-free guidance, and improves fidelity and intelligibility through sophisticated control. Paper (INTERSPEECH 2024): Paper Link. 1. Introduction: Text-to-Speech (TTS) hum..

Paper/TTS 2024. 12. 14. 10:20
