Posts in the 'Paper/TTS' category (Page 7)

[Paper Review] ProsodyFlow: High-Fidelity Text-to-Speech through Conditional Flow Matching and Prosody Modeling with Large Speech Language Models

Text-to-speech still struggles to reflect diverse, natural prosody. ProsodyFlow combines a large self-supervised speech model with conditional flow matching to model prosodic features. It extracts acoustic features with a speech LLM, maps those features into a prosody latent space, and then conditional flow ..

Paper/TTS 2025. 2. 2. 10:31

[Paper Review] FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS

A neural text-to-speech model should be able to flexibly control local prosodic variation. FluentTTS predicts the fundamental frequency $F0$ of each text, conditioned on an utterance-wise global style embedding. It additionally estimates a multi-style embedding through a multi-style encoder that takes the global utterance-wise embedding and the local $F0$ embedding as input. Paper (INTERSPEECH 202..

Paper/TTS 2025. 1. 13. 11:12

[Paper Review] VoiceLDM: Text-to-Speech with Environmental Context

Audio can be generated from a description prompt and a content prompt: the description prompt conveys environmental context, and the content prompt provides linguistic information. VoiceLDM adopts a text-to-audio model based on a latent diffusion model and extends it so that an additional content prompt can be used as a conditional input. By leveraging Contrastive Language-Audio Pretraining and Whisper, manual annotation, ..

Paper/TTS 2025. 1. 4. 10:19

[Paper Review] Flowtron: An Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

An autoregressive flow-based generative network can be used to improve style transfer and speech variation. Flowtron is optimized by maximizing the likelihood of the training data, which supports simple, stable training. It learns an invertible mapping to a latent space that can modulate timbre, expressivity, and accent. Paper (ICLR 2021): Paper Link. 1. Introduction: Recently..

Paper/TTS 2024. 12. 29. 11:33

[Paper Review] TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech

Expressive text-to-speech aims to synthesize speech that reflects diverse speech styles and emotions. TSP-TTS is an expressive text-to-speech model conditioned on a style representation extracted from the text itself. It introduces Residual Vector Quantization for the text-based style predictor, and in the mel-decoder, Style-Text Alignment and Style Hierarchical Layer Normali..

Paper/TTS 2024. 12. 22. 09:06

[Paper Review] FastPitchFormant: Source-Filter based Decomposed Modeling for Speech Synthesis

In text-to-speech, a large pitch-shift scale causes quality degradation and speaker characteristic deformation. FastPitchFormant is a Feed-Forward Transformer model designed on the basis of Source-Filter theory. It models text and acoustic features separately, which prevents the model from learning the relationship between the two features. Paper (INTERSPEECH 2021): Paper Link. 1. Introduction: Text-to-Speech (TTS)..

Paper/TTS 2024. 12. 21. 09:55
