Posts in the 'All Categories' category (Page 8)

[Paper Review] ProsodyFlow: High-Fidelity Text-to-Speech through Conditional Flow Matching and Prosody Modeling with Large Speech Language Models

Text-to-Speech still struggles to reflect diverse, natural prosody.
ProsodyFlow
- Combines a large self-supervised speech model with conditional flow matching to model prosodic features
- Extracts acoustic features with a speech LLM, maps those features into a prosody latent space, and then conditional flow ..

Paper/TTS 2025. 2. 2. 10:31

[Album Review] GFRIEND (여자친구) - <Season of Memories>

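The Sum of Accumulated Memories: GFRIEND - <Season of Memories>
- Released: 2025.01.13.
- Genres: K-Pop

Love needs a medium. It can never be completed alone, and above all, to finally blaze up it needs an ignition point that sets that thick firewood alight. Some may bring up unrequited love, but even that does not combust spontaneously without any grounds. In whatever form, subtle moments pile up into the emotion of a day, and that emotion becomes an unmistakable spark that sets fire to the memories accumulated so far. That is why GFRIEND's new album, their comeback after about five years, naturally chooses 'memory' as the medium for burning the longing that built up over the hiatus. And this trait shows plainly from the new album's title itself, which seems to recall their debut work ..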

Music/Review 2025. 2. 1. 14:34

[Paper Review] StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Zero-Shot Voice Conversion has the following limitations:
- Style and timbre cannot be transferred independently to different unseen speakers
- Inference is slow because of autoregressive modeling or sampling steps
- The quality and similarity of converted samples are still unsatisfactory
StableVC
- Decomposes speech into linguistic content, timbre, and style, and uses a conditional flow matching module to ..

Paper/Conversion 2025. 1. 28. 14:40

[Paper Review] VoiceMixer: Adversarial Voice Style Mixup

Voice conversion is still limited because it cannot sufficiently decompose source speech and voice style.
VoiceMixer
- Decomposes content and style through an information bottleneck that uses self-supervised representation learning
- Achieves better generalization through adversarial feedback on each type of information
Paper (NeurIPS 2021): Paper Link
1. Introduction
Voice Conversion (VC): the content information of the source speaker ..

Paper/Conversion 2025. 1. 27. 18:24

[Paper Review] Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Speech language models are still limited in modeling the long acoustic sequences of neural audio codecs.
Generative Pre-trained Speech Transformer (GPST)
- Quantizes the audio waveform into two kinds of discrete speech representations and integrates them into a hierarchical transformer architecture
- Trained in an end-to-end unsupervised manner, so that diverse speaker ident..

Paper/Language Model 2025. 1. 26. 12:51

[Paper Review] SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Speech models based on audio-text prompts are limited in handling tasks other than text-to-speech.
SpeechX
- A speech model that supports a range of tasks, including Zero-shot Text-to-Speech, Speech Editing, Noise Suppression, and Target Speaker Extraction
- Introduces multi-task learning based on neural codec language modeling and task-dependent prompting
Paper (TASLP 2024): Paper Link
1. Introducti..

Paper/Language Model 2025. 1. 25. 12:26
