Posts in the 'All Categories' category (Page 5)

[Paper Review] F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

A fully non-autoregressive text-to-speech system can be built on a Diffusion Transformer.
F5-TTS:
The input is modeled with ConvNeXt to refine the text representation and make alignment easier.
Sway Sampling is applied to the Flow Matching-based model to support effective training and inference.
Paper (ACL 2025): Paper Link
1. Introduction
Text-to-Speech (TTS) models such as VALL-E f..

Paper/TTS 2025. 6. 23. 17:07

[Paper Review] ALMTokenizer: A Low-Bitrate and Semantic-Rich Audio Codec Tokenizer for Audio Language Modeling

Audio tokens play an important role in audio language models.
ALMTokenizer:
Introduces a Query-based Compression Strategy that explicitly models context information across frames and captures holistic information through a set of learnable query tokens.
To enhance semantic information, Masked AutoEncoder, Semantic prior-based Vector Quantization, Aut..

Paper/Neural Codec 2025. 6. 22. 08:54

[Paper Review] EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Emotional Voice Conversion aims to convert the source emotion to a given target emotion while preserving the linguistic content.
EmoReg:
Uses Self-Supervised Learning-based feature representations to control emotion intensity.
Additionally, in the emotional embedding space, Unsupervised Directional Latent Vector Mod..

Paper/Conversion 2025. 6. 21. 08:34

[Paper Review] TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer

A neural transducer can be used for Text-to-Speech.
TTS-Transducer:
Uses a transducer architecture to learn a monotonic alignment between the tokenized text and the first codebook of the speech codec tokens.
A non-autoregressive Transformer predicts the remaining codes using the alignment extracted from the transducer loss.
Paper (ICASSP 2025): Paper Link
1. Introduction
Text-to-Speech (TTS)..

Paper/TTS 2025. 6. 20. 17:14

[Paper Review] SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

Learning a sentence-level representation of speech can make syllabic organization emerge.
SD-HuBERT:
Fine-tunes a pre-trained HuBERT with an aggregator token that summarizes the entire utterance.
Uses a self-distillation objective, without supervision, to draw out salient syllabic structure.
Additionally, using the Spoken Speech ABX benchmark, sentence-level representati..

Paper/Representation 2025. 6. 19. 17:01

[Paper Review] M2R-Whisper: Multi-Stage and Multi-Scale Retrieval Augmentation for Enhancing Whisper

Whisper is limited in its ability to accurately recognize diverse subdialects.
M2R-Whisper:
Introduces In-Context Learning and Retrieval-Augmented techniques into Whisper.
Applies sentence-level in-context learning in the pre-processing stage and token-level $k$-Nearest Neighbor in the post-processing stage.
Paper (ICASSP 2025): Paper Link
1. Introduction
Whisper Autom..

Paper/ASR 2025. 6. 18. 17:06

Let IT Begin