Posts in the 'All Categories' category (Page 3)

[Paper Review] EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Emotional Voice Conversion aims to convert the source emotion to a given target emotion while preserving the linguistic content. EmoReg: uses Self-Supervised Learning-based feature representations to control emotion intensity. Additionally, in the emotional embedding space, Unsupervised Directional Latent Vector Mod..

Paper/Conversion 2025. 6. 21. 08:34

[Paper Review] TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer

A neural transducer can be used for Text-to-Speech. TTS-Transducer: uses a transducer architecture to learn a monotonic alignment between tokenized text and speech codec tokens for the first codebook. Based on a non-autoregressive Transformer, it predicts the remaining codes using the alignment extracted from the transducer loss. Paper (ICASSP 2025): Paper Link. 1. Introduction. Text-to-Speech (TTS)..

Paper/TTS 2025. 6. 20. 17:14

[Paper Review] SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

Learning a sentence-level representation of speech can make syllabic organization emerge. SD-HuBERT: fine-tunes a pre-trained HuBERT with an aggregator token that summarizes the entire speech. It uses a self-distillation objective without supervision to draw out salient syllabic structure. Additionally, using the Spoken Speech ABX benchmark, sentence-level representati..

Paper/Representation 2025. 6. 19. 17:01

[Paper Review] M2R-Whisper: Multi-Stage and Multi-Scale Retrieval Augmentation for Enhancing Whisper

Whisper has limitations in accurately recognizing diverse subdialects. M2R-Whisper: introduces In-Context Learning and Retrieval-Augmented techniques into Whisper. It applies sentence-level in-context learning in the pre-processing stage and token-level $k$-Nearest Neighbor in the post-processing stage. Paper (ICASSP 2025): Paper Link. 1. Introduction. Whisper, an Autom..

Paper/ASR 2025. 6. 18. 17:06

[Paper Review] DecoupledSynth: Enhancing Zero-Shot Text-to-Speech via Factors Decoupling

Existing Zero-Shot Text-to-Speech models have difficulty balancing the linguistic, para-linguistic, and non-linguistic information in their intermediate representations. DecoupledSynth: combines various self-supervised models to extract comprehensive, decoupled representations. It uses a decoupled processing stage to support nuanced synthesis. Paper (ICASSP 2025): Paper Link. 1. I..

Paper/TTS 2025. 6. 17. 17:20

[Paper Review] ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN based Speaker Verification

Speaker verification relies on a neural network that extracts speaker representations. ECAPA-TDNN: restructures the initial frame layers into 1-dimensional Res2Net modules and introduces Squeeze-and-Excitation blocks to explicitly model channel interdependencies. It aggregates and propagates features from different hierarchical levels, and channe..

Paper/Verification 2025. 6. 16. 17:03
