
Factorized-VITS: Decoding Prosody and Text in End-to-End Speech Synthesis without External or Secondary Aligner
Incorporating explicit text-side prosody modeling can improve end-to-end text-to-speech performance.
Factorized-VITS
- Cleanly factorizes the audio prior hidden space into text and prosody subspaces
- Performs on-the-fly alignment in the factorized text subspace without extra parameters
Paper (ICASSP 2025): Paper Link
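A minimal sketch of the two ideas above, assuming the factorization is a simple channel split of the prior hidden space and the parameter-free alignment is a Viterbi-style monotonic search on the text subspace; all names and dimensions are illustrative, not the paper's implementation.

```python
import torch

def factorize_hidden(h, text_dim):
    """Split the prior hidden space into a text subspace and a prosody
    subspace by channel slicing (no extra parameters)."""
    return h[..., :text_dim], h[..., text_dim:]

def monotonic_align(h_text, text_emb):
    """Parameter-free monotonic alignment on the text subspace:
    score = -||frame feature - phoneme embedding||, then Viterbi-style DP."""
    log_p = -torch.cdist(h_text, text_emb)              # [T_frames, N_tokens]
    T, N = log_p.shape
    dp = torch.full((T, N), float("-inf"))
    dp[0, 0] = log_p[0, 0]
    for t in range(1, T):
        prev = dp[t - 1]
        shifted = torch.cat([prev.new_full((1,), float("-inf")), prev[:-1]])
        dp[t] = torch.maximum(prev, shifted) + log_p[t]  # stay or advance one token
    # backtrack the hard alignment path (one phoneme index per frame)
    path = torch.zeros(T, dtype=torch.long)
    path[-1] = N - 1
    for t in range(T - 2, -1, -1):
        j = path[t + 1]
        path[t] = j if j == 0 or dp[t, j] >= dp[t, j - 1] else j - 1
    return path

h = torch.randn(1, 80, 192)                              # [batch, frames, hidden]
h_text, h_prosody = factorize_hidden(h, text_dim=96)
phonemes = torch.randn(12, 96)                           # [tokens, text_dim]
print(monotonic_align(h_text[0], phonemes).shape)        # torch.Size([80])
```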

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Decoder-only text-to-speech models lack a monotonic alignment constraint, which leads to mispronunciation, word skipping, and repetition.
VALL-T
- Keeps the decoder-only Transformer while introducing a relative position embedding over the input phoneme sequence
- Explicitly indicates the monotonic generation process, improving robustness for zero-shot text-to-speech
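A rough sketch of the relative-position idea described above, assuming phoneme positions are measured from a current alignment pointer that advances monotonically during generation (transducer-style); the class name, sizes, and pointer-update rule are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ShiftingRelativePosition(nn.Module):
    """Relative position embedding over the phoneme sequence: positions are
    measured from the current alignment pointer, so advancing the pointer
    explicitly encodes monotonic progress through the text."""
    def __init__(self, dim, max_rel=64):
        super().__init__()
        self.max_rel = max_rel
        self.emb = nn.Embedding(2 * max_rel + 1, dim)

    def forward(self, phoneme_emb, pointer):
        # phoneme_emb: [N_phonemes, dim]; pointer: index of the currently aligned phoneme
        n = phoneme_emb.size(0)
        rel = torch.arange(n) - pointer                   # negative = already consumed
        rel = rel.clamp(-self.max_rel, self.max_rel) + self.max_rel
        return phoneme_emb + self.emb(rel)

phonemes = torch.randn(20, 256)
rel_pos = ShiftingRelativePosition(dim=256)
# during decoding the pointer would advance monotonically (e.g., on a special token)
conditioned = rel_pos(phonemes, pointer=5)
print(conditioned.shape)                                  # torch.Size([20, 256])
```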

WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Language models rely on tokenizers that compress high-dimensional natural signals into lower-dimensional discrete tokens.
WavTokenizer
- Compresses the quantizer layers and the temporal dimension of the discrete codec
- Achieves better reconstruction quality and richer semantic information through a broader VQ space, an extended contextual window, and an inverse Fourier transform structure
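A minimal sketch of the two components named above, assuming the quantizer compression amounts to a single VQ layer with a broad codebook and the decoder ends in an inverse-Fourier-transform head that predicts magnitude and phase; module names, codebook size, and FFT settings are illustrative.

```python
import torch
import torch.nn as nn

class SingleVQ(nn.Module):
    """One quantizer with a broad codebook instead of a deep RVQ stack."""
    def __init__(self, dim=512, codebook_size=4096):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):                          # z: [batch, frames, dim]
        flat = z.reshape(-1, z.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(-1)
        idx = idx.view(z.shape[:-1])               # one discrete token per frame
        return self.codebook(idx), idx

class ISTFTHead(nn.Module):
    """Inverse-Fourier-transform decoder head: predict magnitude and phase
    per frame, then reconstruct the waveform with torch.istft."""
    def __init__(self, dim=512, n_fft=1024, hop=256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.proj = nn.Linear(dim, n_fft + 2)      # magnitude + phase bins

    def forward(self, x):                          # x: [batch, frames, dim]
        mag, phase = self.proj(x).chunk(2, dim=-1)
        spec = torch.exp(mag) * torch.exp(1j * phase)   # complex spectrum
        spec = spec.transpose(1, 2)                # [batch, freq, frames]
        return torch.istft(spec, self.n_fft, hop_length=self.hop,
                           window=torch.hann_window(self.n_fft))

z = torch.randn(1, 75, 512)                        # ~1 s at 75 frames (illustrative)
quantized, tokens = SingleVQ()(z)
wave = ISTFTHead()(quantized)
print(tokens.shape, wave.shape)
```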

SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models
Speech language models are built on discrete speech representations such as semantic and acoustic tokens.
SpeechTokenizer
- Introduces SLMTokBench to evaluate whether speech tokens are suitable for speech language models
- Builds a unified speech tokenizer by adopting an encoder-decoder architecture based on Residual Vector Quantization
Paper (ICLR 2024): Paper Link
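For reference, a compact sketch of Residual Vector Quantization, the quantizer family the entry above names: each layer quantizes the residual left by the previous layers; the layer count, dimensions, and codebook size are illustrative, not SpeechTokenizer's configuration.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Residual Vector Quantization: layer k quantizes the residual left by
    layers 1..k-1, so early layers carry the coarse part of the signal and
    later layers refine the details."""
    def __init__(self, n_layers=8, dim=256, codebook_size=1024):
        super().__init__()
        self.books = nn.ModuleList(nn.Embedding(codebook_size, dim)
                                   for _ in range(n_layers))

    def forward(self, z):                           # z: [frames, dim]
        residual, quantized, codes = z, torch.zeros_like(z), []
        for book in self.books:
            idx = torch.cdist(residual, book.weight).argmin(-1)
            q = book(idx)
            quantized = quantized + q               # running reconstruction
            residual = residual - q                 # pass the leftover to the next layer
            codes.append(idx)
        return quantized, torch.stack(codes)        # [frames, dim], [n_layers, frames]

z = torch.randn(100, 256)                           # encoder output for one utterance
recon, codes = ResidualVQ()(z)
print(recon.shape, codes.shape)                     # [100, 256], [8, 100]
```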

AdaptVC: High Quality Voice Conversion with Adaptive Learning
Voice conversion requires extracting disentangled linguistic content from the source and voice style from the reference.
AdaptVC
- Uses adapters to tune self-supervised speech features, effectively disentangling content and speaker
- Improves synthesis quality with cross-attention speaker conditioning and conditional flow matching
Paper (ICASSP 2025): Paper Link
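A minimal sketch of the adapter and cross-attention conditioning mentioned above, assuming a standard bottleneck adapter on frozen SSL features and content frames attending over reference-speaker frames; module names and sizes are illustrative, and the flow-matching decoder is omitted.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small adapter on top of frozen SSL features: down-project, nonlinearity,
    up-project with a residual, used to specialize features for content or speaker."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class CrossAttentionConditioning(nn.Module):
    """Content frames (queries) attend over reference-speaker frames
    (keys/values) to inject voice style frame by frame."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, content, speaker):
        styled, _ = self.attn(query=content, key=speaker, value=speaker)
        return content + styled

src_ssl = torch.randn(1, 120, 768)                  # source utterance SSL features
ref_ssl = torch.randn(1, 200, 768)                  # reference utterance SSL features
content = BottleneckAdapter()(src_ssl)
speaker = BottleneckAdapter()(ref_ssl)
decoder_in = CrossAttentionConditioning()(content, speaker)
print(decoder_in.shape)                             # torch.Size([1, 120, 768])
```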

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised LearningSelf-supervised learning is limited by its computational cost.
FitHuBERT
- Uses a Time-Reduction layer to improve inference time
- Prevents performance degradation through Hint-based Distillation
Paper (INTERSPEECH 2022): Paper Link
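A small sketch of the two mechanisms listed above, assuming the Time-Reduction layer concatenates adjacent frames before a projection and the hint loss is an L2 match between a projected student layer and a teacher layer; the reduction factor, widths, and rate-matching pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeReduction(nn.Module):
    """Time-reduction layer: halve the frame rate by concatenating every two
    adjacent frames and projecting back, cutting downstream compute."""
    def __init__(self, dim, factor=2):
        super().__init__()
        self.factor = factor
        self.proj = nn.Linear(dim * factor, dim)

    def forward(self, x):                            # x: [batch, frames, dim]
        b, t, d = x.shape
        t = t - t % self.factor                      # drop the ragged tail
        x = x[:, :t].reshape(b, t // self.factor, d * self.factor)
        return self.proj(x)

def hint_loss(student_hidden, teacher_hidden, proj):
    """Hint-based distillation: project the thin student layer up to the
    teacher width and match the teacher layer with an L2 loss."""
    return F.mse_loss(proj(student_hidden), teacher_hidden)

student = torch.randn(2, 100, 384)                   # thin-and-deep student features
teacher = torch.randn(2, 100, 768)                   # teacher (e.g., HuBERT) features
student_red = TimeReduction(dim=384)(student)        # [2, 50, 384]
teacher_red = F.avg_pool1d(teacher.transpose(1, 2), 2).transpose(1, 2)  # match frame rate
loss = hint_loss(student_red, teacher_red, nn.Linear(384, 768))
print(student_red.shape, loss.item())
```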