Posts in the 'Paper/Language Model' category

[Paper Review] VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

The original VALL-E leaves room for further improvement. VALL-E2 improves the existing nucleus sampling process through Repetition Aware Sampling, and improves inference speed and long-sequence modeling through Grouped Code Modeling. Paper (Microsoft 2025): Paper Link. 1. Introduction: Text-to-Speech (TTS), which generates high-quality speech with high clarity and intelligibility from text input, ..

Paper/Language Model 2025. 8. 3. 10:13

[Paper Review] CosyVoice3: Towards In-the-Wild Speech Generation via Scaling-up and Post-Training

The earlier CosyVoice2 is limited in language coverage, domain diversity, and data volume. CosyVoice3 introduces a speech tokenizer based on supervised multi-task training, applies post-training with a differentiable reward model, and supports diverse domains and text formats by scaling both data size and model size. Paper (Alibaba 2025): Paper Link. 1. Introduction: Zero-shot Text-to-Sp..

Paper/Language Model 2025. 7. 27. 09:00

[Paper Review] CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models

The original CosyVoice leaves room for further improvement. CosyVoice2 introduces finite-scalar quantization to improve the codebook utilization of speech tokens, streamlines the architecture so that a pre-trained large language model can serve as the backbone, and supports both streaming and non-streaming synthesis through a chunk-aware causal flow matching model. Paper (Alibaba 2024): Paper Link. 1. Introduction: Zero-sh..

Paper/Language Model 2025. 7. 26. 11:38

[Paper Review] MELLE: Autoregressive Speech Synthesis without Vector Quantization

Language modeling based on continuous-valued tokens can be used for Text-to-Speech. MELLE models the continuous-valued token distribution using a Spectrogram Flux loss, and incorporates variational inference to improve diversity and robustness. Paper (ACL 2025): Paper Link. 1. Introduction: Next-token prediction takes the previous tokens as the condition for the next discrete token..

Paper/Language Model 2025. 7. 2. 17:05

[Paper Review] DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Combining a diffusion model with an autoregressive model leads to computational load and suboptimal outcomes. DiTAR introduces a divide-and-conquer strategy for patch generation: the language model processes aggregated patch embeddings, and a diffusion Transformer subsequently generates the next patch. At inference, the noise-introducing time point during the reverse diffusion ODE as temperat..

Paper/Language Model 2025. 6. 29. 09:05

[Paper Review] ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

Language models based on acoustic and linguistic prompts perform well in zero-shot audio synthesis. ELLA-V supports fine-grained control over the synthesized audio at the phoneme level, and interleaves the acoustic and phoneme token sequences, with phoneme tokens appearing ahead of the acoustic tokens. Paper (AAAI 2025): Paper Link. 1. Introduction: Zero-shot Text-to-Spe..

Paper/Language Model 2025. 5. 25. 09:06

Let IT Begin