2025/11 Post List

[Paper Review] Language-Codec: Bridging Discrete Codec Representations and Speech Language Models

Discrete acoustic codecs are used as the intermediate representation in speech language models.
Language-Codec
Introduces Masked Channel Residual Vector Quantization to address the excessive information carried by the initial codebook.
Additionally applies a Fourier transform structure, attention blocks, and a refined discriminator.
Paper (ACL 2025): Paper Link
1. Introduction
VALL-E..

Paper/Neural Codec 2025. 11. 27. 14:26

[Paper Review] SimpleSpeech2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

Non-autoregressive text-to-speech models carry added complexity from duration alignment.
SimpleSpeech2
Combines the autoregressive and non-autoregressive approaches to build a straightforward model.
Supports simplified data preparation, fast inference, and stable generation.
Paper (TASLP 2025): Paper Link
1. Introduction..

Paper/TTS 2025. 11. 25. 14:49

[Paper Review] Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space

Speech language models are limited by discretization.
SLED
Encodes the speech waveform into a sequence of continuous latent representations.
Performs autoregressive modeling with an energy distance objective.
Paper (NeurIPS 2025): Paper Link
1. Introduction
Speech audio, as a lengthy sequence of sampling points whose values lie in an integer/floating-point range, is re..

Paper/Language Model 2025. 11. 20. 13:50

[Paper Review] Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

Autoregressive speech token generation models suffer from hallucination and undesired vocalization.
Koel-TTS
Uses preference alignment and Classifier Free Guidance to improve the language model's contextual adherence.
In particular, ranks model outputs using automatic metrics derived from speech recognition models, and conditional, uncondi..

Paper/Language Model 2025. 11. 19. 12:59

[Paper Review] SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Most neural codecs operate at high bitrates and cover only a narrow domain.
SemantiCodec
Compresses diverse domains such as speech, general sound, and music into 100 tokens per second or fewer.
Uses a dual-encoder architecture consisting of an acoustic encoder and a self-supervised pre-trained Audio Masked AutoEncoder discretized via $k$-means clustering.
Paper (JSTSP 2024): Paper Link
1. Intro..

Paper/Neural Codec 2025. 11. 18. 13:07

[Paper Review] Metis: A Foundation Speech Generation Model with Masked Generative Pre-training

Masked generative modeling can be used to build a speech foundation model that is fine-tuned for a variety of speech generation tasks.
Metis
Uses two discrete speech representations: self-supervised learning tokens and acoustic tokens.
Performs masked generative pre-training on 300K hours of speech data without any additional condition.
Paper (NeurIPS 2025): Paper Link..

Paper/Representation 2025. 11. 17. 13:06
