Posts from 2025/06

[Paper Review] OZSpeech: One-Step Zero-Shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

OZSpeech: One-Step Zero-Shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching. Existing speech representations such as waveforms and spectrograms overlook speech attributes and incur high computational cost. OZSpeech uses one-step sampling with a learned prior as the condition to reduce the number of sampling steps, and models speech attributes using disentangled, factorized components in token format. Paper (ACL 2025): Paper Link. 1. In..

Paper/TTS 2025. 6. 30. 17:03

[Paper Review] DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation. Combining a diffusion model with an autoregressive model leads to heavy computational load and suboptimal outcomes. DiTAR introduces a divide-and-conquer strategy for patch generation: the language model processes aggregated patch embeddings, and a diffusion Transformer subsequently generates the next patch. At inference, for the noise-introducing time point in the reverse diffusion ODE, temperat..

Paper/Language Model 2025. 6. 29. 09:05

[Paper Review] BEATs: Audio Pre-Training with Acoustic Tokenizers

BEATs: Audio Pre-Training with Acoustic Tokenizers. A Self-Supervised Learning framework is needed for general audio representation pre-training. BEATs uses a discrete label prediction task on labels obtained from a semantic-rich acoustic tokenizer, and builds an iterative pipeline over the tokenizer and the pre-trained model. Paper (ICML 2023): Paper Link. 1. Introduction. Speech Self-Supervised Learning (SSL) such as Wav2Vec 2.0, HuBERT, WavLM, Data2Vec..

Paper/Representation 2025. 6. 28. 08:38

[Paper Review] TCSinger2: Customizable Multilingual Zero-Shot Singing Voice Synthesis

TCSinger2: Customizable Multilingual Zero-Shot Singing Voice Synthesis. Existing Singing Voice Synthesis lacks multi-level style control through diverse prompts. TCSinger2 uses a Blurred Boundary Content Encoder to predict duration and extend the content embedding, which supports smooth transitions. A Custom Audio Encoder extracts aligned representations from singing, speech, and textual prompts. Additionally, a Flow-based Custom Encoder is used for style modelin..

Paper/SVS 2025. 6. 27. 12:47

[Paper Review] E3-TTS: Easy End-to-End Diffusion-based Text to Speech

E3-TTS: Easy End-to-End Diffusion-based Text to Speech. High-fidelity speech can be obtained with an End-to-End diffusion-based Text-to-Speech model. E3-TTS takes plain text as input and generates the waveform through an iterative refinement process. In particular, it does not rely on intermediate representations such as spectrogram features or alignment information. Paper (ASRU 2023): Paper Link. 1. Introduction. As with WaveGrad, DiffWave, and others, in Text-to-Speech (TTS) systems, Diffu..

Paper/TTS 2025. 6. 26. 17:02

[Paper Review] E2-TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

E2-TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. A zero-shot Text-to-Speech model with high speaker similarity and intelligibility is needed. E2-TTS converts the text input into a character sequence with filler tokens. It trains a Flow-Matching-based mel-spectrogram generator on an audio infilling task and removes the dependence on additional components such as a duration model. Paper (SLT 2024): Paper Link. 1. Introduction. VALL..

Paper/TTS 2025. 6. 25. 17:06

Let IT Begin