'Paper' 카테고리의 글 목록 (27 Page)

[Paper 리뷰] SoundStorm: Efficient Parallel Audio Generation

SoundStorm: Efficient Parallel Audio GenerationEfficient, non-autoregressive audio generation을 위한 neural codec이 필요함SoundStormAudioLM의 semantic token을 input으로 receive 하고 bidrectional attention과 confidence-based parallel decoding을 사용하여 neural audio codec token을 생성Autoregressive 방식과 비교하여 2배의 속도 향상 효과와 고품질의 audio 합성이 가능논문 (Google Research 2023) : Paper Link1. IntroductionNeural codec을 통해 생성된 audio의 ..

Paper/Neural Codec 2024. 4. 26. 09:49

[Paper 리뷰] Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching

Matcha-TTS: A Fast TTS Architecture with Conditional Flow MatchingOptimal-transport conditional flow matching을 사용하여 text-to-speech에서의 acoustic modeling 속도를 향상할 수 있음Matcha-TTS Optimal-transport conditional flow matching을 기반으로 기존의 score matching 방식보다 더 적은 step으로 고품질의 output을 제공하는 ODE-based decoder를 얻음Probabilistic, non-autregressive 하게 동작하고 external alignment 없이 scratch로 학습 가능논문 (ICASSP 2024) : Pa..

Paper/TTS 2024. 4. 25. 10:30

[Paper 리뷰] Mels-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens

Mels-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens효과적인 emotion transfer를 위해 disentangled style token을 활용할 수 있음Mels-TTSGlobal style token에서 영감을 받아 emotion, language, speaker, residual information을 disentangle 하는 개별적인 style token을 활용Attention mechanism을 적용하여 각 style token에서 target speech에 대한 speech attribute를 학습논문 (ICASSP 2024) : ..

Paper/TTS 2024. 4. 24. 10:21

[Paper 리뷰] MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech SynthesisText-to-Speech에서 style transfer는 style information을 text context에 반영하여 특정 style을 가진 음성을 생성하는 것을 목표로 함BUT, 기존의 style transfer 방식들은 fixed emotional label이나 reference clip에 의존하므로 flexible 한 style transfer의 한계가 있음MM-TTS생성되는 음성의 style을 control 하기 위해 reference speech, emotional facial image, text description 등을 포함하는..

Paper/TTS 2024. 4. 23. 09:45

[Paper 리뷰] LangWave: Realistic Voice Generation based on High-Order Langevin Dynamics

LangWave: Realistic Voice Generation based on High-Order Langevin DynamicsDiffusion model은 음성 생성에서 우수한 성능을 보이고 있지만 대부분 first-order stochastic differential equation이나 equivalent diffusion model에 의존함LangWave기존의 first-order method에서 벗어나 third-order Langevin dynamical system을 활용하여 waveform을 생성Ambient Euclidean space에서 voice wave diffusion, position, velocity, acceleration을 동시에 모델링하여 white noise에서 wa..

Paper/Vocoder 2024. 4. 22. 10:51

[Paper 리뷰] SoundStream: An End-to-End Neural Audio Codec

SoundStream: An End-to-End Neural Audio CodecSpeech-tailored codec이 목표로 하는 bitrate로 음성, 음악, general audio를 효율적으로 compress 할 수 있도록 neural audio codec이 필요함SoundStreamFully-convolutional encoder/decoder와 residual vector quantizer로 구성된 architecture를 활용하여 end-to-end 방식으로 training 됨Training 시에는 adversarial loss와 reconstruction loss를 결합하여 quantized embedding에서 고품질 audio를 생성할 수 있도록 함Quantizer layer에 str..

Paper/Neural Codec 2024. 4. 21. 13:45

이전 1 ··· 24 25 26 27 28 29 30 ··· 49 다음

이전 다음

최근에 올라온 글

최근에 달린 댓글

« 2025/02 »
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

Total

Today

Yesterday

Let IT Begin

티스토리툴바