Posts in the 'All Categories' category (Page 6)

[Paper Review] CosyVoice: A Scalable Multilingual Zero-Shot Text-to-Speech Synthesizer based on Supervised Semantic Tokens

In Large Language Model-based text-to-speech, speech tokens are learned in an unsupervised manner, so they lack explicit semantic information and text alignment information. CosyVoice: uses supervised semantic tokens derived from a multilingual speech recognition model by inserting vector quantization into its encoder. Based on these tokens..

Paper/Language Model 2025. 3. 16. 10:41

[Paper Review] HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis

Large Language Model-based text-to-speech models that use discrete audio tokens struggle with long-form speech synthesis because of their high frame rate. HALL-E: introduces Multi-Resolution Requantization to reduce the frame rate of the neural audio codec. This uses teacher-student distillation to reorganize the discrete audio tokens with Multi-Resolution Residual..

Paper/Language Model 2025. 3. 15. 14:04

[Paper Review] RFWave: Multi-Band Rectified Flow for Audio Waveform Reconstruction

Diffusion models are effective for waveform reconstruction, but they need a considerable number of sampling steps, which causes latency problems. RFWave: generates complex spectrograms and processes all subbands simultaneously at the frame level, and introduces Rectified Flow for a straight transport trajectory. Paper (ICLR 2025): Paper Link. 1. Introduction. Audio waveform reconstruction: derived from raw audio data, low-dimen..

Paper/Vocoder 2025. 3. 9. 12:24

[Paper Review] PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

A generator is needed that can explicitly disentangle the natural periodic features of high-resolution waveform signals. PeriodWave: introduces a period-aware flow matching estimator that captures the periodic features of the waveform signal when estimating the vector field, and uses a multi-period estimator to capture those periodic features. Additionally, in waveform generation, hig..

Paper/Vocoder 2025. 3. 8. 12:24

[Paper Review] DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Large-scale Latent Diffusion Models show strong content generation performance across various modalities, but in text-to-speech they depend on phonemes and durations, which limits scalability. DiTTo-TTS: a Latent Diffusion Model-based text-to-speech model with domain-specific factors removed. It adopts a Diffusion Transformer in place of the conventional U-Net and speech length predicto..

Paper/TTS 2025. 3. 3. 12:10

[Paper Review] UniAudio: Towards Universal Audio Generation with Large Language Models

A universal audio generation model that can handle diverse tasks in a unified manner is needed. UniAudio: builds a Large Language Model-based audio generation model that generates speech, sound, music, singing voice, and more from diverse input conditions such as phonemes, text descriptions, and audio. An audio tokenization and language model architecture for improving model performance and efficiency..

Paper/Language Model 2025. 3. 2. 12:54
