Posts from 2025/03/03

[Paper Review] DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Large-scale Latent Diffusion Models show strong content generation performance across various modalities, but in text-to-speech they must rely on phonemes and durations, which limits scalability. DiTTo-TTS is a Latent Diffusion Model-based text-to-speech model that removes these domain-specific factors. It adopts a Diffusion Transformer in place of the conventional U-Net, and speech length predicto..

Paper/TTS 2025. 3. 3. 12:10

Let IT Begin