DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment
Singing Voice Synthesis is limited by data scarcity and model scalability.
DiTSinger
Builds a high-quality singing dataset by pairing fixed melodies with LLM-generated lyrics.
Additionally, extends the scalability of the Diffusion Transformer with RoPE and QK-norm, and introduces an implicit alignment mechanism.
Paper (ICASSP 2026): Paper Link
1. Introduction
Singing Vo..

TCSinger2: Customizable Multilingual Zero-Shot Singing Voice Synthesis
Existing Singing Voice Synthesis lacks multi-level style control through diverse prompts.
TCSinger2
Predicts duration with a Blurred Boundary Content Encoder and extends the content embedding to support smooth transitions.
Extracts aligned representations from singing, speech, and textual prompts with a Custom Audio Encoder.
Additionally, uses a Flow-based Custom Encoder for style modelin..

ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control
Singing Voice Synthesis lacks controllability over timing, dynamics, and pitch.
ExpressiveSinger
Generates expressive performance control signals that include phoneme timing, the $F0$ curve, and the amplitude envelope.
Generates the mel-spectrogram from these performance control signals using style guidance and a singer timbre embedding.
Paper ..

CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System based on Conditional Variational Autoencoder
Applying End-to-End modeling to singing voice synthesis can achieve strong synthesis quality.
CSSinger
Introduces Chunkwise Streaming inference to reduce the latency of the End-to-End model.
Supports fully end-to-end streaming audio synthesis using the latent representation of the Variational Autoencoder.
Paper (AAAI 2025): Paper Link
1. Introducti..

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Singing Voice Synthesis does not provide precise control over techniques such as intensity, mixed voice, and falsetto.
TechSinger
Introduces a flow-matching-based model to support expressive control over a variety of techniques.
Uses a technique detection model that automatically annotates the dataset with phoneme-level technique labels, increasing the diversity of the training data.
Prompt-..

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
A unified framework that supports cross-domain singing voice synthesis is needed.
Everyone-Can-Sing
Supports control over multiple aspects: language content derived from the lyrics, performance attributes derived from the musical score, singing style, and vocal technique.
Uses pre-trained content embeddings and a diffusion-based generator.
Paper (ICASSP 2025): Paper Link
1..
