
ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
Singing voice synthesis with diffusion models yields high-quality samples, but inference speed is a limitation.
ConSinger
Adopts a Consistency Model to perform singing voice synthesis in only a minimal number of steps.
In particular, applies a consistency constraint during training.
Paper (ICASSP 2025): Paper Link
1. Introduction
Singing Voice Synthesis (SVS): emotionally realistic human audio ..

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners
Voice Large Language Models are mostly limited to a single task and a single language.
Make-A-Voice
Builds a scalable learner using an end-to-end local/global multiscale transformer.
Shares common knowledge and generalizes to unseen tasks, improving in-context learning.
Supports a multilingual learner that addresses data scarcity for low-resource languages.
Paper..