CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering
Most text-to-speech systems enforce a single utterance-level emotion. CoCoEmo introduces a multi-rater evaluation protocol for activation steering and applies a lightweight steering approach for human-like emotional speech.
Paper (ICML 2026): Paper Link
1. Introduction
Natural speech is inherently complex, with multiple concurrent, conflicting ..

FC-TTS: Style and Timbre Control in Zero-Shot Text-to-Speech with Disentangled Speech Representations
Zero-shot text-to-speech is still limited in independent, precise control. FC-TTS introduces a 2-stage spectrogram generation pipeline and a VQ-VAE-based style encoder. It additionally introduces a conditioning-aware consistency loss to improve attribute separation and the reliability of dual-reference control.
Paper (ACL 2026): Paper Link
1. Introduction
..

EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding
Spectrogram-domain approaches are limited in complex-valued phase modeling. EuleroDec preserves magnitude-phase coupling across the analysis-quantization-synthesis pipeline. In particular, it removes the adversarial discriminator and the diffusion post-filter to support end-to-end processing.
Paper (ICASSP 2026): Paper Link
1. Introduction
Spectral-domain audio codecs use the STFT to take the signal to time-freq..

SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer
Text-to-speech models are still limited by latency. SyncSpeech builds on a Temporal Masked Transformer to unify the temporally ordered generation of autoregressive models with the parallel decoding of non-autoregressive models. It additionally improves training efficiency through High-Probability Masking.
Paper (ICASSP 2026): Paper Link
1. Introduction
Text-to-Speech (TTS) ..

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
A neural codec with low complexity and low latency is needed. VoCodec uses the Vocos vocoder as its backbone to reduce complexity, and cascades a lightweight neural network at the front end to extend its speech enhancement capability.
Paper (ICASSP 2026): Paper Link
1. Introduction
A neural codec consists of encoder, decoder, and quantizer modules. The encoder compresses speech into a latent representation, and the dec..

Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation
Flow-based models are limited in inference speed because of iterative sampling. Int-MeanFlow approximates the average velocity with the teacher's instantaneous velocity over a temporal interval. It additionally introduces Optimal Step Sampling Search to identify the model-specific optimal sampling steps.
Paper (ICASSP 2026): Paper Link
1. Introduction
In Text-to-Speech (TTS), flow-based models ..
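
The relationship between average and instantaneous velocity mentioned in the Int-MeanFlow summary can be illustrated with a small numerical sketch. This is not the paper's distillation procedure, which the excerpt does not describe. It only shows the general idea that the average velocity over an interval [r, t] equals the integral of the instantaneous velocity along the trajectory, divided by (t - r). The toy field `teacher_velocity`, the forward time direction, and the helper names are all assumptions made for illustration.

```python
def teacher_velocity(z, t):
    # Hypothetical stand-in for a teacher's instantaneous velocity field.
    # The toy ODE dz/dt = -z is used only because its solution is known.
    return -z


def average_velocity(z_r, r, t, velocity_fn, n_steps=1000):
    """Approximate the average velocity over [r, t] starting from state z_r.

    The instantaneous velocity is integrated along the trajectory with
    Euler steps, and the total displacement is divided by (t - r).
    """
    if t == r:
        # On a zero-length interval the average equals the instantaneous value.
        return velocity_fn(z_r, r)
    dt = (t - r) / n_steps
    z, tau = z_r, r
    for _ in range(n_steps):
        z += velocity_fn(z, tau) * dt
        tau += dt
    return (z - z_r) / (t - r)


def one_step_sample(z_r, r, t, avg_velocity):
    # A single jump using the average velocity replaces many small steps.
    return z_r + (t - r) * avg_velocity


u = average_velocity(1.0, 0.0, 1.0, teacher_velocity)
z_t = one_step_sample(1.0, 0.0, 1.0, u)
```

In this sketch the many-step integration plays the role of the slow iterative sampler, while `one_step_sample` shows why a model that predicts the average velocity directly could cover the same interval in a single step.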