VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
A low-complexity, low-latency neural codec is needed.
VoCodec
Uses the Vocos vocoder as its backbone to reduce complexity.
Cascades a lightweight neural network at the front end to extend speech enhancement capability.
Paper (ICASSP 2026): Paper Link
1. Introduction
A neural codec consists of encoder, decoder, and quantizer modules. The encoder compresses speech into a latent representation, and the dec..

Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation
Flow-based models are limited in inference speed because of iterative sampling.
Int-MeanFlow
Approximates the average velocity over a temporal interval using the teacher's instantaneous velocity.
Additionally introduces Optimal Step Sampling Search to identify model-specific optimal sampling steps.
Paper (ICASSP 2026): Paper Link
1. Introduction
In Text-to-Speech (TTS), flow-based models ..

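The idea in the Int-MeanFlow summary, replacing many small steps with one step along an average velocity, can be illustrated with a toy numerical sketch. Everything here is an assumption for illustration and not taken from the paper: the 1-D "teacher" velocity field dx/dt = -x, the Euler integration, and the function names `teacher_velocity` and `average_velocity` are all hypothetical.

```python
import math
from typing import Callable


def teacher_velocity(x: float, t: float) -> float:
    """Toy stand-in for a teacher's instantaneous velocity v(x, t).

    Assumption for illustration only: the dynamics are dx/dt = -x.
    """
    return -x


def average_velocity(
    v: Callable[[float, float], float],
    x_r: float,
    r: float,
    t: float,
    n_steps: int = 1000,
) -> float:
    """Approximate u = 1/(t - r) * integral of v(x_tau, tau) over [r, t].

    The integral is accumulated with Euler steps along the trajectory
    that starts at x_r and follows the instantaneous velocity v.
    """
    dt = (t - r) / n_steps
    x, tau, integral = x_r, r, 0.0
    for _ in range(n_steps):
        vel = v(x, tau)
        integral += vel * dt  # accumulate the velocity integral
        x += vel * dt         # advance the state along the trajectory
        tau += dt
    return integral / (t - r)


x_r = 1.0
u = average_velocity(teacher_velocity, x_r, r=0.0, t=1.0)

# A single step along the average velocity lands close to where
# the many-step trajectory ends (exp(-1) for this toy field).
x_t_one_step = x_r + (1.0 - 0.0) * u
print(u, x_t_one_step, math.exp(-1.0))
```

In a distillation setting, a student network would presumably be trained to predict `u` directly so that sampling needs only a few such jumps; how the paper sets up that objective is not stated in the excerpt above.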
IPACue-TTS: Integrating Prosody and Articulatory Cues in Conditional Flow Matching for Multilingual Zero-Shot TTS
A native-sounding cross-lingual, code-mixed Text-to-Speech model is needed.
IPACue-TTS
Incorporates articulatory phoneme refinement to improve pronunciation and prosodic accuracy.
Explicitly models fine-grained acoustic and prosodic features through a flow-based framework.
Paper (ICASSP 2026): Paper Link
1. Intro..

IBPCodec: A Low-Bitrate Lightweight Speech Codec with Inter-Band Prediction
Neural codecs are limited by high computational complexity.
IBPCodec
Uses Inter-Band Prediction to model low-frequency information.
During decoding, exploits the correlation between the high- and low-frequency bands to support full speech reconstruction.
Paper (ICASSP 2026): Paper Link
1. Introduction
A speech codec turns a continuous waveform into a discrete representation by comp..

F5E-TTS: Enhancing Speech Synthesis by Aligning Text with Rich Semantic Representations
Text-to-Speech is limited in the semantic alignment between text and speech.
F5E-TTS
Trains a Diffusion Transformer backbone conditioned on bottleneck features from Phonetic PosteriorGrams.
Introduces explicit cross-modal regularization using a shared Vector-Quantized codebook.
Paper (ICASSP 2026): Paper Link
1. Introduction
For Text-to-Speech (TTS), content co..

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis
Large Language Models are limited in modeling emotion-specific latent characteristics.
EmoShift
Learns a steering vector for each target emotion in the output embedding space.
Incorporates this EmoSteer layer to build a lightweight activation-steering framework.
Paper (ICASSP 2026): Paper Link
1. Introduction
In Text-to-Speech (TTS), emotion contro..
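Activation steering of the kind the EmoShift summary describes can be sketched as a layer that adds one vector per target emotion to each output embedding. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `SteeringLayer`, the `scale` parameter, and the hand-set vector values are hypothetical, and in practice the vectors would be learned rather than assigned.

```python
from typing import Dict, List


class SteeringLayer:
    """Hypothetical steering layer: adds a per-emotion vector to each frame."""

    def __init__(self, emotions: List[str], dim: int, scale: float = 1.0) -> None:
        # One steering vector per emotion, initialised to zero so the
        # layer starts out as an identity mapping.
        self.vectors: Dict[str, List[float]] = {e: [0.0] * dim for e in emotions}
        self.scale = scale

    def __call__(self, hidden: List[List[float]], emotion: str) -> List[List[float]]:
        # Shift every frame of the hidden sequence by the same vector.
        v = self.vectors[emotion]
        return [[h + self.scale * s for h, s in zip(frame, v)] for frame in hidden]


layer = SteeringLayer(["happy", "sad"], dim=2)
layer.vectors["happy"] = [0.5, -1.0]  # stand-in for a learned vector

hidden = [[1.0, 2.0], [0.0, 0.0]]     # two frames of 2-D output embeddings
out = layer(hidden, "happy")
unchanged = layer(hidden, "sad")      # zero vector leaves the input as-is
print(out, unchanged)
```

Only the small set of steering vectors would need training in such a setup, which fits the "lightweight" framing; where the layer sits in the model and how it is trained are not given in the excerpt.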
