Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Flow-matching-based text-to-speech models are difficult to apply to cross-lingual tasks.
Cross-Lingual F5-TTS:
- Pre-processes the audio prompt with forced alignment to obtain word boundaries, enabling direct synthesis from the audio prompt
- Introduces a speaking rate predictor with multiple levels of linguistic granularity for duration modeling
Paper (ICASSP 2026): Paper Link
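The forced-alignment pre-processing can be pictured with a small sketch. Assuming a word-level alignment given as (word, phone count, start, end) tuples, and a simple phones-per-second rate definition (both illustrative assumptions, not the paper's actual pipeline or predictor), word boundaries and a coarse speaking rate fall out directly:

```python
# Hypothetical sketch: word boundaries and a global speaking-rate feature
# derived from a forced alignment. Tuple format and rate definition are
# illustrative assumptions, not Cross-Lingual F5-TTS's exact formulation.

def word_boundaries(alignment):
    """alignment: list of (word, n_phones, start_sec, end_sec) tuples."""
    return [(start, end) for _, _, start, end in alignment]

def speaking_rate(alignment):
    """Phones per second over the aligned prompt (a coarse global rate)."""
    n_phones = sum(n for _, n, _, _ in alignment)
    duration = alignment[-1][3] - alignment[0][2]
    return n_phones / duration

align = [("hello", 4, 0.00, 0.45), ("world", 4, 0.50, 1.00)]
print(word_boundaries(align))   # [(0.0, 0.45), (0.5, 1.0)]
print(speaking_rate(align))     # 8 phones over 1.0 s -> 8.0
```

A finer-grained predictor would estimate such a rate per word or per phone rather than globally, which is the kind of multi-granularity signal the summary refers to.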
SUNAC: Source-Aware Unified Neural Audio Codec
Neural audio codecs encode multi-source mixtures in an entangled manner, which can make them unsuitable for downstream processing that needs access to a specific subset of sources.
SUNAC:
- Encodes individual sources from the mixture, conditioned on a source-type prompt
- Supports user-driven source selection and separate encoding through the source-aware codec
Paper (ICASSP 2026): Paper Link
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
A real-time zero-shot streaming text-to-speech model is needed.
VoXtream:
- Directly maps incoming phonemes to audio tokens using a limited look-ahead
- Architecturally combines an incremental phoneme transformer, a temporal transformer, and a depth transformer
Paper (ICASSP 2026): Paper Link
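The limited look-ahead idea can be sketched as a streaming loop: audio for phoneme i is committed as soon as a fixed number of future phonemes have arrived, instead of waiting for the whole sentence. This is an illustrative control-flow sketch, not VoXtream's actual transformers; `step_fn` stands in for the model call:

```python
# Minimal sketch of full-stream synthesis with a limited phoneme look-ahead
# (illustrative, not VoXtream's code). step_fn(current_phoneme, context)
# stands in for one model step producing an audio-token chunk.

def stream_synthesize(phoneme_stream, lookahead, step_fn):
    """Yield one audio chunk per phoneme, committing as soon as
    `lookahead` future phonemes are buffered."""
    buf = []
    for ph in phoneme_stream:
        buf.append(ph)
        if len(buf) > lookahead:            # enough future context to commit
            cur, ctx = buf.pop(0), buf[:lookahead]
            yield step_fn(cur, ctx)
    while buf:                              # end of text: flush with
        cur = buf.pop(0)                    # shrinking look-ahead
        yield step_fn(cur, buf[:lookahead])

for chunk in stream_synthesize(iter("abcd"), lookahead=2,
                               step_fn=lambda cur, ctx: (cur, tuple(ctx))):
    print(chunk)
# ('a', ('b', 'c'))  ('b', ('c', 'd'))  ('c', ('d',))  ('d', ())
```

The key latency property is visible in the loop: the first chunk is emitted after only `lookahead` extra phonemes, not after the full utterance.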
MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows
In voice conversion, flow-matching models are limited by their iterative inference.
MeanVoiceFlow:
- Supports one-step non-parallel conversion based on Mean Flow, without pre-training or distillation
- Additionally introduces a structural margin reconstruction loss and a zero-input constraint to regularize the model's input-output behavior
Paper (ICASSP 2026): Paper Link
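The one-step property comes from the Mean Flow idea: instead of the instantaneous velocity $v(z_t, t)$ that an ODE solver must integrate over many steps, the network learns an average velocity over an interval, so a single evaluation spans the whole trajectory. A sketch of the standard Mean Flow formulation from the general literature (the paper's exact conditioning and conventions may differ):

```latex
% Average velocity over [r, t] (Mean Flow definition)
u(z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(z_s, s)\, \mathrm{d}s

% One-step sampling: with z_1 drawn from noise, a single network
% call traverses the full interval [0, 1]
z_0 = z_1 - u(z_1, 0, 1)
```

Since $u$ is trained directly (rather than distilled from a pre-trained flow-matching teacher), this is consistent with the summary's claim of one-step conversion without pre-training or distillation.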
CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
Most neural codecs operate at a fixed frame rate, which creates a temporal mismatch with the signal.
CodecSlime:
- Compresses temporal redundancy in neural codecs using a schedulable dynamic frame rate
- Introduces Melt-and-Cool training to improve adaptation
Paper (ICASSP 2026): Paper Link
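What "temporal redundancy compression" means can be illustrated with a toy frame-merging pass: adjacent feature frames that are nearly identical are collapsed into one, yielding a variable (dynamic) frame rate. The cosine-similarity test and running average below are assumptions for illustration, not CodecSlime's Melt-and-Cool procedure:

```python
# Illustrative sketch of dynamic-frame-rate compression by merging
# near-duplicate adjacent frames (not CodecSlime's actual method).

def merge_redundant_frames(frames, threshold=0.99):
    """frames: list of feature vectors (lists of floats).
    Returns (merged frames, how many originals each merged frame spans)."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    merged, counts = [list(frames[0])], [1]
    for f in frames[1:]:
        if cos(merged[-1], f) >= threshold:
            n = counts[-1]                   # running average keeps the
            merged[-1] = [(m * n + x) / (n + 1)   # merged frame representative
                          for m, x in zip(merged[-1], f)]
            counts[-1] = n + 1
        else:
            merged.append(list(f))
            counts.append(1)
    return merged, counts

frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(merge_redundant_frames(frames))  # ([[1.0, 0.0], [0.0, 1.0]], [2, 1])
```

Steady segments (long vowels, silence) compress heavily while transients keep their frames, which is exactly the mismatch a fixed frame rate cannot exploit.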
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
A model for environment-aware text-to-speech is needed.
DAIEN-TTS:
- Uses a pre-trained speech-environment separation module to extract mel-spectrograms of clean speech and environment audio, and applies a random span mask to each mel-spectrogram to support the infilling process
- Applies dual classifier-free guidance to the speech and environment components for controllability
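Dual classifier-free guidance means two independent guidance scales, one for the speech condition and one for the environment condition. The combination rule below mirrors standard multi-condition CFG and is an assumption, not necessarily DAIEN-TTS's exact formula:

```python
# Hedged sketch of dual classifier-free guidance: the unconditional output is
# pushed independently toward the speech-conditioned and the environment-
# conditioned outputs, each with its own scale (standard multi-condition CFG).

def dual_cfg(out_uncond, out_speech, out_env, w_speech, w_env):
    return [u + w_speech * (s - u) + w_env * (e - u)
            for u, s, e in zip(out_uncond, out_speech, out_env)]

out = dual_cfg([0.0], [1.0], [2.0], w_speech=2.0, w_env=0.5)
print(out)  # 0 + 2*(1 - 0) + 0.5*(2 - 0) = [3.0]
```

Because the two scales are separate, speech fidelity and environment strength can be traded off independently at inference time, which is where the controllability comes from.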
