Posts in the 'Paper' category (Page 3)

[Paper Review] Differentiable Reward Optimization for LLM based TTS System

The performance of neural codec language model-based Text-to-Speech systems can be improved. DiffRO: computes the reward directly from neural codec tokens and uses Gumbel-Softmax to make the reward function differentiable. It additionally introduces a Multi-Task Reward model that provides feedback from multiple perspectives. Paper (INTERSPEECH 2025): Paper Link. 1. Introduction: Neural codec token Language Modeling ..

Paper/Language Model 2025. 9. 19. 15:16
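
The DiffRO summary above mentions Gumbel-Softmax as the way to make a token-based reward differentiable. The following is a minimal sketch of that general idea only, not the paper's implementation: it relaxes a categorical token choice into soft weights and uses them to mix token embeddings, so a downstream reward would depend smoothly on the logits. The function names, the toy logits, and the embedding table are all hypothetical.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        # U is clamped away from 0 and 1 to keep both logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def soft_token_embedding(weights, embedding_table):
    """Weighted average of token embeddings (replaces a hard, non-differentiable lookup)."""
    dim = len(embedding_table[0])
    return [sum(w * row[d] for w, row in zip(weights, embedding_table))
            for d in range(dim)]

# Toy example: 3 hypothetical codec tokens with 2-dimensional embeddings.
rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
table = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
weights = gumbel_softmax(logits, tau=0.5, rng=rng)
embedding = soft_token_embedding(weights, table)
```

A lower `tau` pushes the weights toward a hard one-hot choice, while a higher `tau` makes them smoother. In an autodiff framework the same computation would let gradients flow from a reward model back to the logits.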

[Paper Review] ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction

Existing speaker verification models are limited in noise robustness. ParaNoise-SV: uses a dual U-Net that combines a Noise Extraction network and a Speech Enhancement network. The Noise Extraction U-Net models noise explicitly, while the Speech Enhancement U-Net, through parallel connections, ..

Paper/Verification 2025. 9. 18. 17:01

[Paper Review] ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism

Emotional Voice Conversion suffers from problems with emotion accuracy and speech distortion. ZSDEVC: uses a diffusion framework with a disentangled mechanism and expressive guidance, and trains the model on a large emotional speech dataset. Paper (INTERSPEECH 2025): Paper Link. 1. Introduction: Emotional Voice Conversion (EVC): linguistic content, speaker id..

Paper/Conversion 2025. 9. 17. 17:00

[Paper Review] LSPNet: An Ultra-Low Bitrate Hybrid Neural Codec

A neural codec that can operate even at ultra-low bitrates is needed. LSPNet: builds on the LPCNet framework and combines it with a parametric encoder to incorporate Line Spectral Pairs. It additionally applies a Joint Time-Frequency training strategy that uses an STFT loss and a Cross-Entropy loss. Paper (INTERSPEECH 2025): Paper Link. 1. Introduction: In ultra-low bitrate speech coding at 1.2 kbps, intelligible, natural-sounding speec..

Paper/Neural Codec 2025. 9. 16. 17:00

[Paper Review] EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast

Contrastive Language Audio Pre-training fails to capture the ordinal nature of emotion and shows insufficient alignment between audio and text embeddings. EmotionRankCLAP: jointly captures fine-grained emotion variation using the dimensional attributes of emotional speech and natural language prompts. The Rank-N-Contrast objective ..

Paper/Representation 2025. 9. 15. 17:02

[Paper Review] ControlSpeech: Towards Simultaneous and Independent Zero-Shot Speaker Cloning and Zero-Shot Language Style Control

A Text-to-Speech model for speaking style control and adjustment is needed. ControlSpeech: takes a speech prompt, a content prompt, and a style prompt as input and captures codec representations through bidirectional attention and mask-based parallel decoding. Through the Style Mixture Semantic Density module, textual style control ..

Paper/TTS 2025. 9. 14. 08:40

Let IT Begin
