
ATP-TTS: Adaptive Thresholding Pseudo-Labeling for Low-Resource Multi-Speaker Text-to-Speech
Text-to-Speech is difficult to apply in low-resource scenarios.
ATP-TTS
- Selects suitable pseudo-labels through Adaptive Thresholding
- Then predicts latent representations using an Automatic Speech Recognition model enhanced with contrastive learning perturbation
Paper (ICASSP 2025): Paper Link
1. Introduction: Su.. such as Glow-TTS, VITS, and NaturalSpeech
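The adaptive-thresholding idea can be sketched generically; the paper's exact selection rule is not reproduced here. A minimal sketch, assuming a FlexMatch-style per-class EMA threshold (the function name, `momentum`, and `base` values are illustrative, not from the paper):

```python
def select_pseudo_labels(confidences, labels, thresholds, momentum=0.9, base=0.95):
    """Keep pseudo-labels whose confidence exceeds a per-class adaptive threshold.

    confidences: model max-probabilities for unlabeled samples
    labels: argmax predictions for the same samples
    thresholds: dict class -> running EMA of confidence (updated in place)
    """
    selected = []
    for i, (c, y) in enumerate(zip(confidences, labels)):
        # update the running estimate of how confident the model is on class y
        t = thresholds.get(y, base)
        thresholds[y] = momentum * t + (1 - momentum) * c
        # scale the fixed base threshold by the class's current learning status
        if c >= base * thresholds[y]:
            selected.append(i)
    return selected
```

Classes the model is still unsure about get a lower effective threshold, so their pseudo-labels are not starved out early in training.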

SSR-Speech: Towards Stable, Safe and Robust Zero-Shot Text-based Speech Editing and Synthesis
A stable, safe, and robust zero-shot text-to-speech model is needed.
SSR-Speech
- Incorporates classifier-free guidance on top of a Transformer decoder
- Embeds a frame-level watermark for the edited region via Watermark EnCodec
Paper (ICASSP 2025): Paper Link
1. Introduction: Zero-shot text-based speech generation models such as YourTTS Speech..

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
Code-Switching Automatic Speech Recognition still falls short on seamless language switching.
CS-Whisper
- Builds on Whisper, introducing an Encoder Refiner to improve the encoder's handling of intra-sentence switching
- Uses a Language-Aware Adapter with different language prompts to obtain language-specific decoding information at each decoder layer
Paper (ICASSP 2025): Pap..

SpeechFlow: Generative Pre-Training for Speech with Flow Matching
A single pre-trained generative model can serve a variety of downstream tasks.
SpeechFlow
- Pre-trains on untranscribed speech using Flow Matching and a masked condition
- Fine-tunes the pre-trained generative model on task-specific data to apply it to various tasks
Paper (ICLR 2024): Paper Link
1. Introduction: Discriminative models are used for speech recognition, enhancement, separat..
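The masked-condition flow matching objective can be sketched with NumPy. This is a generic sketch, not SpeechFlow's exact configuration: the linear (optimal-transport) path, the frame-level masking, and the `mask_ratio` value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_masked_target(x1, mask_ratio=0.7):
    """Build one training example for masked-condition flow matching.

    x1: clean speech feature frames, shape (T, D)
    Returns (x_t, t, cond, target): the interpolated point, the sampled time,
    the partially masked condition, and the velocity the model should predict.
    """
    x0 = rng.standard_normal(x1.shape)          # noise sample
    t = rng.uniform()                           # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                 # linear interpolation path
    target = x1 - x0                            # constant velocity along that path
    mask = rng.uniform(size=x1.shape[0]) < mask_ratio
    cond = x1.copy()
    cond[mask] = 0.0                            # masked frames are hidden from the model
    return x_t, t, cond, target
```

A model trained to regress `target` from `(x_t, t, cond)` learns to complete speech from partial context, which is what makes the single pre-trained model adaptable to infilling-style downstream tasks.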

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Diffusion can be used to synthesize audio waveforms conditioned on highly compressed representations.
MBD
- Generates any type of audio modality from low-bitrate discrete representations
- Uses a multi-band diffusion-based framework to do so
Paper (NeurIPS 2023): Paper Link
1. Introduction: Neural-based vocoders such as MelGAN can synthesize high-quality samples. In particular, Self.. such as HuBERT
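The band decomposition underlying a multi-band framework can be illustrated as follows. This is only a sketch of the splitting step: the band edges are hypothetical, and the per-band diffusion models that MBD would attach to each band are omitted.

```python
import numpy as np

def split_bands(wav, sr, edges=(0, 1000, 4000, 8000)):
    """Split a waveform into frequency bands via FFT masking (sketch).

    wav: 1-D waveform array; sr: sample rate in Hz.
    Each band keeps only the spectrum between consecutive edges.
    Summing the bands reconstructs the original signal exactly.
    """
    spec = np.fft.rfft(wav)
    freqs = np.fft.rfftfreq(len(wav), d=1.0 / sr)
    bands = []
    cuts = list(edges) + [sr / 2 + 1]           # final band runs up to Nyquist
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mask = (freqs >= lo) & (freqs < hi)     # disjoint masks partition the spectrum
        bands.append(np.fft.irfft(spec * mask, n=len(wav)))
    return bands
```

Because the bands partition the spectrum, each one can be denoised by its own model without the error in one frequency range contaminating the others, and the outputs are simply summed back together.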

VQ-Wav2Vec: Self-Supervised Learning of Discrete Speech Representations
Discrete representations of audio segments can be learned through Wav2Vec-style self-supervised context prediction.
VQ-Wav2Vec
- Quantizes dense representations using Gumbel-Softmax or online $k$-means clustering
- This discretization allows BERT pre-training to be applied directly
Paper (ICLR 2020): Paper Link
1. Introduction: Learning discrete speech representations ..
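The Gumbel-Softmax quantization step can be sketched in NumPy. This is a generic sketch, not VQ-Wav2Vec's exact layer: the straight-through gradient trick that makes the hard choice differentiable is only noted in comments, since NumPy has no autograd.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_quantize(logits, codebook, tau=1.0, hard=True):
    """Pick a codebook entry via Gumbel-Softmax (sketch).

    logits: (V,) unnormalized scores over V codebook entries
    codebook: (V, D) code vectors
    With hard=True the forward pass uses the argmax one-hot; in a real
    framework the soft probabilities would carry the gradient.
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = np.exp((logits + g) / tau)
    y /= y.sum()                                          # soft one-hot over codes
    if hard:
        one_hot = np.zeros_like(y)
        one_hot[np.argmax(y)] = 1.0                       # discrete forward choice
        y = one_hot
    return y @ codebook                                   # selected code vector
```

Sampling with Gumbel noise lets the model explore different codes during training while the temperature `tau` controls how close the soft assignment is to a hard one.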