Posts in the 'Paper' category (Page 7)

[Paper Review] PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-Controllable TTS

Pitch-controllable text-to-speech relies on directly modeling the fundamental frequency. PITS is an end-to-end model that models pitch using variational inference; it builds on VITS and incorporates a Yingram encoder, a Yingram decoder, and adversarial training. Paper (ICML 2023): Paper Link. 1. Introduction: Text-to-Speech (TTS), given ..

Paper/TTS 2025. 6. 6. 10:08

[Paper Review] Wav2Vec-C: A Self-Supervised Model for Speech Representation Learning

Representation learning can be performed by combining Wav2Vec 2.0 and VQ-VAE. Like Wav2Vec 2.0, Wav2Vec-C uses a contrastive loss to learn to reproduce quantized representations from partially masked speech encodings. In addition, through a consistency network that, like VQ-VAE, reconstructs the input features of the Wav2Vec 2.0 network from the quantized representations, the quantization process ..

Paper/Representation 2025. 6. 5. 17:34

[Paper Review] Wav2Vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition

Self-supervised learning frameworks do not take noise robustness into account. Wav2Vec-Switch feeds original-noisy speech pairs into the Wav2Vec 2.0 network simultaneously and uses the quantized representations of the original and noisy speech as additional prediction targets for each other. Paper (ICASSP 2022): Paper Link. 1. Introduction: For speech tasks, Sel..

Paper/Representation 2025. 6. 4. 17:25

[Paper Review] CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System based on Conditional Variational Autoencoder

Applying end-to-end modeling to singing voice synthesis can achieve strong synthesis performance. CSSinger introduces chunkwise streaming inference to reduce the latency of end-to-end models and supports fully end-to-end streaming audio synthesis using the latent representation of a variational autoencoder. Paper (AAAI 2025): Paper Link. 1. Introducti..

Paper/SVS 2025. 6. 3. 08:56

[Paper Review] Wav2Vec-Aug: Improved Self-Supervised Training with Limited Data

Because unlabeled data is scarce for many languages, self-supervised learning of speech representations remains limited. Wav2Vec-Aug applies data augmentation to Wav2Vec 2.0 pre-training and applies self-supervised learning to domains with limited available data. Paper (INTERSPEECH 2022): Paper Link. 1. Introduction: Self-Supervised Learning (SSL), from unlabeled speech, repres..

Paper/Representation 2025. 6. 2. 17:33

[Paper Review] TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching

Singing voice synthesis does not provide precise control over techniques such as intensity, mixed voice, and falsetto. TechSinger introduces a flow-matching-based model to support expressive control over a variety of techniques, and it uses a technique detection model that automatically annotates datasets with phoneme-level technique labels to improve the diversity of the training data. Prompt-..

Paper/SVS 2025. 6. 1. 09:27
