Let IT Begin

[Paper Review] Generative De-quantization for Neural Speech Codec via Latent Diffusion

In low-bitrate speech coding, end-to-end networks aim to learn compact, expressive features and a powerful decoder. However, they remain limited in both complexity and speech quality. LaDiffCodec builds an end-to-end codec to learn low-dimensional discrete tokens, then uses a latent diffusion model to de-quantize the coded features into a high-dimensional continuous space. Additionally, ove..

Paper/Neural Codec 2024. 7. 18. 10:07

[Paper Review] PeriodSinger: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis

Synthesizing natural waveforms requires resolving the one-to-many problem caused by deterministic pitch conditioning. PeriodSinger uses variational autoencoders for the periodic and aperiodic components, and removes the dependence on an external aligner by estimating phoneme alignment through monotonic alignment search within note boundaries. Paper (INTE..

Paper/SVS 2024. 7. 17. 09:59

[Paper Review] TacoLM: Gated Attention Equipped Codec Language Model are Efficient Zero-shot Text-to-Speech Synthesizers

Neural codec language models perform well in zero-shot text-to-speech. However, their autoregressive nature and the implicit alignment between text and audio limit their speed. TacoLM introduces a gated attention mechanism to speed up training and inference and to reduce model size. Additionally, it applies a gated cross-attention layer at every decoder layer to improve synthesis quality and ef..

Paper/Language Model 2024. 7. 16. 10:40

[Paper Review] MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

Semi-supervised training can be used for singing voice synthesis. MakeSinger trains a diffusion-based model on all speech and singing voice data, regardless of labeling. A dual guiding mechanism estimates the score of the masked input to provide text and pitch guidance for the reverse diffusion steps. Paper (INTERSPEECH 202..

Paper/SVS 2024. 7. 15. 09:30

[Paper Review] Bunched LPCNet: Vocoder for Low-cost Neural Text-to-Speech Systems

LPCNet combines linear prediction with a neural network, which greatly lowers computational complexity. Bunched LPCNet introduces sample-bunching, which lets LPCNet generate more than one audio sample per inference, and bit-bunching, which reduces computation in the final layer of LPCNet. Paper (INTERSPEECH 2020): Paper Link. 1. Introduction: LPCNet achieves excellent performance in both inference speed and synthesis quality. In particular, based on the source-filter model, low-cost..

Paper/Vocoder 2024. 7. 14. 10:29

[Paper Review] End-to-End LPCNet: A Neural Vocoder with Fully-Differentiable LPC Estimation

Neural vocoders still pay for their high synthesis quality with high computational complexity. End-to-End LPCNet uses an autoregressive model based on linear prediction to reduce the complexity of neural vocoding. Additionally, it learns to predict the linear prediction coefficients from the input features of the frame rate network, forming an end-to-end version of the existing LPCNet. Paper (INTERSPEECH 2022): Paper Link. 1. Introducti..

Paper/Vocoder 2024. 7. 13. 11:00

[Paper Review] SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Diffusion-based non-autoregressive text-to-speech models need to be highly efficient. SimpleSpeech uses SQ-Codec, a speech codec that performs scalar quantization and maps the complex speech signal into a finite, compact scalar latent space. A transformer diffusion model is then applied to the scalar latent space of SQ-Codec. Paper (INTERSPEECH 2024): Pa..

Paper/TTS 2024. 7. 12. 09:35

[Paper Review] TokSing: Singing Voice Synthesis based on Discrete Tokens

Discrete tokens extracted from self-supervised learning models can improve singing voice synthesis. TokSing is a discrete-token-based singing voice synthesis model with a token formulator that provides flexible token blending. It integrates the melody signal with the discrete tokens and introduces a melody enhancement strategy in the musical encoder. Paper (INTERSPEECH 2024): Paper Link. 1. Introduction: Sin..

Paper/SVS 2024. 7. 11. 09:09

[Paper Review] Light-TTS: Lightweight Multi-Speaker Multi-Lingual Text-to-Speech

Most text-to-speech models are attention-based autoregressive models, so synthesis is slow and the parameter count is large. Light-TTS supports fast speech synthesis based on a non-autoregressive model and builds a multi-lingual system that can code-switch across languages. Paper (ICASSP 2021): Paper Link. 1. Introduction: In general, Text-to-Speech (TTS) models learn text-speech alignment with an attention mechanism..

Paper/TTS 2024. 7. 10. 09:43
