[Paper Review] UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

feVeRin 2024. 3. 22. 10:12

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Using full-band spectral features gives the vocoder as much acoustic information as possible
- BUT, using a full-band mel-spectrogram can cause an over-smoothing problem
UnivNet
- A high-quality neural vocoder that resolves the full-band over-smoothing problem
- Introduces a multi-resolution spectrogram discriminator that uses multiple linear spectrogram magnitudes
Paper (INTERSPEECH 2021) : Paper Link

1. Introduction

Applying a Generative Adversarial Network (GAN) to a neural vocoder yields fast, high-fidelity speech synthesis
- Neural vocoders generally generate the waveform from a mel-spectrogram, which is typically band-limited
  - In that case, the acoustic information of the high-frequency band is not provided to the model
- Alternatively, spectral features covering frequencies up to half the sampling rate can be used as input to provide full-band acoustic information
  - BUT, with a full-band mel-spectrogram, over-smoothing occurs: the generated spectrogram is non-sharp
- A GAN discriminator can be used to address this problem
  - Letting the discriminator take multi-resolution spectral features as input, in addition to temporal features, improves its binary classification performance

-> Therefore, UnivNet is proposed: it uses a multi-resolution spectrogram discriminator to solve the full-band over-smoothing problem
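
To make the band-limited versus full-band distinction above concrete, here is a minimal NumPy sketch. It assumes the common HTK-style mel formula and illustrative settings (24 kHz audio, an 8 kHz upper limit for the band-limited case, 80 and 100 mel bands); none of these values are taken from this post, and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels filters spaced evenly on the mel scale."""
    edges = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(edges[1:-1])

sr = 24000            # illustrative sampling rate
nyquist = sr / 2      # highest frequency the signal can contain

band_limited = mel_center_frequencies(80, 0.0, 8000.0)   # stops at 8 kHz
full_band = mel_center_frequencies(100, 0.0, nyquist)    # reaches up to sr / 2

print(f"band-limited top filter: {band_limited.max():.0f} Hz")
print(f"full-band top filter:    {full_band.max():.0f} Hz")
```

With the band-limited setting, no filter is centered above 8 kHz, so the 8 to 12 kHz region of the signal never reaches the vocoder; the full-band setting covers it.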

UnivNet
- Introduces a Multi-Resolution Spectrogram Discriminator (MRSD) that uses multiple linear spectrogram magnitudes computed with various parameter sets
- Takes a full-band mel-spectrogram as input and generates a high-resolution signal with the help of the MRSD
- Additionally combines the MRSD with a Multi-Period Waveform Discriminator (MPWD), which operates on the waveform at multiple scales, so that both the spectral and temporal domains are modeled

< Overview of UnivNet >

A high-quality, real-time neural vocoder that resolves the full-band over-smoothing problem
Introduces the MRSD, which uses multiple linear spectrogram magnitudes
As a result, achieves better synthesis quality and inference speed than existing GAN-based vocoders

2. Method

- Generator

The UnivNet generator $G$ builds on the ideas of MelGAN
- The noise sequence $z$ is used as input and the log mel-spectrogram $c$ is used as the condition
  - $z$ has the same length as $c$, and the output $\hat{x}$ is brought to the same length as the target waveform $x$ through transposed convolutions
- To capture the local information of the condition efficiently, Location-Variable Convolution (LVC) is added
  - The kernels of the LVC layers are predicted by a kernel predictor that takes the log mel-spectrogram as input
  - A kernel predictor is connected to a residual stack, and a single kernel predictor simultaneously predicts the kernels of all LVC layers in that residual stack
- To improve generality in the multi-speaker setting, a Gated Activation Unit (GAU) is added to each residual connection
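
The LVC and GAU ideas above can be sketched in a few lines of NumPy. This is a deliberately simplified, single-channel illustration and not the paper's implementation: `predict_kernels` is a hypothetical stand-in for the learned kernel predictor (here just a fixed random linear map), and feeding two LVC outputs into the gate is an assumption made for the example.

```python
import numpy as np

def predict_kernels(cond, kernel_size, rng):
    """Hypothetical stand-in for the kernel predictor.

    cond: (n_mels, n_frames) conditioning features.
    Returns one 1-D kernel per conditioning frame: (n_frames, kernel_size).
    """
    n_mels, _ = cond.shape
    weight = 0.1 * rng.standard_normal((kernel_size, n_mels))
    return (weight @ cond).T

def location_variable_conv(x, kernels, hop):
    """Apply kernel i to the i-th block of `hop` samples (zero "same" padding)."""
    n_frames, kernel_size = kernels.shape
    if kernel_size % 2 == 0 or len(x) != n_frames * hop:
        raise ValueError("need an odd kernel size and len(x) == n_frames * hop")
    pad = kernel_size // 2
    x_pad = np.pad(x, pad)
    out = np.empty(len(x))
    for i in range(n_frames):
        # Block i plus the neighbouring samples the kernel needs on each side
        segment = x_pad[i * hop : (i + 1) * hop + 2 * pad]
        out[i * hop : (i + 1) * hop] = np.correlate(segment, kernels[i], mode="valid")
    return out

def gated_activation(a, b):
    """Gated activation unit: tanh(a) * sigmoid(b)."""
    return np.tanh(a) * (1.0 / (1.0 + np.exp(-b)))

rng = np.random.default_rng(0)
hop, n_frames, n_mels = 4, 3, 5
cond = rng.standard_normal((n_mels, n_frames))   # toy log mel-spectrogram
x = rng.standard_normal(n_frames * hop)          # toy hidden signal
k_filter = predict_kernels(cond, 3, rng)
k_gate = predict_kernels(cond, 3, rng)
y = gated_activation(location_variable_conv(x, k_filter, hop),
                     location_variable_conv(x, k_gate, hop))
print(y.shape)
```

The point of the sketch is that the convolution weights change from one conditioning frame to the next, whereas an ordinary convolution would reuse a single kernel along the whole signal.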

- Discriminator

The discriminator $D$ uses multiple spectrograms and reshaped waveforms computed from the real/generated signals
- For this, the Multi-Resolution Spectrogram Discriminator (MRSD) works as follows:
  1. As input to each $m$-th sub-discriminator, the $M$ real/generated linear spectrogram magnitudes $\{s_m = |FT_m(x)|, \hat{s}_m = |FT_m(\hat{x})|\}_{m=1}^{M}$ are
    - computed from the same waveform using the $M$ STFT parameter sets $\{FT_m(\cdot)\}_{m=1}^{M}$
    - each parameter set consists of the number of Fourier transform points, the frame shift interval, and the window length
  2. Because the MRSD uses multiple spectrograms with different temporal and spectral resolutions, it enables the generation of high-resolution signals over the full band
- Structurally, it is based on MelGAN's Multi-Scale Waveform Discriminator (MSWD) and consists of strided 2D convolutions and Leaky ReLU
- For detailed adversarial modeling in the temporal domain, HiFi-GAN's Multi-Period Waveform Discriminator (MPWD) is added
  - Here, the periodic components of the waveform are extracted at intervals given by a set of prime numbers and used as input to each sub-discriminator
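
The inputs to the two discriminators described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the three STFT parameter sets and the period set (2, 3, 5, 7, 11) are illustrative values not given in this post, the Hann window is an assumption, and the tail is zero-padded for simplicity (a real implementation may pad differently). The sub-discriminator networks themselves are omitted.

```python
import numpy as np

def stft_magnitude(x, n_fft, hop, win_length):
    """|FT_m(x)|: linear spectrogram magnitude, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop
    frames = np.stack([x[i * hop : i * hop + win_length] * window
                       for i in range(n_frames)])
    # rfft zero-pads each windowed frame from win_length up to n_fft points
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)).T

def reshape_by_period(x, period):
    """Fold a 1-D waveform into (ceil(len / period), period).

    Column j then holds samples j, j + period, j + 2 * period, ...
    """
    pad = (-len(x)) % period
    return np.pad(x, (0, pad)).reshape(-1, period)

sr = 24000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)   # toy waveform: one second of a 440 Hz tone

# (Fourier transform points, frame shift, window length): illustrative values
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]
PERIODS = (2, 3, 5, 7, 11)

specs = [stft_magnitude(x, *r) for r in RESOLUTIONS]   # MRSD inputs
folded = [reshape_by_period(x, p) for p in PERIODS]    # MPWD inputs

for r, s in zip(RESOLUTIONS, specs):
    print(r, s.shape)
for p, f in zip(PERIODS, folded):
    print(p, f.shape)
```

A longer window gives finer frequency resolution but coarser time resolution, and a shorter one the opposite, which is why the same waveform is analysed with several parameter sets.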

- Training Loss

The multi-resolution STFT loss is an auxiliary training loss; it is the sum of multiple spectrogram losses computed with different STFT parameter sets
- The loss $\mathcal{L}_{aux}$, composed of the spectral convergence loss $\mathcal{L}_{sc}$ and the log STFT magnitude loss $\mathcal{L}_{mag}$, is:
  (Eq. 1) $\mathcal{L}_{sc}(s, \hat{s}) = \frac{\| s - \hat{s} \|_{F}}{\| s \|_{F}}, \,\, \mathcal{L}_{mag}(s, \hat{s}) = \frac{1}{S} \| \log s - \log \hat{s} \|_{1}$
  (Eq. 2) $\mathcal{L}_{aux}(x, \hat{x}) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{x, \hat{x}} \left[ \mathcal{L}_{sc}(s_{m}, \hat{s}_{m}) + \mathcal{L}_{mag}(s_{m}, \hat{s}_{m}) \right]$
  - $\| \cdot \|_{F}, \| \cdot \|_{1}$ : the Frobenius and $L_{1}$ norms, respectively, $S$ : number of elements in the spectrogram
  - The $m$-th $\mathcal{L}_{sc}, \mathcal{L}_{mag}$ reuse the $s_{m}$ and $\hat{s}_{m}$ used for the $m$-th MRSD sub-discriminator
  - The number of each loss is $M$, the same as the number of MRSD sub-discriminators
- UnivNet uses the least squares GAN objective, and the overall objective is:
  (Eq. 3) $\mathcal{L}_{G} = \lambda \mathcal{L}_{aux}(x, G(z, c)) + \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}_{z, c} \left[ (D_{k}(G(z, c)) - 1)^{2} \right]$
  (Eq. 4) $\mathcal{L}_{D} = \frac{1}{K} \sum_{k=1}^{K} \left( \mathbb{E}_{x} \left[ (D_{k}(x) - 1)^{2} \right] + \mathbb{E}_{z, c} \left[ D_{k}(G(z, c))^{2} \right] \right)$
  - $D_{k}$ : the $k$-th sub-discriminator of the MRSD and MPWD, $K$ : total number of sub-discriminators, $\lambda$ : balance parameter
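
Eq. 1 to Eq. 4 translate fairly directly into code. The following NumPy sketch is an illustration under assumptions rather than the paper's training code: expectations are replaced by means over the given arrays, `EPS` is an added floor to keep the logarithm finite, the spectrograms and discriminator outputs are toy arrays, and `lam = 2.5` is only an illustrative value for $\lambda$.

```python
import numpy as np

EPS = 1e-7  # assumption: small floor so that log() stays finite

def spectral_convergence(s, s_hat):
    # Eq. 1, left: ||s - s_hat||_F / ||s||_F (Frobenius norm is the 2-D default)
    return np.linalg.norm(s - s_hat) / np.linalg.norm(s)

def log_magnitude_loss(s, s_hat):
    # Eq. 1, right: (1 / S) * ||log s - log s_hat||_1, i.e. the mean absolute error
    return np.abs(np.log(s + EPS) - np.log(s_hat + EPS)).mean()

def aux_loss(specs, specs_hat):
    # Eq. 2: average of (L_sc + L_mag) over the M spectrogram resolutions
    terms = [spectral_convergence(s, sh) + log_magnitude_loss(s, sh)
             for s, sh in zip(specs, specs_hat)]
    return sum(terms) / len(terms)

def generator_loss(aux, d_fake, lam):
    # Eq. 3: lambda * L_aux + (1 / K) * sum_k E[(D_k(G(z, c)) - 1)^2]
    adv = np.mean([np.mean((d - 1.0) ** 2) for d in d_fake])
    return lam * aux + adv

def discriminator_loss(d_real, d_fake):
    # Eq. 4: (1 / K) * sum_k (E[(D_k(x) - 1)^2] + E[D_k(G(z, c))^2])
    return np.mean([np.mean((dr - 1.0) ** 2) + np.mean(df ** 2)
                    for dr, df in zip(d_real, d_fake)])

rng = np.random.default_rng(0)
specs = [rng.random((5, 4)) + 0.1 for _ in range(3)]   # toy s_m, M = 3
specs_hat = [1.1 * s for s in specs]                   # toy s_hat_m, 10% too loud

aux = aux_loss(specs, specs_hat)
d_real = [np.ones(4), np.ones(4)]     # toy outputs of K = 2 sub-discriminators
d_fake = [np.zeros(4), np.zeros(4)]   # a discriminator that is never fooled
print(aux, generator_loss(aux, d_fake, lam=2.5), discriminator_loss(d_real, d_fake))
```

In this toy setup the discriminator loss is zero because every real sample is scored 1 and every generated sample 0, while the generator pays the full adversarial penalty of 1 on top of the weighted auxiliary loss.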

3. Experiments

- Settings

Dataset : LibriTTS
Comparisons : MelGAN, HiFi-GAN, Parallel WaveGAN

- Results

Ablation Study
- In the ablation table, G1 = LVC, G2 = GAU, D1 = MRSD, D2 = MPWD, D3 = MSWD
- Every component contributes substantially to UnivNet's performance

In particular, removing the MRSD causes the over-smoothing problem in the high-frequency band
- This in turn lowers the MOS

Comparison with Existing Models
- UnivNet also shows the best overall synthesis quality
- In terms of synthesis efficiency, UnivNet synthesizes speech 200 times faster than real time with fewer parameters
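
As a reading aid for the speed figure, the small sketch below converts a real-time speedup into wall-clock time and sample throughput. The 24 kHz sampling rate and the 10-second clip length are assumptions for illustration, not measurements from the paper, and the function names are hypothetical.

```python
def synthesis_time(audio_seconds, speedup):
    """Wall-clock seconds needed to generate `audio_seconds` of audio."""
    return audio_seconds / speedup

def samples_per_second(sample_rate, speedup):
    """Waveform samples generated per wall-clock second."""
    return sample_rate * speedup

speedup = 200.0   # "200 times faster than real time"
print(synthesis_time(10.0, speedup))       # seconds to generate a 10 s clip
print(samples_per_second(24000, speedup))  # throughput at an assumed 24 kHz
```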
