A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

TASLP Volume 28 | 2020

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

TASLPRO Articles

By:

Xin Wang; Shinji Takaki; Junichi Yamagishi; Simon King; Keiichi Tokuda

Recurrent neural networks (RNNs) can predict fundamental frequency (F 0 ) for statistical parametric speech synthesis systems, given linguistic features as input. However, these models assume conditional independence between consecutive F 0 values, given the RNN state. In a previous study, we proposed autoregressive (AR) neural F 0 models to capture the causal dependency of successive F 0 values. In subjective evaluations, a deep AR model (DAR) outperformed an RNN. Here, we propose a Vector Quantized Variational Autoencoder (VQ-VAE) neural F 0 model that is both more efficient and more interpretable than the DAR. This model has two stages: one uses the VQ-VAE framework to learn a latent code for the F 0 contour of each linguistic unit, and other learns to map from linguistic features to latent codes. In contrast to the DAR and RNN, which process the input linguistic features frame-by-frame, the new model converts one linguistic feature vector into one latent code for each linguistic unit. The new model achieves better objective scores than the DAR, has a smaller memory footprint and is computationally faster. Visualization of the latent codes for phones and moras reveals that each latent code represents an F 0 shape for a linguistic unit.

Read on IEEE Xplore

Tags:

IEEE TASLP Article

SPS Social Media

IEEE SPS Facebook Page https://www.facebook.com/ieeeSPS
IEEE SPS X Page https://x.com/IEEEsps
IEEE SPS Instagram Page https://www.instagram.com/ieeesps/?hl=en
IEEE SPS LinkedIn Page https://www.linkedin.com/company/ieeesps/
IEEE SPS YouTube Channel https://www.youtube.com/ieeeSPS

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel

© Copyright 2025 IEEE - All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

general_get_involved_tc_article_full.jpg

Get Involved with Technical Committees by becoming a TC Affiliate

Brain Workshop

IEEE Brain Discovery and Neurotechnology Workshop

Member Spotlight (2).png

Meet SPS Member Ahmet Mete Elbir

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

Publications & Resources

For Authors

Top Reasons to Join SPS Today!

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

SPS Social Media

IEEE SPS Educational Resources

general_get_involved_tc_article_full.jpg

Get Involved with Technical Committees by becoming a TC Affiliate

Brain Workshop

IEEE Brain Discovery and Neurotechnology Workshop

Member Spotlight (2).png

Meet SPS Member Ahmet Mete Elbir

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

Search form

You are here

Publications & Resources

For Authors

Top Reasons to Join SPS Today!

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

SPS Social Media

IEEE SPS Educational Resources