BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

TMM Volume 21 Issue 7

BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

TMM Featured Articles

By:

Lingyun Yu; Jun Yu; Qiang Ling

Predicting articulatory movements from audio or text has diverse applications, such as speech visualization. Various approaches have been proposed to solve the acoustic-articulatory mapping problem. However, their precision is not high enough with only acoustic features available. Recently, deep neural network (DNN) has brought tremendous success in various fields, like speech recognition and image processing. To increase the accuracy, we propose a new network architecture for articulatory movement prediction with both text and audio inputs, called a bottleneck long-term recurrent convolutional neural network (BLTRCNN). To the best of our knowledge, it is the first time to predict articulatory movements based on DNN by fusing text and audio inputs. Our BLTRCNN consists of two networks. The first is the bottleneck network, generating a compact bottleneck features of text information for each frame independently. The second, including convolutional neural network, long short-term memory and skip connection, is called the long-term recurrent convolutional neural network (LTRCNN). LTRCNN is used for articulatory movement prediction when bottleneck features, acoustic features, and text features are integrated as inputs together. Experiments show that the proposed BLTRCNN achieves the state-of-the-art root-mean-square error (RMSE) 0.528 mm and the correlation coefficient 0.961. Moreover, we also demonstrate how text information complements acoustic features in this prediction task.

Read on IEEE Xplore

Tags:

IEEE TMM Article

SPS Social Media

IEEE SPS Facebook Page https://www.facebook.com/ieeeSPS
IEEE SPS X Page https://x.com/IEEEsps
IEEE SPS Instagram Page https://www.instagram.com/ieeesps/?hl=en
IEEE SPS LinkedIn Page https://www.linkedin.com/company/ieeesps/
IEEE SPS YouTube Channel https://www.youtube.com/ieeeSPS

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel

© Copyright 2025 IEEE - All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

IVMSP_2020.jpg

(IVMSP 2026) IEEE 15th Image, Video, and Multidimensional Signal Processing Workshop

Webinar.jpg

SPS BSI Webinar: Integration of Brain Imaging and Genomics with Interpretable Multimodal Collaborative Learning

webinar_general_dsi.jpg

SA-TWG Webinar: Channel Estimation for Beyond Diagonal RIS via Tensor Decomposition

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

Transactions on Multimedia

Publications & Resources

For Authors

SP-Magazine-Front_Cover-March-2025.jpg

CAI_2027_Call_for_Proposals.png

nominate_2_general.jpg

Top Reasons to Join SPS Today!

BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

SPS Social Media

IEEE SPS Educational Resources

What is Signal Processing?

Popular Pages

Today's:

All time:

Last viewed:

BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

Search form

You are here

Transactions on Multimedia

Publications & Resources

For Authors

Top Reasons to Join SPS Today!

BLTRCNN-Based 3-D Articulatory Movement Prediction: Learning Articulatory Synchronicity From Both Text and Audio Inputs

SPS Social Media

IEEE SPS Educational Resources