1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.
Predicting articulatory movements from audio or text has diverse applications, such as speech visualization. Various approaches have been proposed to solve the acoustic-articulatory mapping problem. However, their precision is not high enough with only acoustic features available. Recently, deep neural network (DNN) has brought tremendous success in various fields, like speech recognition and image processing. To increase the accuracy, we propose a new network architecture for articulatory movement prediction with both text and audio inputs, called a bottleneck long-term recurrent convolutional neural network (BLTRCNN). To the best of our knowledge, it is the first time to predict articulatory movements based on DNN by fusing text and audio inputs. Our BLTRCNN consists of two networks. The first is the bottleneck network, generating a compact bottleneck features of text information for each frame independently. The second, including convolutional neural network, long short-term memory and skip connection, is called the long-term recurrent convolutional neural network (LTRCNN). LTRCNN is used for articulatory movement prediction when bottleneck features, acoustic features, and text features are integrated as inputs together. Experiments show that the proposed BLTRCNN achieves the state-of-the-art root-mean-square error (RMSE) 0.528 mm and the correlation coefficient 0.961. Moreover, we also demonstrate how text information complements acoustic features in this prediction task.
© Copyright 2023 IEEE – All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.