SLTC Newsletter, November 2015
The 16th Interspeech Conference was recently held in Dresden, Germany, from September 6-10, 2015. In this article, we highlight upward and downward trends in the speech community.
Deep neural networks remain a hot area at speech recognition conferences. There were over eight sessions dedicated to neural networks, spanning acoustic modeling, language modeling, robustness, speaker recognition, and speech synthesis. This totals over 75 papers on neural networks, an increase both from last year and from ICASSP.
One area that has attracted great interest is learning directly from the raw waveform with neural networks, and an entire oral session was devoted to it. Many research labs showed that the network learns a representation very similar to the mel filterbank, which is quite remarkable. In addition, it seems raw-waveform models can match the performance of log-mel features, potentially removing the need for a hand-crafted frontend.
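To make the learned-frontend idea concrete, here is a minimal numpy sketch, not any paper's exact recipe: a bank of time-domain convolution filters applied to a raw audio window, followed by rectification, pooling, and log compression. Structurally this mirrors how log-mel features are computed, except the filter weights would be trained with the rest of the network; the window length, filter length, and number of filters below are illustrative assumptions.

```python
import numpy as np

def learned_filterbank(frame, filters):
    """frame: raw samples for one analysis window, e.g. 400 samples of 16 kHz audio (25 ms).
    filters: (num_filters, filter_len) weights; random here, learned jointly in practice."""
    feats = []
    for w in filters:
        conv = np.convolve(frame, w, mode="valid")   # time-domain convolution with one filter
        rect = np.maximum(conv, 0.0)                 # rectification (ReLU-like nonlinearity)
        pooled = rect.max()                          # pool over time within the window
        feats.append(np.log(pooled + 1e-6))          # log compression, one value per filter
    return np.array(feats)                           # loosely analogous to one log-mel frame

rng = np.random.default_rng(0)
frame = rng.standard_normal(400)                     # stand-in for one 25 ms window of raw audio
filters = rng.standard_normal((40, 128)) * 0.01      # 40 filters, like 40 mel channels
print(learned_filterbank(frame, filters).shape)      # (40,)
```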
Another session that garnered a lot of attention was fast and scalable computing with neural networks. This included specific modeling techniques, such as Connectionist Temporal Classification (CTC) and Convolutional Neural Networks (CNNs), as well as compression techniques to speed up training. There were also two interesting talks on using GPUs, one in particular on parallel GPUs, to speed up training.
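For readers unfamiliar with CTC, the toy sketch below shows only its label-collapsing convention at greedy decoding time: merge repeated frame-level labels, then drop the blank symbol. The actual CTC training objective sums over all frame alignments that collapse to the target sequence; the symbols and example string here are made up for illustration.

```python
BLANK = "_"

def ctc_collapse(frame_labels):
    """Collapse a frame-level label sequence: merge repeats, then remove blanks."""
    out = []
    prev = None
    for sym in frame_labels:
        if sym != prev and sym != BLANK:   # keep the first symbol of each run, skip blanks
            out.append(sym)
        prev = sym
    return "".join(out)

# The blank between the two "ll" runs is what lets CTC output a doubled letter.
print(ctc_collapse("__hh_e_ll__ll_oo_"))   # -> "hello"
```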
There was also a session on novel architectures for deep neural networks, including time-delay neural networks (TDNNs) and long short-term memory based convolutional recurrent neural networks. In addition, there were some interesting papers on learning the parameters of the activation function, for example the slope of rectified linear units. Finally, knowledge distillation, that is, training one network to mimic the behavior of another, seems to be a popular trend.
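A minimal numpy sketch of the distillation idea, assuming the common setup in which a student matches the teacher's softened posteriors in addition to the hard labels; the temperature, mixing weight, and logits below are illustrative choices, not values from any particular paper.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    soft_targets = softmax(teacher_logits, T)                          # teacher's softened posteriors
    soft_preds = softmax(student_logits, T)
    soft_loss = -np.sum(soft_targets * np.log(soft_preds + 1e-12))     # cross-entropy vs. teacher
    hard_loss = -np.log(softmax(student_logits)[hard_label] + 1e-12)   # usual cross-entropy vs. label
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = np.array([4.0, 1.0, 0.5])    # e.g. logits from a large teacher network
student = np.array([2.5, 1.2, 0.3])    # logits from a smaller student network
print(distillation_loss(student, teacher, hard_label=0))
```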
Adaptation continues to be a hot area, with sessions devoted to speaker adaptation and noise adaptation. Adaptation techniques continue to explore feeding adapted features into the network, augmenting the input with features such as i-vectors, and refactoring the network itself for adaptation. Finally, speech synthesis also seems to be becoming an increasingly popular area, with seven sessions devoted to it, focused on statistical, stochastic, or parametric speech synthesis. In addition, there was a session on deep learning for speech synthesis.
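The feature-augmentation style of adaptation can be summarized in a few lines: a fixed-dimensional speaker representation (for example an i-vector) is appended to every acoustic frame before it enters the network. The sketch below assumes 40-dimensional log-mel frames and a 100-dimensional i-vector purely for illustration.

```python
import numpy as np

def augment_with_ivector(frames, ivector):
    """frames: (num_frames, feat_dim) acoustic features for one utterance.
    ivector: (ivec_dim,) speaker-level vector, identical for every frame of the utterance."""
    tiled = np.tile(ivector, (frames.shape[0], 1))      # repeat the speaker vector per frame
    return np.concatenate([frames, tiled], axis=1)      # (num_frames, feat_dim + ivec_dim)

rng = np.random.default_rng(0)
frames = rng.standard_normal((300, 40))   # 3 s of 40-dim log-mel frames at a 10 ms hop
ivector = rng.standard_normal(100)        # speaker i-vector for this utterance
print(augment_with_ivector(frames, ivector).shape)      # (300, 140)
```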
Language modeling continues to trend downward at Interspeech, with only three sessions focused on it. There was, however, a session devoted to neural network language modeling, and I expect the number of papers in this area to grow in coming years. Furthermore, given the success of neural networks for acoustic modeling, I expect to see more papers on end-to-end speech recognition, performing acoustic and language modeling jointly within the neural network.
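As a reminder of what a neural network language model does, here is a toy numpy sketch of a single recurrent step: it consumes the previous word's embedding and the recurrent state and produces a distribution over the next word. The vocabulary size, dimensions, and weights are random placeholders; in practice the weights are trained to maximize next-word likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
V, E, H = 1000, 32, 64                   # vocab size, embedding dim, hidden dim (illustrative)
emb = rng.standard_normal((V, E)) * 0.1  # word embeddings
W_xh = rng.standard_normal((E, H)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
W_hy = rng.standard_normal((H, V)) * 0.1

def lm_step(word_id, h):
    x = emb[word_id]
    h = np.tanh(x @ W_xh + h @ W_hh)     # recurrent state update
    logits = h @ W_hy
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), h        # next-word distribution and new state

h = np.zeros(H)
for w in [3, 17, 256]:                   # a made-up word-id sequence
    probs, h = lm_step(w, h)
print(probs.shape)                       # (1000,)
```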
While the focus continues to be on neural networks, there are very few papers outside this area, particularly for acoustic modeling. There was a session on novel approaches for speech recognition, but most of the papers there also seemed to be neural network focused. As a community, we should continue to explore alternative architectures to test their potential benefits and complementarity.
Tara N. Sainath is with Google. She is a staff writer for the SLTC newsletter.