a) What sparked your interest in speech and language processing?
When I was a graduate student at the University of Tokyo, I was studying signal processing and conducting experiments using various signals. One of them was the speech signal, and I gradually became more interested in speech itself than in general signal processing techniques. When I finished my studies in 1970, I decided to make speech research my profession. I started working on speech and speaker recognition as basic research, since nobody at the time believed we could build useful devices that recognize speech automatically.
b) How do you think speech and language processing is changing society for the new generation?
Speech is the most convenient and useful medium that human beings use to communicate with each other, and human speech production and recognition are closely related to our brain activities. Even when we are reading text, we are unconsciously hearing sound in our brain. Since speech is more fundamental to communication than text, there are many languages spoken in the world that have no written form. Speech is also the most convenient and easiest method for communicating with AI-based systems, including cell phones and robots. These are the reasons that so many researchers have tackled speech recognition and created many important techniques that can be used not only for speech but also for many other pattern recognition areas, such as computer vision and NLP. Speech recognition has been one of the driving forces of pattern recognition research for many years. I believe advanced speech technology will enhance our ability to communicate with various AI systems, including robots, and make our lives much easier and more fun in the near future.
c) What is your holy grail in speech and language processing? When will we achieve it?
My holy grail in speech and language processing is flexible and reliable speech recognition and understanding technology that can utilize the various levels of contextual information on which we human beings always rely heavily. I believe this is crucial for solving the problems of recognizing and understanding conversational speech, meeting speech, and “cocktail party” speech. Such problems are difficult to solve, but since the progress of deep-learning-based technology is amazing, I hope we can achieve the goal within 10 years.
d) Do you have any specific advice for students, junior faculty or others early in their careers?
Since the physical characteristics of speech are highly optimized for human speech production and hearing mechanisms, as well as human brain activities, I strongly encourage students and young researchers to study and make use of the mechanisms by which human beings speak, hear, and understand speech, in combination with deep-learning techniques.
e) What development in the field has most surprised you? Was there a hard problem that turned out to be easy? An easy problem that proved surprisingly difficult?
In my more than 45 years of speech and speaker recognition research, I never believed, until 10 years ago, that we could create really useful systems that everybody could use. Speech recognition is still not easy, but, fortunately, recent deep-learning-based technology supported by computer technology and big data is so amazing that we have recently developed various products that many people enjoy using. Using speech is very natural and easy for human beings, but it is still difficult for computers to copy what human beings do, including their very efficient learning capability.
f) Oftentimes, the work that people get noticed for is not the same as the work which they find most exciting/rewarding/interesting. Which of your publications is your favorite? Why?
My favorite papers are:
When I was studying the mechanisms of human speech recognition, I learned that human beings are very good at both acoustical and linguistic prediction, and that acoustical prediction is based on transitional features of the spectrum. The first paper experimentally verified that spectral transition is crucially important for human speech perception, and the second and third papers proposed using delta (transitional) cepstral features in combination with instantaneous cepstral features in speaker and speech recognition. Delta features are still used after more than 30 years, probably because such transitional features are very fundamental for characterizing speech sounds.
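For readers unfamiliar with delta features, the idea of combining transitional and instantaneous cepstral features can be sketched as follows. This is a minimal illustration, not the method of the papers mentioned above: it assumes the commonly used linear-regression formulation of the delta over a window of neighboring frames, with the first and last frames repeated at the edges. The function names `delta_features` and `append_deltas` and the `width` parameter are hypothetical, chosen only for this example.

```python
def delta_features(frames, width=2):
    """Estimate the per-coefficient slope over +/- `width` frames.

    `frames` is a list of equal-length lists (one cepstral vector per frame).
    Assumed formulation: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2).
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))

    def frame_at(t):
        # Clamp the index so the first/last frame is repeated at the edges.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            num = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, width=2):
    """Concatenate each instantaneous vector with its delta vector."""
    return [c + d for c, d in zip(frames, delta_features(frames, width))]
```

With a sequence whose single coefficient rises by one per frame, the interior deltas equal the slope of 1.0, while a constant sequence yields deltas of zero. In that sense the delta captures how the spectrum is changing, and the instantaneous vector captures where it currently is.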
© Copyright 2024 IEEE - All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.