Getting to Know Your Fellow Researchers - Sadaoki Furui

December, 2018


a) What sparked your interest in speech and language processing?

When I was a graduate student at the University of Tokyo, I was studying signal processing and conducting experiments with various signals. One of them was the speech signal, and I gradually became more interested in speech itself than in general signal processing techniques. When I finished my studies in 1970, I decided to make speech research my profession. I started working on speech and speaker recognition as basic research, since nobody believed we could build useful devices that automatically recognize speech.

b) How do you think speech and language processing is changing society for the new generation?

Speech is the most convenient and useful medium human beings use to communicate with each other, and human speech production and recognition are closely related to our brain activity. Even when we are reading text, we are unconsciously hearing sounds in our brain. Speech is more fundamental to communication than text; indeed, many languages spoken in the world have no written form. Speech is also the most convenient and easiest way to communicate with AI-based systems, including cell phones and robots. These are the reasons so many researchers have tackled speech recognition and created many important techniques that can be used not only for speech but also in many other pattern recognition areas, such as computer vision and NLP. Speech recognition has been one of the driving forces of pattern recognition research for many years. I believe advanced speech technology will enhance our ability to communicate with various AI systems, including robots, and make our lives much easier and more fun in the near future.

c) What is your holy grail in speech and language processing? When will we achieve it?

My holy grail in speech and language processing is flexible and reliable speech recognition and understanding technology that can utilize the various levels of contextual information that we human beings always rely on heavily. I believe this is crucial to solving the problems of recognizing and understanding conversational speech, meeting speech, and "cocktail party" speech. These problems are difficult, but since the progress of deep-learning-based technology has been amazing, I hope we can achieve the goal within 10 years.

d) Do you have any specific advice for students, junior faculty or others early in their careers?

Since the physical characteristics of speech are highly optimized for human speech production and hearing mechanisms, as well as for human brain activity, I strongly encourage students and young researchers to study and utilize the mechanisms by which human beings speak, hear, and understand speech, in combination with deep-learning techniques.

e) What development in the field has most surprised you? Was there a hard problem that turned out to be easy? An easy problem that proved surprisingly difficult?

In more than 45 years of speech and speaker recognition research, I never believed, until about 10 years ago, that we could create really useful systems that everybody could use. Speech recognition is still not easy, but fortunately, recent deep-learning-based technology supported by advances in computing and big data is so powerful that we have recently developed various products that many people enjoy using. Using speech is very natural and easy for human beings, but it is still difficult for computers to replicate what human beings do, including our very efficient learning capability.

f) Oftentimes, the work that people get noticed for is not the same as the work which they find most exciting/rewarding/interesting. Which of your publications is your favorite? Why?

My favorite papers are:

Sadaoki Furui, "On the Role of Spectral Transition for Speech Perception", J. Acoust. Soc. Am., Vol. 80, No. 4, pp. 1016-1025, 1986
Sadaoki Furui, "Cepstral Analysis Technique for Automatic Speaker Verification", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-29, No. 2, pp. 254-272, 1981
Sadaoki Furui, "Speaker-Independent Isolated Word Recognition Using Dynamic Features of Speech Spectrum", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 1, pp. 52-59, 1986

When I was studying the mechanisms of human speech recognition, I learned that human beings are very good at both acoustic and linguistic prediction, and that acoustic prediction is based on transitional features of the spectrum. The first paper experimentally verified that spectral transition is crucially important for human speech perception, and the second and third papers proposed using delta (transitional) cepstral features in combination with instantaneous cepstral features in speaker and speech recognition. Delta features are still in use more than 30 years later, probably because such transitional features are fundamental to characterizing speech sounds.
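For readers unfamiliar with delta features, here is a minimal sketch of how transitional cepstral features are commonly computed today: a first-order regression over a window of ±N frames, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with the deltas then appended to the instantaneous cepstra. This is an illustration under stated assumptions, not the exact procedure from the papers above. The window size N = 2, the edge handling (repeating the first and last frames), and the function names `delta_features` and `append_deltas` are all hypothetical choices made for this example.

```python
from typing import List

Frame = List[float]  # one frame of cepstral coefficients


def delta_features(cepstra: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- `window` frames.

    Assumption: frames beyond the sequence boundaries are replaced by
    the nearest boundary frame (edge padding by repetition).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(cepstra)
    if num_frames == 0:
        return []
    dim = len(cepstra[0])
    # Normalizer 2 * sum(n^2) makes the result the least-squares slope.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for n in range(1, window + 1):
            ahead = cepstra[min(t + n, num_frames - 1)]  # clamp at the end
            behind = cepstra[max(t - n, 0)]              # clamp at the start
            for k in range(dim):
                acc[k] += n * (ahead[k] - behind[k])
        deltas.append([v / denom for v in acc])
    return deltas


def append_deltas(cepstra: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate each instantaneous frame with its delta frame."""
    deltas = delta_features(cepstra, window)
    return [list(c) + d for c, d in zip(cepstra, deltas)]


if __name__ == "__main__":
    # A coefficient that rises linearly has a slope of 1 per frame.
    ramp = [[float(t)] for t in range(6)]
    for frame in append_deltas(ramp):
        print(frame)
```

Applied to a sequence of cepstral frames, `append_deltas` doubles the feature dimension per frame. A linearly increasing coefficient yields a delta of 1.0 in the interior frames, which makes a quick sanity check; the boundary frames come out smaller because of the edge padding assumed here.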
