Interview with Prof. B. Yegnanarayana, INSA Senior Scientist, IIIT Hyderabad, India



News and Resources for Members of the IEEE Signal Processing Society


Anubha Gupta, IIIT Delhi

Bayya Yegnanarayana is currently INSA Senior Scientist at the International Institute of Information Technology Hyderabad (IIIT-H). He was an Institute Professor & Microsoft Chair at IIIT-H from 2006 to 2016, a Professor (1980 to 2006) and Head of the CSE Department (1985 to 1989) at IIT Madras, a visiting Associate Professor at Carnegie-Mellon University (CMU), Pittsburgh, USA (1977 to 1980), and a member of the faculty at the Indian Institute of Science (IISc), Bangalore (1966 to 1978). He received his BSc from Andhra University, Visakhapatnam, in 1961, and his BE, ME, and PhD from IISc Bangalore in 1964, 1966, and 1974, respectively. His research interests are in signal processing, speech, image processing, and neural networks, areas in which he has published over 400 papers. He is the author of the book "Artificial Neural Networks", published by Prentice-Hall of India in 1999, and has supervised 36 PhD and 42 MS theses at IISc, IIT Madras, and IIIT-H. He is a Fellow of the Indian National Academy of Engineering (INAE), the Indian National Science Academy (INSA), the Indian Academy of Sciences (IASc), the IEEE (USA), and the International Speech Communication Association (ISCA). He received the 3rd IETE Prof. S. V. C. Aiya Memorial Award in 1996.

He received the Prof. S. N. Mitra Memorial Award for the year 2006 from INAE, the 2013 Distinguished Alumnus Award from IISc Bangalore, the Sayed Husain Zaheer Medal of INSA in 2014, and the Prof. Rais Ahmed Memorial Lecture Award from the Acoustical Society of India in 2016. He was an Associate Editor for the IEEE Transactions on Audio, Speech, and Language Processing during 2003-2006, and is currently an Associate Editor for the Journal of the Acoustical Society of America. He received a Doctor of Science (Honoris Causa) from Jawaharlal Nehru Technological University Anantapur in February 2019, and was the General Chair of Interspeech 2018, held in Hyderabad, India, in September 2018. He was a visiting Professor at IIT Dharwad and at CMU Africa in Rwanda during 2019. He is currently Adjunct Faculty at IIT Tirupati, Distinguished Professor at IIT Hyderabad, and Distinguished Adjunct Professor at IIIT Naya Raipur.


Q. Please share your impactful work(s) with us.

The most important lesson I have learned over the years is that the more one probes into a natural phenomenon, the less one seems to know about it. My area, spoken language communication, is one such area. I am fascinated by the behaviour of sound in rooms and by the speech production mechanism. The first of my more than five decades in research was spent studying the behaviour of sound in practical environments such as classrooms and auditoria, and in controlled environments such as the anechoic and reverberation rooms I built at IISc Bangalore in the 1960s. This led, through experimental and theoretical studies, to an understanding of the nonexponential decay of sound in rooms. Once the role of speech as an important means of human communication was appreciated, my focus shifted to the intelligibility of speech in rooms, and to speech signal processing to improve that intelligibility. The dynamic characteristics of the speech production system posed many signal processing challenges in extracting the time-varying excitation and vocal tract system features. In particular, my research focussed on the so-called leftovers of speech processing methods, namely the phase component in short-time Fourier transform analysis, the residual component in linear prediction analysis, and the suprasegmental component related to prosody in speech.

Several new signal processing techniques were developed by our groups at IISc Bangalore, IIT Madras, and IIIT Hyderabad over the past four decades, mainly to address the varying time and frequency resolution issues caused by changes in the excitation, the vocal tract shape, and the coupling of cavities during the speech production process. In particular, the following are some of the methods proposed: phase processing using group-delay functions, zero-frequency filtering for extracting the excitation information, zero-time windowing for extracting the vocal tract information, and, more recently, single-frequency filtering for analysis of speech signals with varying temporal and spectral resolutions.
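As an illustration, the zero-frequency filtering idea mentioned above can be sketched in a few lines: the speech signal is differenced, passed through a cascade of two resonators centred at 0 Hz (pure double integrators), and the resulting polynomial trend is removed by repeated local-mean subtraction over a window comparable to the average pitch period; positive zero crossings of the trend-removed signal then mark the instants of significant excitation (epochs). This is a minimal sketch of the published formulation, not a released implementation; the function names, parameter defaults, and the synthetic signal below are illustrative only.

```python
import numpy as np

def zero_frequency_filter(speech, fs, avg_pitch_period=0.005, passes=3):
    """Sketch of zero-frequency filtering of a speech signal.

    1. Difference the signal to remove any DC bias.
    2. Pass it through a cascade of two resonators at 0 Hz:
       y[n] = x[n] + 2*y[n-1] - y[n-2].
    3. Remove the resulting polynomial trend by repeatedly
       subtracting a local mean (window of ~1-2 pitch periods).
    """
    x = np.diff(speech, prepend=speech[0])

    # Cascade of two zero-frequency resonators (double integrators).
    y = x.copy()
    for _ in range(2):
        out = np.zeros_like(y)
        for n in range(len(y)):
            out[n] = y[n]
            if n >= 1:
                out[n] += 2.0 * out[n - 1]
            if n >= 2:
                out[n] -= out[n - 2]
        y = out

    # Trend removal by repeated local-mean subtraction.
    win = 2 * int(avg_pitch_period * fs) + 1
    kernel = np.ones(win) / win
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def epochs_from_zff(zff_signal):
    """Positive zero crossings of the ZFF signal mark epoch locations."""
    neg = np.signbit(zff_signal)
    return np.nonzero(neg[:-1] & ~neg[1:])[0] + 1
```

The choice of the trend-removal window matters: it should be of the order of one to two average pitch periods, so for a voiced signal with a 100 Hz pitch a window of roughly 10-20 ms is a reasonable starting point.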

In situations where explicit features could not be identified, nonlinear artificial neural network models were developed to capture the relevant information for a given speech processing application. Specifically, autoassociative neural network (AANN) models were proposed for speaker recognition studies, where the distribution-capturing ability of the models was exploited. Discriminative feedforward neural network models were developed for speech processing tasks such as speaker separation and speech/nonspeech detection.
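The AANN idea can be illustrated with a toy model. The published speaker-modeling work used deeper expansion-compression architectures trained on cepstral features; the sketch below is a deliberately minimal stand-in that keeps only the core idea, under the assumption that a bottleneck autoencoder trained on one source's feature vectors yields low reconstruction error for data from that source's distribution and higher error otherwise. All layer sizes, names, and the training scheme here are illustrative, not the author's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class AANN:
    """Toy autoassociative network: input -> tanh bottleneck -> linear output."""

    def __init__(self, dim, hidden, lr=0.05):
        # Small random weights; bottleneck (hidden < dim) forces the
        # network to learn the dominant structure of the training data.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, dim))
        self.b2 = np.zeros(dim)
        self.lr = lr

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return H, H @ self.W2 + self.b2

    def train(self, X, epochs=500):
        # Full-batch gradient descent on the mean squared
        # reconstruction error (the autoassociation objective).
        for _ in range(epochs):
            H, Y = self.forward(X)
            err = Y - X
            gW2 = H.T @ err / len(X)
            gb2 = err.mean(axis=0)
            dH = (err @ self.W2.T) * (1.0 - H ** 2)
            gW1 = X.T @ dH / len(X)
            gb1 = dH.mean(axis=0)
            self.W2 -= self.lr * gW2; self.b2 -= self.lr * gb2
            self.W1 -= self.lr * gW1; self.b1 -= self.lr * gb1

    def score(self, X):
        # Per-frame reconstruction error: low means the frame fits
        # the distribution the model was trained on.
        _, Y = self.forward(X)
        return np.mean((Y - X) ** 2, axis=1)
```

In a recognition setting, one such model would be trained per speaker, frame scores averaged over a test utterance, and the utterance assigned to the model with the lowest error (or compared against a threshold for verification).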

Most of the signal processing methods developed over the years were applied to various tasks, such as emotion recognition, prosody modification, laughter synthesis, analysis of expressive voices, speech enhancement, speaker separation, and time delay estimation. The main focus of research in our group has been to develop signal processing methods for extracting features from speech signals produced in practical environments. The motivation is that good front-end signal processing reduces the complexity and computation involved in building speech systems, besides developing a better understanding of the speech production process.

Q. In your opinion, what are some of the most exciting areas of research in speech processing and NLP for students and upcoming researchers?

Spoken language processing is an exciting area with many applications, such as speech-to-speech translation, information access through the voice mode, and person authentication, especially in a multilingual society like India. One can exploit the evolving technology for societal benefit. Currently, technology is dictating the methodology and the applications, most of which are data driven. It appears that we are looking for problems to suit the technology, rather than developing technology to address the problem. This approach has its limitations, in the sense that its spread will be limited. Moreover, these natural tasks need to be developed by exploiting the linguistic and dialectal constraints of a given region. There are many issues that need to be addressed, starting from speech signal processing to deal with degradations due to environmental factors. Moreover, spoken language is not well structured, which makes it difficult to build models that capture its constraints. Most spoken conversations involve ill-formed sentences and gestures, which are difficult to deal with using current NLP techniques. A few of the challenging problems in speech and language processing are: identifying the speech regions in audio signals (so-called voice activity detection), speech enhancement (as opposed to mere noise suppression), speaker separation from multispeaker data, emotion recognition in casual speech, improving speech intelligibility, capturing the prosody features in multilingual casual conversation, context switching, and speaker identification. All these tasks require a basic understanding of signal processing and speech characteristics. Data-driven approaches may not be useful in practical situations involving several unknown factors.

Q. What challenges do you think our current student population faces as far as preparedness in these areas is concerned?

The current student population is definitely confused, as academicians, administrators, and planners are asking them to look at problems from the viewpoint of current technological developments, rather than asking them to understand the problem first. Currently, technology is dictating the goals, rather than goals dictating the technology. We must insist that students be equipped with the fundamental principles of speech, signal processing, and language constraints, along with basic mathematics and science. I notice that most students (over 80%) are being forced to acquire skills instead of knowledge of the relevant subjects. This will unfortunately increase our dependence on technology and tools. In my opinion, there are enough challenging tasks in speech and NLP that need to be addressed to develop systems useful in a given environment. That gives a better understanding and the capability to develop systems for a new situation.

Q. What are your suggestions to upcoming researchers for doing cutting-edge research and societally impactful work?

I feel that researchers should look for problems they see around them, especially in the area of spoken language processing. That will give them good motivation and the satisfaction of learning and accomplishment. It is a waste of effort to join competitions involving tens of thousands of hours of speech from thousands of speakers, or billions of words of text. How naïve we are to think we can compete with tech giants like Google, Microsoft, and Facebook on these tasks. From a researcher's point of view, such effort will not yield anything. In a few cases they may help refine some of the ideas generated by those companies, but they can never generate new ideas with purely data-driven approaches. Most AI- and ML-based research is dictated by resources (which many do not have), not by any great ideas. I think upcoming researchers should focus on new ideas, as there is hardly a good solution applicable in practice in the area of speech and language processing. Another wrong notion some researchers may have is that the deep learning (DL) approach is a replacement for signal processing approaches. In fact, they are complementary: a good signal processing approach may significantly reduce the computational complexity of an unexplainable DL-based approach.

