Getting to Know Your Fellow Researchers: Hynek Hermansky

March, 2018

What sparked your interest in speech and language processing?

When I started my professional career, I hesitated between electronic music and speech. However, speech looked less frivolous to me on my application for the Japanese Government fellowship. I got the fellowship and ended up at the University of Tokyo. Ever since, there has been no way out of speech. To me, speech is fascinating as an amazing means of information exchange among human beings. Even though for most of my career I was paid for machine recognition of speech, I was lucky to also smuggle in my interests in human speech communication.

How do you think speech and language processing is changing society for the new generation?

Speech communication has kept changing society since the early days of the telephone. The vision of the Bell System in encouraging basic research brought many revolutionary scientific advances. The Bell System may be gone, but I hope that at least some of the money being made by current speech technology will be put to use for similar scientific advances for the new generations.

What is your holy grail in speech and language processing? When will we achieve it?

The holy grail? Understanding the principles that allow for easy human communication. This understanding should also lead to new generations of more elegant, and therefore also more efficient, speech processing systems. I hope that current advances in deep neural net machine learning and big data will contribute to achieving this goal. When? Never! I believe that there will always be unexplored corners in speech communication for the new generations to explore.

Do you have any specific advice for students, junior faculty or others early in their careers?

Don’t get pushed into other fellows’ problems. When you can, do your own thing. In my experience, it is better to work in a neglected area without much competition than in a crowd of smart and hard-working fad followers. This may be a suboptimal strategy for fast promotions, but it will pay off in the long run. When this does not work for you, you may always try to get rich by joining Wall Street :-).

What development in the field has most surprised you? Was there a hard problem that turned out to be easy? An easy problem that proved surprisingly difficult?

“A hard problem”? The Amazon Echo. The first time I heard this was what Amazon was after, I felt somehow sorry for my students who then joined the Amazon team. However, I guess it was not easy for them. “An easy problem that is difficult to solve”? How come we as human beings can so easily communicate by speech? I am sure that any of us can easily come up with examples where any four-year-old still easily beats a machine. How do they know when they do not know? How do they learn from a single example of a word spoken by their mother and still easily understand the uncle who showed up in their life for the first time?

Oftentimes, the work that people get noticed for is not the same as the work which they find most exciting/rewarding/interesting. Which of your publications is your favorite? Why?

I hate to read my own papers after they are published, but I am happy to see that some of our early and sometimes rudimentary findings are still being used and further developed by others. O.K., since you push me: the works with [Nelson] Morgan on processing of modulation spectra, with Sarel van Vuuren, Naren Malayath and Fabio Valente on data-derived speech representations, with Sangita Sharma and Pratibha Jain on deep learning from long temporal frequency-localized patterns, related works with Sangita and Pratibha joined by Dan Ellis on DNN-based features, with Marios Athineos and Sriram Ganapathy on autoregressive models of Hilbert envelopes in frequency bands, with [Bayya] Yegnanarayana on DNN autoencoder-based modeling of vector spaces, with Sunil Sivadas and Samuel Thomas on multi-genre and multi-lingual DNNs, with Joel Pinto on stacked DNNs, with Sivaram Garimella on DNN-based speaker verification, with Nima Mesgarani, Ehsan Variani, Tetsuji Ogawa, and Harish Mallidi on performance monitoring, … Please stop me. Aren’t you sorry that you asked?
