What sparked your interest in speech and language processing?
How do you think speech and language processing is changing society for the new generation?
The last decade has witnessed a massive increase in the use of voice dialog over phone links. Just a few years ago, most organizations required callers to listen to lengthy messages and then respond to a series of questions by pressing keypad buttons. This style of interaction still persists; it is not user-friendly and is a source of frustration to many. Hands-free interaction by voice is far more pleasant. With voice inquiry also becoming more prevalent in the home, the general public is far more comfortable talking to machines than ever before. Lastly, the new generation is far more attuned to PDAs than we older folk are; we recall life without microwave ovens, remote controls (for cars, doors, TVs), VCRs/PVRs, MP3 players, PCs and push-button phones. I sometimes wonder how we passed the time back then.
What is your holy grail in speech and language processing? When will we achieve it?
Do you have any specific advice for students, junior faculty or others early in their careers?
- I resisted hidden Markov models (HMMs) when they first appeared in the mid-1970s, so I was somewhat surprised to see them replace LPC/DTW and remain the standard for ASR for the last 30 years, notwithstanding the inherent flaws of the first-order Markov assumption and the exponential state-duration density. Of course, their exploitation of statistics overcame the fatal flaw of the deterministic and time-consuming DTW.
- I always thought that GMMs (Gaussian mixture models) were a strange way to model the complex densities one finds when modeling diverse conditions (e.g., Gaussians are great for identically distributed random variables, but not for handling speaker independence).
- MFCCs made sense in terms of exploiting critical-band aspects of audition, but their final step of inverting the Fourier transform back to the time domain made less sense to me, since human audition does nothing of the sort and no physical interpretation was possible beyond the first two coefficients.
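The "exponential state-duration density" mentioned in the first point above can be made concrete. An HMM state with self-transition probability a implicitly assigns a stay of exactly d frames the probability a^(d-1)(1 - a), a geometric distribution (the discrete counterpart of the exponential) that decays monotonically, so the single most likely duration is always one frame. The following is a minimal sketch, assuming a single state in isolation; the function names and the self-loop value of 0.9 are hypothetical, chosen only for illustration.

```python
def state_duration_pmf(a: float, max_frames: int) -> list[float]:
    """P(stay exactly d frames), d = 1..max_frames, for a state with self-loop prob a."""
    if not 0.0 <= a < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    # Take the self-loop d-1 times (prob a each), then leave once (prob 1 - a).
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_frames + 1)]


def expected_duration(a: float) -> float:
    """Mean of the geometric duration distribution: 1 / (1 - a) frames."""
    return 1.0 / (1.0 - a)


if __name__ == "__main__":
    a = 0.9  # hypothetical self-loop probability, for illustration only
    pmf = state_duration_pmf(a, 30)
    # Each successive duration is exactly a times as likely as the previous one,
    # so the mode is d = 1 no matter how large a is.
    print("P(d=1..5):", [round(p, 4) for p in pmf[:5]])
    print("mean duration (frames):", round(expected_duration(a), 6))
```

Raising a lengthens the mean duration but never moves the peak away from one frame, whereas measured phone durations typically peak at several frames; explicit-duration (semi-Markov) variants of the HMM were developed to address exactly this mismatch.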
Any problem seems hard before one solves it, and then seems trivially easy afterward…
Which of your publications is your favorite? Why?