10 years of news and resources for members of the IEEE Signal Processing Society
Xuedong Huang is a Microsoft Technical Fellow and Azure AI Chief Technology Officer. He is responsible for Microsoft’s Azure AI engineering and research to bring the dream of making machines see, hear and understand human beings a reality.
He joined Microsoft to found the company’s speech technology group in 1993. He helped bring speech technology to the mass market by introducing Windows SAPI in 1995, Speech Server in 2004, and Azure Speech in 2015. He has held a variety of responsibilities in Research, Incubation, and Production to advance Microsoft’s AI stack, from deep learning infrastructure to enabling new experiences. He helped Microsoft achieve multiple historic AI milestones on open research tasks, including human parity in conversational speech recognition in 2016, human parity in machine translation in 2018, and human parity in image captioning in 2020.
1. Your bio in your own words.
I am a Technical Fellow and CTO of Azure AI, and I am really excited to be driving the research and engineering for mass-market AI with Azure AI, as showcased by Azure Cognitive Services.
After I completed my graduate studies, I have been working on spoken language processing since 1982. I founded the spoken language effort when Microsoft started its research program and introduced the first industry-wide speech API services in Windows 95. Precisely 20 years after that, Project Oxford was introduced in 2015 and would go on to become the product we know today as Azure Cognitive Services. Speech APIs moved from Windows to Azure, reflecting Microsoft's own digital transformation. It's been fantastic to experience this journey as we combine technology, data and services to delight customers, whether for accessibility or to bridge the language gap. The paradigm has shifted multiple times, and I have been fortunate to be a part of the transformation that brings intelligence to the broader public.
2. What challenges did you have to face to get where you are today?
Building a world-class team is the number one challenge. It means assembling a group of world-class talent that is passionate about our sense of purpose, competent with the underlying AI technology, and practical enough to deliver production services that delight our worldwide customers. The goal is not just to pursue science, but also to deliver that science as products and services that any developer can use. Once you have the team, you need to identify a strategic direction that motivates them to achieve amazing goals.
3. What was the most important factor in your success?
Grit. The work we're doing is never easy. You must have the perseverance to influence others, because most people may not initially be in agreement, and this is particularly true for new and innovative concepts. The ability to influence and bring people along is the most important factor in success.
4. How does your work affect society?
There are 7,000 languages spoken on this planet. Many of them will be gone by the end of the century. When a language disappears, the community and cultural heritage behind it are also lost. My work can help preserve these languages and cultures. It can also help everyone on the planet communicate better, with fewer language barriers, and make the world a better place to work and live. The latest example of how we can bring people closer together through communication is our work with the European Parliament. We are working with them to enable real-time translation of members' speech into all 24 official languages so they can communicate freely. Another example is Seeing AI, an accessibility app that uses our latest computer vision technology to help people who are visually impaired understand what is happening around them. These are two recent examples, but Microsoft has been on a journey for over 20 years to provide technology that has a positive effect on society.
5. If there is one take-home message you want the readers of this interview to have, what would it be?
Spoken language processing is the crown jewel of human intelligence. The ability to help people communicate better with spoken language has a lasting impact, not only on science and AI but, most importantly, on people’s lives. It’s important to remember how meaningful the work can be. I’ve never forgotten this since the day I started working in this field.
6. Failures are an inevitable part of everyone’s career journey. What is the most important lesson you learned during your career when dealing with failures?
I’ve learned to adapt. I love Charles Darwin’s work in the science of evolution. Darwin said, “It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.” We must continuously adapt and keep up with the latest developments. That is the most important lesson.
7. Although novelty and innovation are the most important factors for technology advancement, when a researcher, scientist or engineer has a new idea, there is a lot of pushback until they prove the new idea actually works. What is your advice on how to handle it, especially for readers who are in the early stages of their careers?
As I said, grit is the most important factor in anyone’s success. In addition to grit, we all need to nurture a growth mindset. See Dweck’s Mindset: How You Can Fulfill Your Potential. It’s about being open-minded and treating everything as a learning opportunity – going beyond the surface and nurturing the curiosity to go deeper. This is when discovery, breakthroughs and magic happen.
8. Anything else that you would like to add?
Spoken language researchers and engineers have made a lot of progress in the accuracy of AI models. There are many commonalities across different modalities, and I expect we will experience similar breakthroughs in computer vision. Creating AI that is more like human intelligence is a key to future advancements. A recent neuroscience discovery supports a multi-modal approach to models. It had previously been thought, and copiously published, that ‘pattern separation’ in the hippocampus, an area of the brain critical for memory, enables memories to be stored by separate groups of neurons so that memories don’t get mixed up. But just last year, Professor Quiroga of Leicester University found that there is no pattern separation in the human hippocampus. This is a key difference between human and animal intelligence. I shared how Microsoft is using this integrative approach to create more human-like AI in a Microsoft Research blog post, “A holistic representation toward integrative AI.”
To learn more about Xuedong Huang, visit his webpage.
© Copyright 2022 IEEE – All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.