Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

TASLP Volume 29 | 2021

TASLPRO Articles

By: Chia-Chih Kuo; Kuan-Yu Chen; Shang-Bao Luo

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer a question by referring to a passage, where the passage, the question, and the choices are all in the form of speech. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in model development. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from Transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems or representation inadequacies in word embedding generators, thereby making the SMCQA system more robust. Along this line of research, this study proposes an audio-aware SMCQA framework. Two different mechanisms are introduced to distill useful cues from speech, and a BERT-based SMCQA framework is then presented. In other words, the proposed framework not only inherits the advantages of the contextualized language representations learned by BERT but also integrates the complementary acoustic-level information distilled from audio with the text-level information. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and state-of-the-art (SOTA) systems on a published Chinese SMCQA dataset.
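
The abstract does not spell out the two audio-distillation mechanisms, so the following is only a toy sketch of the general idea of combining text-level and acoustic-level information when scoring choices. It is not the authors' method. The function names, the concatenation-based fusion, the linear scorer, and all numbers are assumptions made for illustration; in a real system the text vectors might come from BERT and the audio vectors from an acoustic encoder.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_choices(text_vecs, audio_vecs, weights, bias=0.0):
    """Score each choice from fused text and acoustic features.

    Hypothetical helper: one text vector and one audio vector per
    choice are concatenated and passed through a single linear
    scorer. This fusion scheme is an assumption, not taken from
    the paper.
    """
    if len(text_vecs) != len(audio_vecs):
        raise ValueError("need one text and one audio vector per choice")
    logits = []
    for t, a in zip(text_vecs, audio_vecs):
        fused = list(t) + list(a)  # fusion by concatenation (assumption)
        if len(fused) != len(weights):
            raise ValueError("weights must match the fused vector length")
        logits.append(sum(w * x for w, x in zip(weights, fused)) + bias)
    return softmax(logits)


# Toy features for four choices (made-up numbers, not real data).
text_vecs = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3], [0.1, 0.0]]
audio_vecs = [[0.0], [0.5], [0.1], [0.2]]
weights = [1.0, 1.0, 2.0]  # stand-in for learned parameters

probs = score_choices(text_vecs, audio_vecs, weights)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

The predicted answer is the choice with the highest probability. Concatenation is used here only because it is the simplest way to let a scorer see both modalities at once.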

Tags: IEEE TASLP Article
