Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features

TASLP Volume 31 | 2023

TASLPRO Articles

By:

Ryandhimas E. Zezario; Szu-Wei Fu; Fei Chen; Chiou-Shann Fuh; Hsin-Min Wang; Yu Tsao

This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which simultaneously estimates the quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net uses a convolutional neural network (CNN) and bidirectional long short-term memory (BLSTM) architecture to extract representations, followed by a multiplicative attention layer and a fully connected layer for each assessment metric. As inputs, it combines cross-domain features (spectral and time-domain features) with latent representations from self-supervised learning (SSL) models, drawing on richer acoustic information to obtain more accurate assessments. Experimental results show that, in both seen and unseen noise environments, MOSA-Net achieves higher linear correlation coefficient (LCC) scores than Quality-Net, an existing single-task model, for perceptual evaluation of speech quality (PESQ) prediction, and higher LCC scores than STOI-Net, an existing single-task model, for short-time objective intelligibility (STOI) prediction. Moreover, MOSA-Net can serve as a pre-trained model that is effectively adapted to predict subjective quality and intelligibility scores from a limited amount of training data; in this setting, it achieves higher LCC scores in mean opinion score (MOS) prediction than MOS-SSL, a strong single-task MOS predictor. Finally, we use the latent representations of MOSA-Net to guide the speech enhancement (SE) process, yielding a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE achieves higher PESQ scores than a baseline SE model in both seen and unseen noise environments.
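The per-metric prediction stage and the LCC used to report results can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation and rests on several assumptions: the CNN-BLSTM encoder is replaced by random stand-in frame features; each metric head uses a Luong-style "general" multiplicative score (query^T W h_t) with a hypothetical learned query vector, then attention pooling and one linear unit; the names `MetricHead`, `init_head`, and `predict_all` are hypothetical, and all weights are untrained. The exact layer configuration is defined in the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(w: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[Vector], w_att: List[Vector], query: Vector) -> Vector:
    # Multiplicative ("general") score per frame: query^T W h_t.
    scores = [dot(query, mat_vec(w_att, h)) for h in frames]
    alphas = softmax(scores)
    # Attention-weighted sum of the frame vectors gives one utterance vector.
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(len(frames[0]))]


@dataclass
class MetricHead:
    """Hypothetical per-metric head: attention pooling plus one linear unit."""
    w_att: List[Vector]
    query: Vector
    w_fc: Vector
    b_fc: float = 0.0

    def __call__(self, frames: List[Vector]) -> float:
        pooled = attention_pool(frames, self.w_att, self.query)
        return dot(self.w_fc, pooled) + self.b_fc


def init_head(dim: int, rng: random.Random) -> MetricHead:
    # Untrained random weights; a real model would learn these.
    return MetricHead(
        w_att=[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)],
        query=[rng.gauss(0.0, 0.1) for _ in range(dim)],
        w_fc=[rng.gauss(0.0, 0.1) for _ in range(dim)],
    )


def predict_all(frames: List[Vector], heads: Dict[str, MetricHead]) -> Dict[str, float]:
    # The same shared frame features feed every metric head.
    return {name: head(frames) for name, head in heads.items()}


def lcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between predictions and labels."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    if sp == 0.0 or st == 0.0:
        raise ValueError("LCC is undefined when either sequence is constant")
    return cov / (sp * st)


# Usage: random stand-ins for 10 frames of 4-dim encoder output.
rng = random.Random(0)
dim = 4
frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(10)]
heads = {m: init_head(dim, rng) for m in ("quality", "intelligibility", "distortion")}
scores = predict_all(frames, heads)
print(scores)
```

In this sketch, every head reads the same shared frame features, which is what lets one encoder serve several assessment targets; each head only learns where to attend and how to map the pooled vector to its own score. LCC is then computed per metric between predicted and reference scores across a test set.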

Read on IEEE Xplore

Tags:

IEEE TASLP Article

SPS Social Media

IEEE SPS Facebook Page https://www.facebook.com/ieeeSPS
IEEE SPS X Page https://x.com/IEEEsps
IEEE SPS Instagram Page https://www.instagram.com/ieeesps/?hl=en
IEEE SPS LinkedIn Page https://www.linkedin.com/company/ieeesps/
IEEE SPS YouTube Channel https://www.youtube.com/ieeeSPS

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel

© Copyright 2025 IEEE - All rights reserved. Use of this website signifies your agreement to the IEEE Terms and Conditions.
A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.