TASLP Volume 32 | 2024

https://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=10304349&punumber=6570655

Disentangling Prosody Representations With Unsupervised Speech Reconstruction

Human speech can be characterized by different components, including semantic content, speaker identity, and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in speech recognition and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging question, both because different attributes such as timbre and rhythm are intrinsically associated and because robust speech recognition requires supervised training schemes.

Speech Dereverberation With Frequency Domain Autoregressive Modeling

TASLPRO Articles

Speech applications in far-field, real-world settings often deal with signals corrupted by reverberation. Dereverberation is an important step for improving audible quality and reducing error rates in applications such as automatic speech recognition (ASR). We propose a unified framework for speech dereverberation that improves both speech quality and ASR performance, using the envelope-carrier decomposition provided by an autoregressive (AR) model.
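To make the idea of an envelope-carrier decomposition from an AR model more concrete, here is a minimal sketch of generic frequency-domain linear prediction. It is not the framework proposed in the article. It assumes the AR model is fit to the DCT coefficients of a short frame with the autocorrelation method, so that the AR power response traces the temporal envelope, and the carrier is what remains after dividing the envelope out. All function names, the model order, and the toy signal are hypothetical.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def ar_coefficients(seq, order):
    """Fit an AR model to `seq` with the autocorrelation (Yule-Walker) method."""
    n = len(seq)
    r = np.array([seq[: n - m] @ seq[m:] for m in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # tiny diagonal loading for numerical stability
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], -r[1:])   # Toeplitz normal equations
    gain = r[0] + a @ r[1:]                # prediction error power
    return np.concatenate(([1.0], a)), gain

def fdlp_envelope(x, order=20):
    """AR power response of the DCT sequence, read as a temporal envelope."""
    n = len(x)
    a, gain = ar_coefficients(dct_ii(x), order)
    theta = np.pi * (np.arange(n) + 0.5) / n   # time index mapped to [0, pi]
    response = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(response) ** 2

def decompose(x, order=20):
    """Split a frame into an AR envelope and the residual carrier."""
    env = fdlp_envelope(x, order)
    carrier = x / np.sqrt(env)
    return env, carrier

# Toy frame (assumed data): a quiet first half followed by a loud second half.
rng = np.random.default_rng(0)
frame = np.concatenate([0.01 * rng.standard_normal(128), rng.standard_normal(128)])
envelope, carrier = decompose(frame)
```

In this sketch the envelope follows the energy of the frame over time, so it is larger over the loud half of the toy signal, while the carrier keeps the fine structure.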

Operation-Augmented Numerical Reasoning for Question Answering

Question answering that requires numerical reasoning, which generally involves symbolic operations such as sorting, counting, and addition, is a challenging task. To address this problem, existing mixture-of-experts (MoE)-based methods design several specific answer predictors to handle different types of questions and achieve promising performance. However, they do not model or exploit fine-grained reasoning-related operations to support numerical reasoning, which limits their reasoning capability and interpretability.

Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions

The speaker recognition evaluation is conducted in a framework that employs three score distributions and two decision thresholds. The statistic of interest is the average of two weighted sums of the probabilities of type I and type II errors, one sum at each threshold. Because resources are limited, the same subjects are used multiple times to generate more samples, so data dependence is ubiquitous.
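The statistic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the evaluation's official definition: the abstract does not say how the three score distributions enter, so the sketch assumes one target distribution and two non-target distributions whose false-alarm rates are mixed with a hypothetical weight `mix`. The names, weights, thresholds, and scores are all made up, and which error counts as type I or type II depends on the convention in use.

```python
import numpy as np

def error_rates(target, nontarget_a, nontarget_b, threshold, mix=0.5):
    """Empirical miss and false-alarm probabilities at one threshold.

    Assumption: the false-alarm probability is a `mix`-weighted blend of the
    rates measured on the two non-target score sets.
    """
    p_miss = np.mean(np.asarray(target) < threshold)
    p_fa_a = np.mean(np.asarray(nontarget_a) >= threshold)
    p_fa_b = np.mean(np.asarray(nontarget_b) >= threshold)
    return p_miss, mix * p_fa_a + (1.0 - mix) * p_fa_b

def average_cost(target, nontarget_a, nontarget_b, thresholds, weights, mix=0.5):
    """Average, over the thresholds, of the weighted sum of the two error rates."""
    costs = []
    for thr, (w_miss, w_fa) in zip(thresholds, weights):
        p_miss, p_fa = error_rates(target, nontarget_a, nontarget_b, thr, mix)
        costs.append(w_miss * p_miss + w_fa * p_fa)
    return float(np.mean(costs))

# Made-up scores, thresholds, and weights, purely for illustration.
cost = average_cost(
    target=[2, 3, 4, 5],
    nontarget_a=[0, 1, 2, 3],
    nontarget_b=[-1, 0, 1, 2],
    thresholds=(1.5, 2.5),
    weights=((1.0, 1.0), (1.0, 2.0)),
)
print(cost)  # 0.4375
```

Note that this point estimate treats every score as an independent sample. The data dependence mentioned in the abstract affects the uncertainty of the estimate, which this sketch does not address.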
