TASLP Volume 29 | 2021

https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6570655

Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer a question by referring to a passage, where the passage, the question, and the choices are all given as speech. Although the audio may contain useful cues for SMCQA, usually only the auto-transcribed text is used in model development. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from Transformers (BERT), systems that use only auto-transcribed text can still achieve a certain level of performance.

Sarcasm Detection with Commonsense Knowledge

TASLPRO Articles

Sarcasm is commonly used on today's social media platforms such as Twitter and Reddit. Sarcasm detection is necessary for analysing people's real sentiments, because people usually use sarcasm to express an emotion that is the opposite of the literal meaning. However, existing work neglects the fact that commonsense knowledge is crucial for sarcasm recognition.

Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments

Attention-based end-to-end (E2E) automatic speech recognition (ASR) architectures are now the state-of-the-art in terms of recognition performance. However, despite their effectiveness, they have not been widely applied in keyword search (KWS) tasks yet. In this paper, we propose the Att-E2E-KWS architecture, an attention-based E2E ASR framework for KWS that can afford accurate and reliable keyword retrieval results.

Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech

Automatic speech recognition (ASR) technologies have advanced significantly in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in current ASR systems.

Conditioned Source Separation for Musical Instrument Performances

In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem.

Bayesian Learning for Deep Neural Network Adaptation

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data, which is often attributable to speaker differences. Speaker adaptation techniques play a vital role in reducing this mismatch. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.

Modified Magnitude-Phase Spectrum Information for Spoofing Detection

Most of the existing feature representations for spoofing countermeasures consider information either from the magnitude or phase spectrum. We hypothesize that both magnitude and phase spectra can be beneficial for spoofing detection (SD) when collectively used to capture the signal artifacts. In this work, we propose a novel feature referred to as modified magnitude-phase spectrum (MMPS) to capture both magnitude and phase information from the speech signal.

Audio-Visual Deep Neural Network for Robust Person Verification

Voice and face are two of the most popular biometrics for person verification, usually used in speaker verification and face verification tasks respectively. It has already been observed that simply combining the information from these two modalities can lead to a more powerful and robust person verification system.

Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization

Geometry calibration is an inherent challenge in distributed acoustic sensor networks. To address this problem, a passive geometry calibration approach based on distributed damped Newton optimization is proposed. Specifically, a geometric cost function incorporating directions of arrival (DoAs) and time differences of arrival (TDoAs) is first formulated, and then its identifiability conditions are given.
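To make the two measurement types concrete, the following is a minimal illustrative sketch, not the cost function from the article. It assumes a 2-D geometry, a single source, a fixed speed of sound, and a plain sum of squared DoA and TDoA residuals; the function names (`tdoa`, `doa`, `geometric_cost`) and the choice of the first microphone as TDoA reference are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Time difference of arrival (s) at mic_a relative to mic_b."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c


def doa(source, mic):
    """Direction of arrival (rad) of the source as seen from mic, in 2-D."""
    return math.atan2(source[1] - mic[1], source[0] - mic[0])


def geometric_cost(mics, source, doas_meas, tdoas_meas, c=SPEED_OF_SOUND):
    """Hypothetical cost: squared DoA residuals plus squared TDoA residuals.

    doas_meas has one angle per microphone; tdoas_meas has one delay per
    microphone other than the first, measured relative to mics[0].
    """
    cost = 0.0
    for mic, theta in zip(mics, doas_meas):
        err = doa(source, mic) - theta
        # Wrap the angular error into (-pi, pi] before squaring.
        err = math.atan2(math.sin(err), math.cos(err))
        cost += err ** 2
    ref = mics[0]
    for mic, tau in zip(mics[1:], tdoas_meas):
        # Scale the time residual by c so it is expressed in metres.
        cost += (c * (tdoa(source, mic, ref, c) - tau)) ** 2
    return cost
```

Under these assumptions, the cost is zero when the candidate microphone positions reproduce the measurements exactly and grows as the candidate geometry moves away from the true one; a calibration method would search for the positions that minimize such a cost.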

Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings

Speaker diarization is an important and topical problem, and is especially useful as a preprocessor for applications involving conversational speech. The objective of this article is two-fold: (i) to initialize segments so that speaker information is distributed uniformly across them, and (ii) to incorporate speaker-discriminative features within the unsupervised diarization framework. In the first part of the work, a varying-length segment initialization technique is proposed for Information Bottleneck (IB) based speaker diarization systems, using phoneme rate as side information. This initialization distributes speaker information uniformly across the segments and provides a better starting point for IB-based clustering.
