TASLPRO Articles

Occlusion Effect Cancellation in Headphones and Hearing Devices—The Sister of Active Noise Cancellation

The perception of one’s own voice influences the acceptance of hearing devices, such as headphones, headsets, or hearing aids. When these devices fully or partially occlude the ear canal, the wearer’s own voice sounds boomy, like talking in a barrel. This is called the occlusion effect. Compared to the open ear, occluding the ear canal amplifies body-conducted sounds, mainly at low frequencies, and attenuates air-conducted sounds, predominantly at high frequencies.

Text Generation From Data With Dynamic Planning

Transcribing structured data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is to plan the input records for text realization. Recent works tackle this problem with a static planner, which plans the records in advance of text realization. However, static planners cannot revise their plans to cope with unexpected realized text, and they require gold-standard plans for supervised training. To address these issues, we first propose a model that contains a dynamic planner.

Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process.

RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems

Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method that reverses the mathematical logic of math word problems to produce new high-quality math problems and introduce new knowledge points that can benefit learning the mathematical reasoning logic. 
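
As a toy illustration of what "reversing the mathematical logic" of a problem can mean, the sketch below takes a one-step arithmetic problem, treats its original answer as a known quantity, and makes one of the original operands the new unknown. This is not the RODA method itself, which the abstract does not detail; the names `INVERSE_LEFT`, `apply_op`, and `reverse_problem` are hypothetical, and the restriction to a single binary operation is an assumption made for brevity.

```python
from fractions import Fraction

# If  left op right = answer,  then  answer INVERSE_LEFT[op] right = left.
# (Hypothetical helper table; only covers the left operand as the new unknown.)
INVERSE_LEFT = {"+": "-", "-": "+", "*": "/", "/": "*"}


def apply_op(op, a, b):
    """Evaluate a single binary operation exactly, using rationals."""
    a, b = Fraction(a), Fraction(b)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b
    raise ValueError(f"unsupported operator: {op}")


def reverse_problem(left, op, right):
    """Build a reversed problem in which `left` becomes the unknown.

    Returns ((known_value, new_op, right), new_answer), where evaluating
    known_value new_op right yields new_answer (the original `left`).
    """
    answer = apply_op(op, left, right)   # answer of the original problem
    new_op = INVERSE_LEFT[op]            # operation that undoes `op`
    return (answer, new_op, right), Fraction(left)


# Original: "Tom has 3 apples and buys 5 more. How many does he have?"  3 + 5 = ?
# Reversed: "Tom has 8 apples after buying 5. How many did he start with?"  8 - 5 = ?
new_expr, new_answer = reverse_problem(3, "+", 5)
```

Each reversed problem is consistent by construction: evaluating the new expression with `apply_op` reproduces the operand that was hidden, so the augmented example can be checked automatically before it is added to a training set.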

Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer the question by referring to the passage, where the passage, the question, and the choices are all in the form of speech. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is used in model development. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from Transformers (BERT), systems that use only auto-transcribed text can still achieve a certain level of performance.

Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments

Attention-based end-to-end (E2E) automatic speech recognition (ASR) architectures are now the state of the art in terms of recognition performance. However, despite their effectiveness, they have not yet been widely applied to keyword search (KWS) tasks. In this paper, we propose the Att-E2E-KWS architecture, an attention-based E2E ASR framework for KWS that delivers accurate and reliable keyword retrieval results.
