TASLP Articles


Deep neural networks (DNNs) represent the mainstream methodology for supervised speech enhancement, primarily due to their capability to model complex functions using hierarchical representations. However, a recent study revealed that DNNs trained on a single corpus fail to generalize to untrained corpora, especially in low signal-to-noise ratio (SNR) conditions.
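To make the "low SNR" condition concrete, the following is a minimal sketch of how a noisy test mixture at a chosen SNR is commonly constructed. It is not taken from the study above: the function names (`mix_at_snr`, `measure_snr_db`), the synthetic sine "speech" signal, and the Gaussian noise are all illustrative assumptions.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)

def measure_snr_db(speech, noise):
    """SNR in dB: 10 * log10(speech power / noise power)."""
    return 10.0 * math.log10(power(speech) / power(noise))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Noise power required to hit the requested SNR.
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

# Stand-in signals (assumptions): a 220 Hz tone as "speech", white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# A low-SNR condition: the noise carries more power than the speech.
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-5.0)
```

At -5 dB the noise power is roughly three times the speech power, which is the kind of condition the abstract singles out as hard to generalize to.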

In many real-world settings, machine learning models need to identify user inputs that are out-of-domain (OOD) so as to avoid performing wrong actions. This work focuses on a challenging case of OOD detection, where no labels for in-domain data are accessible (e.g., no intent labels for the intent classification task).

The perception of one’s own voice influences the acceptance of hearing devices, such as headphones, headsets or hearing aids. When these devices fully or partially occlude the ear canal, the wearer’s own voice sounds boomy or like talking in a barrel. This is called the occlusion effect. Occluding the ear canal results in an amplification of body-conducted sounds, mainly at low frequencies, and an attenuation of air-conducted sounds, predominantly at high frequencies, compared to the open ear.

Transcribing structural data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is to plan the input records for text realization. Recent works tackle this problem with a static planner, which performs record planning in advance for text realization. However, they cannot revise plans to cope with unexpected realized text and require golden plans for supervised training. To address these issues, we first propose a model that contains a dynamic planner.

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process.

Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method that reverses the mathematical logic of math word problems to produce new high-quality math problems and introduce new knowledge points that can benefit learning the mathematical reasoning logic. 
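As a toy illustration of what "reversing the mathematical logic" of a problem can mean, the sketch below swaps the unknown of a one-step addition problem with one of its given quantities. This is an assumption-laden simplification, not the method proposed in the article: the dictionary representation and the `solve` and `reverse` helpers are hypothetical, and real math word problems involve natural-language text and multi-step equations.

```python
def solve(problem):
    """Fill in the single unknown (None) in the relation start + added = total."""
    start, added, total = problem["start"], problem["added"], problem["total"]
    if total is None:
        return start + added
    if start is None:
        return total - added
    if added is None:
        return total - start
    raise ValueError("problem has no unknown quantity")

def reverse(problem, new_unknown):
    """Build a new problem: the old answer becomes a given, `new_unknown` is asked for."""
    old_unknown = next(key for key, value in problem.items() if value is None)
    reversed_problem = dict(problem)
    reversed_problem[old_unknown] = solve(problem)
    reversed_problem[new_unknown] = None
    return reversed_problem

# "Tom has 3 apples and buys 5 more. How many apples does he have now?"
original = {"start": 3, "added": 5, "total": None}

# "Tom buys 5 apples and now has 8. How many did he have at first?"
augmented = reverse(original, "start")
```

The reversed problem uses the same quantities but requires subtraction rather than addition, which hints at how such augmentation could expose a model to different reasoning steps.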

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer the question by referring to the passage, where the passage, the question, and multiple choices are all in the form of speech. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in model development. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from Transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance.

Sarcasm is commonly used on today's social media platforms such as Twitter and Reddit. Sarcasm detection is necessary for analysing people's real sentiments, as people usually use sarcasm to express an emotion opposite to the literal meaning. However, existing work neglects the fact that commonsense knowledge is crucial for sarcasm recognition.

Attention-based end-to-end (E2E) automatic speech recognition (ASR) architectures are now the state of the art in terms of recognition performance. However, despite their effectiveness, they have not yet been widely applied to keyword search (KWS) tasks. In this paper, we propose the Att-E2E-KWS architecture, an attention-based E2E ASR framework for KWS that can provide accurate and reliable keyword retrieval results.

Automatic speech recognition (ASR) technologies have advanced significantly in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in current ASR systems.

