TASLP Articles


Deep neural networks (DNNs) represent the mainstream methodology for supervised speech enhancement, primarily due to their capability to model complex functions using hierarchical representations. However, a recent study revealed that DNNs trained on a single corpus fail to generalize to untrained corpora, especially in low signal-to-noise ratio (SNR) conditions.
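Since the generalization gap is reported specifically at low signal-to-noise ratios, it may help to recall how SNR is measured. A minimal sketch (the signal here is a synthetic tone, not speech; `snr_db` is an illustrative helper, not from the paper):

```python
import numpy as np

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB between a clean signal and its noisy copy."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Example: a 440 Hz tone at 16 kHz with additive white Gaussian noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = 0.1 * rng.standard_normal(16000)
print(snr_db(clean, clean + noise))  # roughly 17 dB for this noise level
```

Halving the noise amplitude raises the SNR by about 6 dB, which is why "low SNR" conditions (0 dB and below) are so much harder: the noise power rivals the speech power.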

In many real-world settings, machine learning models need to identify user inputs that are out-of-domain (OOD) so as to avoid performing wrong actions. This work focuses on a challenging case of OOD detection, where no labels for in-domain data are accessible (e.g., no intent labels for the intent classification task).

The perception of one’s own voice influences the acceptance of hearing devices, such as headphones, headsets, or hearing aids. When these devices fully or partially occlude the ear canal, the wearer’s own voice sounds boomy, like talking in a barrel. This is called the occlusion effect. Occluding the ear canal results in an amplification of body-conducted sounds, mainly at low frequencies, and an attenuation of air-conducted sounds, predominantly at high frequencies, compared to the open ear.

Transcribing structural data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is to plan the input records for text realization. Recent works tackle this problem with a static planner, which performs record planning in advance for text realization. However, they cannot revise plans to cope with unexpected realized text and require golden plans for supervised training. To address these issues, we first propose a model that contains a dynamic planner.

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifacts and bitrate control are handled during the optimization process.
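To make the quantization and entropy-coding stage concrete, here is a minimal non-trainable sketch: a uniform quantizer plus an empirical-entropy estimate of the bitrate. This is an illustration of the coding pipeline's role, not the paper's trainable module (all function names are ours):

```python
import numpy as np

def quantize(x, num_levels=16):
    """Uniform mid-rise quantizer over [-1, 1]: a fixed stand-in for the
    trainable quantization module described in the abstract."""
    step = 2.0 / num_levels
    idx = np.clip(np.floor((x + 1.0) / step), 0, num_levels - 1).astype(int)
    return idx, (idx + 0.5) * step - 1.0  # code indices, dequantized samples

def entropy_bits(idx, num_levels=16):
    """Empirical entropy of the code indices in bits per sample: the lower
    bound an entropy coder approaches, hence a proxy for bitrate."""
    p = np.bincount(idx, minlength=num_levels) / idx.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
x = np.tanh(rng.standard_normal(8000))  # a bounded stand-in "waveform"
idx, x_hat = quantize(x)
print(entropy_bits(idx))  # at most log2(16) = 4 bits per sample
```

Making both stages differentiable (e.g., with soft assignments during training) is what lets the network trade reconstruction quality against this entropy term in a single optimization, as the abstract describes.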

Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method that reverses the mathematical logic of math word problems to produce new high-quality problems and introduces new knowledge points that benefit learning of mathematical reasoning.
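As a toy illustration of what "reversing the mathematical logic" can mean (this template and its names are hypothetical, not the paper's pipeline): a forward one-step addition problem asks for the total, while the reversed problem gives the total and asks for the starting quantity.

```python
def reverse_problem(start, bought):
    """Build a forward addition problem and its logically reversed
    (subtraction) counterpart. Purely illustrative template."""
    total = start + bought
    forward = (f"Tom has {start} apples and buys {bought} more. "
               f"How many apples does Tom have now?", total)
    backward = (f"After buying {bought} apples, Tom has {total} apples. "
                f"How many apples did Tom have at first?", start)
    return forward, backward

fwd, bwd = reverse_problem(3, 4)
print(fwd[0], "->", fwd[1])  # answer 7
print(bwd[0], "->", bwd[1])  # answer 3
```

The reversed variant exercises the inverse operation (subtraction rather than addition), which is one way new reasoning patterns can enter the training data.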

Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer the question by referring to the passage, where the passage, the question, and the multiple choices are all in the form of speech. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in model development. Thanks to large-scale pre-trained language representation models, such as bidirectional encoder representations from Transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance.

Sarcasm is commonly used on today's social media platforms such as Twitter and Reddit. Sarcasm detection is necessary for analysing people's real sentiments, as people usually use sarcasm to express an emotion flipped against the literal meaning. However, current works neglect the fact that commonsense knowledge is crucial for sarcasm recognition.

Attention-based end-to-end (E2E) automatic speech recognition (ASR) architectures are now the state of the art in terms of recognition performance. However, despite their effectiveness, they have not yet been widely applied to keyword search (KWS) tasks. In this paper, we propose the Att-E2E-KWS architecture, an attention-based E2E ASR framework for KWS that provides accurate and reliable keyword retrieval results.

Automatic speech recognition (ASR) technologies have advanced significantly in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To address this, multi-channel microphone array data are widely used in current ASR systems.
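The classic reason microphone arrays help is spatial filtering: aligning and averaging the channels reinforces the coherent source while incoherent noise partially cancels. A minimal delay-and-sum sketch (integer sample delays and a synthetic tone; real front-ends estimate fractional delays from array geometry or cross-correlation):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Undo each channel's integer sample delay, then average.
    Coherent signal adds in phase; independent noise averages down."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(1)
# Periodic source so circular shifts are exact: 100 Hz tone at 8 kHz.
src = np.sin(2 * np.pi * 100 * np.arange(4000) / 8000)
# Simulate a 3-mic array: delayed copies of the source plus sensor noise.
delays = [0, 3, 7]
mics = [np.roll(src, d) + 0.3 * rng.standard_normal(src.size) for d in delays]
enhanced = delay_and_sum(mics, delays)
# Residual noise power drops roughly by the number of microphones.
print(np.mean((enhanced - src) ** 2), np.mean((mics[0] - src) ** 2))
```

With M microphones and uncorrelated noise, the output noise power falls by about a factor of M, which is one reason array front-ends remain a staple for overlapped and far-field speech.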
