IEEE/ACM Transactions on Audio, Speech, and Language Processing

One of the biggest challenges in multimicrophone applications is the estimation of the parameters of the signal model, such as the power spectral densities (PSDs) of the sources, the early (relative) acoustic transfer functions of the sources with respect to the microphones, the PSD of late reverberation, and the PSDs of microphone self-noise.

The transfer of acoustic data across languages has been shown to improve keyword search (KWS) performance in data-scarce settings. In this paper, we propose a way of performing this transfer that reduces the impact of the prevalence of out-of-vocabulary (OOV) terms on KWS in such a setting.

Recently, the binaural auditory-model-based quality prediction (BAM-Q) was successfully applied to predict binaural audio quality degradations, while the generalized power-spectrum model for quality (GPSMq) has been demonstrated to account for a large variety of monaural signal distortions.

The avoidance of spatial aliasing is a major challenge in the practical implementation of sound field synthesis. Such methods aim at a physically accurate reconstruction of a desired sound field inside a target region using a finite ensemble of loudspeakers. In the past, different theoretical treatises of the inherent spatial sampling process led to anti-aliasing criteria for simple loudspeaker array arrangements, e.g., lines and circles, and fundamental sound fields, e.g., plane and spherical waves. Many criteria were independent of the listener's position inside the target region.

Recently, generative neural network models that operate directly on raw audio, such as WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover, there is increasing interest in using these models as statistical vocoders for generating speech waveforms from various acoustic features. However, there is also a need to reduce the model complexity without compromising the synthesis quality.

Multi-channel linear prediction (MCLP) can model the late reverberation in the short-time Fourier transform domain using a delayed linear predictor, with the prediction residual taken as the desired early reflection component. Traditionally, a Gaussian source model with time-dependent precision (inverse of variance) is considered for the desired signal. In this paper, we propose a Student's t-distribution model for the desired signal, which is realized as a Gaussian source with a Gamma-distributed precision.
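
For readers unfamiliar with the construction, the link between a Gamma-distributed precision and the Student's t-distribution can be sketched as follows. This is the standard real-valued, scalar Gaussian scale mixture, not necessarily the exact complex-valued form used in the paper; the symbols x, \lambda, a, and b are assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   x        real scalar sample of the desired signal
%   \lambda  precision (inverse variance) of x
%   a, b     shape and rate of the Gamma prior on \lambda
\begin{align}
p(x \mid \lambda) &= \mathcal{N}\!\left(x;\, 0,\, \lambda^{-1}\right), \qquad
p(\lambda) = \frac{b^{a}}{\Gamma(a)}\, \lambda^{a-1} e^{-b\lambda}, \\
p(x) &= \int_{0}^{\infty} p(x \mid \lambda)\, p(\lambda)\, d\lambda
      = \frac{\Gamma\!\left(a + \tfrac{1}{2}\right)}{\Gamma(a)\sqrt{2\pi b}}
        \left(1 + \frac{x^{2}}{2b}\right)^{-\left(a + \frac{1}{2}\right)}.
\end{align}
```

Under these assumptions the marginal is a zero-mean Student's t density with \nu = 2a degrees of freedom and squared scale b/a, which has heavier tails than a Gaussian with fixed variance.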

Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data; Task 3, which had overlapping events annotated in real-life audio; and Task 4, in which only weakly labeled data were available for training.

Scope

The IEEE/ACM Transactions on Audio, Speech, and Language Processing is dedicated to innovative theory and methods for processing signals representing audio, speech and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems.

Machine learning and pattern analysis applied to any of the above areas is also welcome.
