IEEE/ACM Transactions on Audio, Speech, and Language Processing

Conditioned Source Separation for Musical Instrument Performances

In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem.

Bayesian Learning for Deep Neural Network Adaptation

TASLP Volume 29 | 2021

TASLP Articles

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Speaker adaptation techniques play a vital role in reducing this mismatch. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.

Modified Magnitude-Phase Spectrum Information for Spoofing Detection

TASLP Volume 29 | 2021

TASLP Articles

Most of the existing feature representations for spoofing countermeasures consider information either from the magnitude or phase spectrum. We hypothesize that both magnitude and phase spectra can be beneficial for spoofing detection (SD) when collectively used to capture the signal artifacts. In this work, we propose a novel feature referred to as modified magnitude-phase spectrum (MMPS) to capture both magnitude and phase information from the speech signal.

Audio-Visual Deep Neural Network for Robust Person Verification

TASLP Volume 29 | 2021

TASLP Articles

Voice and face are the two most popular biometrics for person verification, typically used in speaker verification and face verification tasks, respectively. It has already been observed that simply combining the information from these two modalities can lead to a more powerful and robust person verification system.

Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization

TASLP Volume 29 | 2021

TASLP Articles

Geometry calibration is an inherent challenge in distributed acoustic sensor networks. To address this challenge, a passive geometry calibration approach based on distributed damped Newton optimization is proposed. Specifically, a geometric cost function incorporating directions of arrival (DoAs) and time differences of arrival (TDoAs) is first formulated, and then its identifiability conditions are given.

Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings

TASLP Volume 29 | 2021

TASLP Articles

Speaker diarization is an important and topical problem, and is especially useful as a preprocessing step for conversational speech applications. The objective of this article is two-fold: (i) initializing segments so that speaker information is distributed uniformly across them, and (ii) incorporating speaker discriminative features within the unsupervised diarization framework. In the first part of the work, a varying-length segment initialization technique is proposed for Information Bottleneck (IB) based speaker diarization systems, using phoneme rate as side information. This initialization distributes speaker information uniformly across the segments and provides a better starting point for IB based clustering.

Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation

TASLP Volume 29 | 2021

TASLP Articles

One practical requirement of music copyright management is the estimation of music relative loudness, which is mostly ignored in existing music detection work. To address this gap, we study the joint task of music detection and music relative loudness estimation. Specifically, we observe that the joint task has two characteristics, temporality and hierarchy, that can help in obtaining a solution. For example, a tiny fragment of audio is temporally related to its neighboring fragments because they may all belong to the same event, and the event classes of a fragment in the two tasks have a hierarchical relationship. Based on this observation, we reformulate the joint task as a hierarchical event detection and localization problem. To solve this problem, we further propose Hierarchical Regulated Iterative Networks (HRIN), with two variants, HRIN-r and HRIN-cr, based on recurrent and convolutional recurrent modules, respectively.

SOLVIT: A Reference-Free Source Localization Technique Using Majorization Minimization

TASLP Volume 28 | 2020

TASLP Articles

We consider the problem of localizing a source using range and range-difference measurements. Both problems are non-convex and non-smooth, and are challenging to solve. In this article, we develop an iterative algorithm, Source Localization Via an Iterative technique (SOLVIT), to localize the source using all the distinct range-difference measurements, i.e., without choosing a reference sensor.
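To make "all the distinct range-difference measurements" concrete, the sketch below forms the noise-free range difference for every sensor pair, rather than only the pairs that involve one reference sensor. With N sensors this gives N(N-1)/2 values instead of N-1. The sensor layout, the source position, and the function name `range_differences` are illustrative assumptions; this is only the measurement model, not the SOLVIT algorithm.

```python
import itertools
import numpy as np

def range_differences(source, sensors):
    """Return all sensor pairs (i, j), i < j, and the range differences r_i - r_j."""
    # Euclidean distance from the source to every sensor
    ranges = np.linalg.norm(sensors - source, axis=1)
    pairs = list(itertools.combinations(range(len(sensors)), 2))
    diffs = np.array([ranges[i] - ranges[j] for i, j in pairs])
    return pairs, diffs

# Hypothetical 2-D geometry: four sensors around a source at the origin
sensors = np.array([[0.0, 3.0], [4.0, 0.0], [0.0, -3.0], [-4.0, 0.0]])
source = np.array([0.0, 0.0])

pairs, diffs = range_differences(source, sensors)
for (i, j), d in zip(pairs, diffs):
    print(f"sensors {i},{j}: range difference {d:+.1f}")
```

A reference-based formulation would keep only the pairs containing the chosen reference sensor, so its result can depend on which sensor is chosen.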

Personal Sound Zones by Subband Filtering and Time Domain Optimization

TASLP Volume 28 | 2020

TASLP Articles

Personal Sound Zones (PSZ) systems aim to render independent sound signals to multiple listeners within a room by using arrays of loudspeakers. One of the algorithms used to provide PSZ is Weighted Pressure Matching (wPM), which computes the filters required to render a desired response in the listening zones while reducing the acoustic energy arriving at the quiet zones.
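The trade-off described above can be sketched as a regularized least-squares problem at a single frequency: match a target pressure in the listening (bright) zone while penalizing energy in the quiet (dark) zone. This is one common way of writing pressure matching with a dark-zone weight, not the subband or time-domain method of the article. The transfer matrices, the weight `dark_weight`, and the regularization `reg` are assumed for illustration.

```python
import numpy as np

def wpm_filters(H_bright, H_dark, target, dark_weight=1.0, reg=1e-3):
    """Minimize ||H_bright q - target||^2 + dark_weight ||H_dark q||^2 + reg ||q||^2."""
    n_speakers = H_bright.shape[1]
    A = (H_bright.conj().T @ H_bright
         + dark_weight * (H_dark.conj().T @ H_dark)
         + reg * np.eye(n_speakers))
    b = H_bright.conj().T @ target
    # Closed-form solution of the normal equations
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
n_speakers, n_bright, n_dark = 8, 6, 6  # hypothetical array and zone sizes
# Random complex matrices stand in for measured loudspeaker-to-microphone responses
H_bright = rng.standard_normal((n_bright, n_speakers)) + 1j * rng.standard_normal((n_bright, n_speakers))
H_dark = rng.standard_normal((n_dark, n_speakers)) + 1j * rng.standard_normal((n_dark, n_speakers))
target = np.ones(n_bright, dtype=complex)  # desired bright-zone pressure

q_low = wpm_filters(H_bright, H_dark, target, dark_weight=0.01)
q_high = wpm_filters(H_bright, H_dark, target, dark_weight=100.0)
dark_energy_low = np.linalg.norm(H_dark @ q_low) ** 2
dark_energy_high = np.linalg.norm(H_dark @ q_high) ** 2
print(dark_energy_low, dark_energy_high)
```

Raising `dark_weight` lowers the energy delivered to the quiet zone, usually at the cost of a larger reproduction error in the listening zone.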

Worst-Case-Optimization Robust-MVDR Beamformer for Stereo Noise Reduction in Hearing Aids

TASLP Volume 28 | 2020

TASLP Articles

This paper presents a robust beamformer for stereo noise reduction in hearing aid applications. The worst-case optimization method is applied to the binaural minimum-variance distortionless-response (BMVDR) beamformer to provide robustness against parameter estimation inaccuracies.
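For context, the standard (non-robust, single-output) MVDR weights are w = R^{-1} d / (d^H R^{-1} d), which minimize output noise power subject to a distortionless response w^H d = 1 toward the target. The sketch below computes them for a toy array. The steering vector, the noise covariance, and the small diagonal loading are assumptions for illustration; it does not implement the worst-case-optimized binaural variant that the paper proposes.

```python
import numpy as np

def mvdr_weights(R, d, loading=1e-6):
    """Standard MVDR weights w = R^{-1} d / (d^H R^{-1} d)."""
    n_mics = R.shape[0]
    # Small diagonal loading (an assumption here) keeps the solve well conditioned
    R_loaded = R + loading * np.trace(R).real / n_mics * np.eye(n_mics)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)

rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200  # hypothetical array size and number of noise frames
# Toy steering vector for a uniform linear array with half-wavelength spacing
d = np.exp(-1j * np.pi * np.arange(n_mics) * np.sin(0.3))
noise = rng.standard_normal((n_mics, n_frames)) + 1j * rng.standard_normal((n_mics, n_frames))
R = noise @ noise.conj().T / n_frames  # sample noise covariance

w = mvdr_weights(R, d)
response = w.conj() @ d  # distortionless constraint: should be close to 1
print(abs(response))
```

The constraint holds only for the steering vector that was used to compute the weights, so an error in estimating d breaks the distortionless property. Parameter estimation inaccuracies of this kind are what the robust design targets.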
