IEEE/ACM Transactions on Audio, Speech, and Language Processing

Previous studies have shown that attention mechanisms and shortest dependency paths both improve relation classification. This paper proposes a keyword-attentive sentence mechanism that combines the two methods. To handle the imbalanced classification problem, it also proposes a new loss function, the synthetic stimulation loss, which uses a modulating factor to make the model focus on hard-to-classify samples.
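
The abstract does not give the formula for the synthetic stimulation loss, so the following is only a minimal sketch of the general idea of a modulating factor, using the well-known focal-loss form as a stand-in. The function name `modulated_cross_entropy` and the exponent `gamma` are assumptions made for illustration, not the paper's definitions.

```python
import math


def modulated_cross_entropy(probs, target, gamma=2.0):
    """Focal-style loss for a single sample (illustrative stand-in only).

    probs  : predicted class probabilities, summing to 1
    target : index of the true class
    gamma  : assumed exponent; gamma = 0 recovers plain cross-entropy
    """
    p_t = probs[target]                        # probability given to the true class
    modulating_factor = (1.0 - p_t) ** gamma   # near 0 for easy samples, near 1 for hard ones
    return -modulating_factor * math.log(p_t)


# An easy sample (true class predicted with 0.9) and a hard one (0.3).
easy = modulated_cross_entropy([0.05, 0.9, 0.05], target=1)
hard = modulated_cross_entropy([0.6, 0.3, 0.1], target=1)

# Plain cross-entropy on the same samples, for comparison.
plain_easy = -math.log(0.9)
plain_hard = -math.log(0.3)

# The easy sample is scaled down far more strongly than the hard one.
easy_ratio = easy / plain_easy
hard_ratio = hard / plain_hard
```

In this sketch the easy sample keeps a much smaller fraction of its plain cross-entropy than the hard sample does, so gradients from hard-to-classify samples dominate training.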

AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning

Dialogue policy plays an important role in task-oriented spoken dialogue systems. It determines how to respond to users. The recently proposed deep reinforcement learning (DRL) approaches have been used for policy optimization. However, these deep models are still challenging for two reasons: first, many DRL-based policies are not sample efficient; and second, most models do not have the capability of policy transfer between different domains.

Multichannel Online Dereverberation Based on Spectral Magnitude Inverse Filtering

This paper addresses the problem of multichannel online dereverberation. The proposed method operates in the short-time Fourier transform (STFT) domain, independently for each frequency band. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF).
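
For orientation, the CTF approximation is commonly written as a convolution along the frame axis within each frequency band. The notation below (frame index $p$, frequency index $k$, CTF length $L$, source $s$, microphone signal $x$, CTF coefficients $a$) is assumed here for illustration and is not taken from the abstract.

```latex
x(p, k) \approx \sum_{p'=0}^{L-1} a(p', k)\, s(p - p', k)
```

Because the sum runs only over frames and not over frequencies, each band $k$ can be processed on its own, which is what allows a per-band method.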

Methods of Extending a Generalized Sidelobe Canceller With External Microphones

While multiple microphones organized in an array can achieve substantial noise reduction and speech enhancement, the improvement can be quite limited in some cases, such as when the microphones are closely spaced. This degradation can, however, be resolved by introducing one or more external microphones (XMs) into the same physical space as the local microphone array (LMA).

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

A number of studies have addressed the extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states in order to improve the performance of text-dependent speaker verification (TD-SV). However, only moderate success has been achieved.

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time–frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase...

A Multi-Stage Algorithm for Acoustic Physical Model Parameters Estimation

One of the challenges in computational acoustics is the identification of models that can simulate and predict the physical behavior of a system generating an acoustic signal. Whenever such models are used for commercial applications, an additional constraint is the time to market, making automation of the sound design process desirable.

Constrained Learned Feature Extraction for Acoustic Scene Classification

Deep neural networks (DNNs) have been proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources.

Tailoring an Interpretable Neural Language Model

TASLP Volume 27 Issue 7