Farfield speech recognition has become a popular research area in the past few years, spanning research-focused activities such as the CHiME Challenges and the launches of Amazon Echo and Google Home. This talk describes the research efforts around Google Home. Most multichannel ASR systems separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this talk, we introduce a framework that performs multichannel enhancement jointly with acoustic modeling using deep neural networks. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture that performs multichannel filtering in the first layer of the network, and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single-channel filterbank that computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain.
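The filter-and-sum operation that the first layer is said to be inspired by can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture presented in the talk: the names `fir_filter`, `filter_and_sum`, and `delay` are hypothetical, the two-microphone signals are toy data, and the filter taps are set by hand, whereas in the described network they would be learned jointly with the acoustic model.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[t] = sum_n taps[n] * signal[t - n], zeros before t = 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for n, h in enumerate(taps):
            if t - n >= 0:
                acc += h * signal[t - n]
        out.append(acc)
    return out


def filter_and_sum(channels: Sequence[Sequence[float]],
                   filters: Sequence[Sequence[float]]) -> List[float]:
    """Filter each microphone channel with its own taps, then sum across channels."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    length = len(channels[0])
    out = [0.0] * length
    for x, h in zip(channels, filters):
        y = fir_filter(x, h)
        for t in range(length):
            out[t] += y[t]
    return out


def delay(signal: Sequence[float], d: int) -> List[float]:
    """Delay a signal by d samples, keeping the original length."""
    return [0.0] * d + list(signal[:len(signal) - d])


def energy(signal: Sequence[float]) -> float:
    return sum(v * v for v in signal)


# Toy source; the trailing zeros leave room for the delays used below.
source = [1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0]

# Hand-set taps (an assumption): delay microphone 0 by two samples and leave
# microphone 1 undelayed, which aligns a wavefront reaching microphone 1 two
# samples after microphone 0.
filters = [[0.0, 0.0, 0.5], [0.5, 0.0, 0.0]]

# Target direction: microphone 1 hears the source two samples late.
steered = filter_and_sum([source, delay(source, 2)], filters)

# Opposite direction: microphone 0 hears the source two samples late.
off_axis = filter_and_sum([delay(source, 2), source], filters)

print("steered energy: ", energy(steered))
print("off-axis energy:", energy(off_axis))
```

With these taps, the two channels add coherently for the target direction and only partially for the opposite one, so the steered output carries more energy. A layer holding many such filter sets can, in principle, cover many directions at once, which is one way to read the claim that the network becomes robust to the target speaker's direction of arrival.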

DOI

https://dx.doi.org/10.17023/zge2-2k82

Duration

0:52:30

Subtitles

✖

SSP-SPEC

Multichannel Raw-Waveform Neural Network Acoustic Models
