Multichannel Raw-Waveform Neural Network Acoustic Models

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Far-field speech recognition has become a popular research area in the past few years, from research-focused activities such as the CHiME Challenges to the launches of Amazon Echo and Google Home. This talk will describe the research efforts around Google Home. Most multichannel ASR systems separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this talk, we will introduce a framework to do multichannel enhancement jointly with acoustic modeling using deep neural networks. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture which performs multichannel filtering in the first layer of the network and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single-channel filterbank which computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain.
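As a rough illustration of the beamforming idea the abstract builds on, the sketch below shows filter-and-sum over raw multichannel waveforms: each microphone signal is convolved with its own FIR filter and the results are summed. It is only a sketch under stated assumptions, not the architecture from the talk. The function name `filter_and_sum`, the two-microphone setup, and the 3-sample delay are hypothetical, and the filters are fixed by hand here, whereas a first network layer of this kind would learn them from data.

```python
import numpy as np

def filter_and_sum(x, h):
    """Filter each channel with its own FIR filter, then sum across channels.

    x: array of shape (channels, samples), raw waveforms per microphone.
    h: array of shape (channels, taps), one filter per microphone.
    Returns an array of length samples + taps - 1.
    """
    return sum(np.convolve(x[c], h[c]) for c in range(x.shape[0]))

# Hypothetical scenario: the second microphone hears the source 3 samples late.
rng = np.random.default_rng(0)
s = rng.standard_normal(64)      # stand-in for the source waveform
delay = 3
x = np.stack([
    np.concatenate([s, np.zeros(delay)]),   # microphone 0: no delay
    np.concatenate([np.zeros(delay), s]),   # microphone 1: delayed copy
])

# Delay-and-sum is the special case where each filter is a shifted impulse.
# Delaying channel 0 by 3 samples aligns it with channel 1 before averaging.
h = np.zeros((2, delay + 1))
h[0, delay] = 0.5
h[1, 0] = 0.5

y = filter_and_sum(x, h)

# With the channels aligned, the output reproduces the source after the delay.
print(np.allclose(y[delay:delay + len(s)], s))
```

Choosing different per-channel filters steers the response toward a different arrival direction, which is why a layer holding several such filter sets can cover a range of target directions.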
Duration
0:52:30

How Classical Machine Learning Can Help Modern Wireless Communications

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Data-driven approaches have swept all walks of science and engineering in recent years, with deep neural networks, deep reinforcement learning, and adversarial networks becoming the new staples that everyone uses to tackle a very wide variety of problems. While the empirical success of these methods is truly impressive when a lot of training data is available, there are still many problems that can in fact benefit from classical machine learning tools. In this talk, I will focus on showcasing the remarkable potential of latent factor analysis in the context of modern wireless communications. In particular, I will talk about edge-cell interferometry, a technique we recently devised that can reliably decode edge-cell users that are only 3 dB above the noise floor, without requiring knowledge of their channels. I will also talk about how latent factor analysis can be used to tackle very hard estimation and optimization problems on the way to 5G and well beyond.
Duration
0:54:02

SPC-ML