SPC-ML

Multichannel Raw-Waveform Neural Network Acoustic Models

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Far-field speech recognition has become a popular research area in the past few years, from research-focused activities such as the CHiME Challenges to the launches of Amazon Echo and Google Home. This talk will describe the research efforts around Google Home. Most multichannel ASR systems separate speech enhancement (localization, beamforming, and postfiltering) from acoustic modeling. In this talk, we will introduce a framework that performs multichannel enhancement jointly with acoustic modeling using deep neural networks. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture that performs multichannel filtering in the first layer of the network, and we show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single-channel filterbank that computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain.
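The filter-and-sum operation that the abstract places in the first layer of the network can be illustrated with a minimal NumPy sketch. This is not the speakers' model: the filters here are hand-set delay filters (classical delay-and-sum beamforming) rather than learned weights, and the function names, the two-microphone setup, and the 3-sample delay are all assumptions chosen for illustration.

```python
import numpy as np

def filter_and_sum(x, h):
    """Convolve each channel with its own FIR filter and sum the results.

    x: array of shape (channels, samples), the raw multichannel waveform.
    h: array of shape (channels, taps), one time-domain filter per channel.
    Returns an array of length samples + taps - 1.
    """
    return sum(np.convolve(x_c, h_c) for x_c, h_c in zip(x, h))

def delay_filter(delay, taps):
    """FIR filter that delays its input by `delay` samples (hypothetical helper)."""
    h = np.zeros(taps)
    h[delay] = 1.0
    return h

rng = np.random.default_rng(0)
source = rng.standard_normal(256)

# Assumed geometry: microphone 1 hears the source d samples after microphone 0.
d = 3
x = np.stack([source, np.concatenate([np.zeros(d), source[:-d]])])

# Steer toward the source: delay channel 0 by d so both channels line up,
# then average. A trained first layer would learn these taps instead.
taps = 8
h = np.stack([delay_filter(d, taps), delay_filter(0, taps)]) / 2

y = filter_and_sum(x, h)
```

With the channels aligned, `y[d:256]` reproduces the source delayed by `d` samples, while a signal arriving with a different inter-microphone delay would add incoherently and be attenuated. That difference in fine time structure is what a learned multichannel first layer can exploit.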
Duration
0:52:30

How Classical Machine Learning Can Help Modern Wireless Communications

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Data-driven approaches have swept all walks of science and engineering in recent years, with deep neural networks, deep reinforcement learning, and adversarial networks becoming the new staples used to tackle a very wide variety of problems. While the empirical success of these methods is truly impressive when a lot of training data is available, there are still many problems that can, in fact, benefit from classical machine learning tools. In this talk, I will focus on showcasing the remarkable potential of latent factor analysis in the context of modern wireless communications. In particular, I will talk about edge-cell interferometry, a technique we recently devised that can reliably decode edge-cell users whose signals are only 3 dB above the noise floor, without requiring knowledge of their channels. I will also talk about how latent factor analysis can be used to tackle very hard estimation and optimization problems on the way to 5G and well beyond.
Duration
0:54:02