We live in an era where more and more tasks, once thought to be impregnable bastions of human intelligence, have succumbed to AI. Are we at the cusp where ASR systems have matched expert humans in conversational speech recognition? We try to answer this question with some experimental evidence on the Switchboard English conversational telephony corpus. On the human side, we describe some listening experiments that established a new human performance benchmark. On the ASR side, we discuss a series of deep learning architectures and techniques for acoustic and language modeling that were instrumental in lowering the word error rate to record levels on this task.

DOI

https://dx.doi.org/10.17023/d235-0947

Duration

0:59:34

Subtitles

✖

Moving to Neural Machine Translation at Google

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Keywords

Sequence-to-sequence models

Multilingual models

Brain neural machine translation (BNMT)

Machine translation system

Neural recurrent sequence models

Zero-shot translation

2017 IEEE Automatic Speech Recognition and Understanding Workshop

ASRU 2017

Authors

Mike Schuster

Date

13 January 2018

Machine learning and, in particular, neural networks have made great advances in the last few years for products that are used by millions of people, most notably in speech recognition, image recognition, and recently in neural machine translation. Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which addresses many of these issues. The model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To accelerate final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common subword units for both input and output. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. In human side-by-side evaluations, it reduces translation errors by more than 60% compared to Google's phrase-based production system. The new Google Translate was launched in late 2016 and has improved translation quality significantly for all Google users.
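
The idea of dividing rare words into common subword units can be illustrated with a minimal sketch. It uses a greedy longest-match split against a small hand-written vocabulary. This is an illustration of the general idea only, not the segmentation procedure used in GNMT, and the function name `segment` and the vocabulary `TOY_VOCAB` are hypothetical.

```python
def segment(word, vocab):
    """Greedy longest-match-first split of `word` into units from `vocab`.

    Falls back to single characters, so every word can be segmented
    even when no multi-character unit matches.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary
        # or has been reduced to a single character.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for illustration.
TOY_VOCAB = {"trans", "lat", "ion", "un", "s"}

print(segment("translations", TOY_VOCAB))
```

Because unmatched spans fall back to single characters, any input word can be represented with a fixed, limited vocabulary, which is the property that helps with rare words.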

DOI

https://dx.doi.org/10.17023/507t-d381

Duration

1:10:54

Subtitles

✖

HLT-UNDE

Man vs. Machine in Conversational Speech Recognition

Moving to Neural Machine Translation at Google
