Man vs. Machine in Conversational Speech Recognition

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

We live in an era where more and more tasks, once thought to be impregnable bastions of human intelligence, succumb to AI. Are we at the cusp where ASR systems have matched expert humans in conversational speech recognition? We try to answer this question with some experimental evidence on the Switchboard English conversational telephony corpus. On the human side, we describe some listening experiments which established a new human performance benchmark. On the ASR side, we discuss a series of deep learning architectures and techniques for acoustic and language modeling that were instrumental in lowering the word error rate to record levels on this task.
Duration
0:59:34

Representation, Extraction, and Visualization of Speech Information

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

The speech signal is complex and contains a tremendous quantity of diverse information. The first step in extracting this information is to define an efficient representation that can model as much information as possible and will facilitate the extraction process. The I-vector representation is a statistical data-driven approach for feature extraction, which provides an elegant framework for speech classification and identification in general. This representation became the state of the art in several speech processing tasks and has recently been integrated with deep learning methods. This talk will focus on presenting a variety of applications of the I-vector representation for speech and audio tasks, including speaker profiling, speaker diarization and speaker health analysis. We will also show the possibility of using this representation to model and visualize information present in deep neural network hidden layers.
Duration
1:02:30

HLT-LANG