Conference Review: ICASSP 2017

June, 2017

Introduction

The 42nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP) was recently held in New Orleans, Louisiana, from March 5 to 9, 2017. The theme of the conference was “The Internet of Signals.” In this article, we highlight the conference theme along with the emerging and enduring trends at ICASSP.

Conference Location

ICASSP presents what is new and upcoming in speech and signal processing research, observable both through the keynote theme and the ever-growing diversity of the parallel sessions. This year’s ICASSP was notable both for that diversity and for the sheer enjoyment of being embedded in a vibrant city with one’s research colleagues. ICASSP was not merely “in” New Orleans; it was located deep in the heart of the French Quarter. The location created many opportunities for research conversations in both formal venues (the conference itself) and informal ones (over fresh beignets or at any of the numerous jazz venues). This combination is the bread and butter of innovation, sparking creativity by removing barriers, both conversational and spatial. I look forward to attending ICASSP next year to see the new directions that will result from these interactions.

Conference Theme: The Internet of Signals

Two of the four plenary talks at ICASSP this year focused on the Internet of Things (IoT). Dr. K. J. Ray Liu discussed the technology behind smart radios and demonstrated the range of applications that could leverage these advances, including home and office monitoring and security, radio human biometrics, vital signs detection, wireless charging, and 5G communications. Dr. Jan Rabaey presented work on the growing field of IoT and discussed how the miniaturization of sensor technologies will enable the evolution of pervasive, human-body-oriented sensor networks, described as the Human Intranet. He also discussed the critical aspects of robustness, safety, security, and privacy.

Emerging Trends

The fields of Affective Computing and Paralinguistic modeling have had an increasing presence at ICASSP over the last several years. This year, three sessions were devoted to these topics: two lecture sessions and one poster session. These topics also appeared in lecture sessions focused on biomedical signal processing and deep learning, as well as in poster sessions on spoken language understanding, applications of machine learning in signal processing, and supervised and semi-supervised learning. The strength of this trend was underscored by the opening day plenary, delivered by Dr. Rana El-Kaliouby from Affectiva. Her talk highlighted the commercial applicability of these technologies and pointed to existing commercial-academic collaborations.

There was also a lecture session devoted to speech processing for medical diagnostics. The talks in this session covered assistive technology applications, with contributions ranging from feature design to machine learning methods. Critically, these talks also focused on methods to enhance robustness, which is necessary for real-world deployment.

The inclusion of these research themes is thrilling. Success in these areas means two things: (1) the engineering community will have methods that can handle the inherent variability in human communication and/or disease expression and (2) the community, including and extending beyond engineering, will have new knowledge about human behavior. ICASSP’s continued willingness to highlight these emerging research areas will result in fundamental advances in how we think about human-centered technologies and their contributions beyond the so-called ivory tower.

Enduring Trends

The field of Automatic Speech Recognition (ASR) once again had a strong showing at ICASSP. Topics included advances in deep learning, end-to-end modeling, robustness to noise, spoken term detection, and speech enhancement. Progress in ASR was highlighted by the plenary on the third day, delivered by Dr. David Nahamoo. In addition to describing the history and development of ASR methods, he highlighted IBM’s recent milestone of a 5.5% word error rate (WER) on a benchmark speech corpus. There were also strong submissions in source separation, speaker localization, speech synthesis, and speaker verification.
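
For readers less familiar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the scoring procedure used for the result above; the function name and example sentences are hypothetical, and real evaluations typically apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no text
    normalization (casing, punctuation, and so on).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Under this definition, a 5.5% WER corresponds to roughly one word error for every 18 reference words.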
