Speech and Language Processing Technical Committee Newsletter

February 2014

Welcome to the Spring 2014 edition of the IEEE Speech and Language Processing Technical Committee's Newsletter! This issue of the newsletter includes 9 articles and announcements from 12 contributors, including our own staff reporters and editors. Thank you all for your contributions! This issue includes news about IEEE journals and recent workshops, an SLTC call for nominations, and individual contributions.

We believe the newsletter is an ideal forum for updates, reports, announcements and editorials which don't fit well with traditional journals. We welcome your contributions, as well as calls for papers, job announcements, comments and suggestions.

To subscribe to the Newsletter, send an email with the command "subscribe speechnewsdist" in the message body to listserv [at] listserv (dot) ieee [dot] org.

Florian Metze, Editor-in-chief
William Campbell, Editor
Haizhou Li, Editor
Patrick Nguyen, Editor

From the SLTC and IEEE

From the IEEE SLTC Chair

Douglas O'Shaughnessy

Call for Proposals - ASRU 2015

Geoffrey Zweig

Following on the tremendous success of ASRU 2013, the SPS-SLTC invites proposals to host the Automatic Speech Recognition and Understanding Workshop in 2015. Past ASRU workshops have fostered a collegiate atmosphere through a thoughtful selection of venues, thus offering a unique opportunity for researchers to interact.

Speech Synthesis Perfects Everyone's Singing

Minghui Dong, Nancy Chen, and Haizhou Li

Singing is more expressive than speaking. While singing is popular, singing well is nontrivial, especially for songs that demand a high level of vocal skill. Among other challenges, a singer must sing in tune and with the correct rhythm. Even professional singers need intensive practice to perfect their vocal skills and to master particular singing styles, such as vibrato and resonance tuning. Recently, the Institute for Infocomm Research (I2R) in Singapore developed a technology called Speech2Singing, which converts the singing voice of non-professional singers (or even spoken utterances) into perfect singing.

1 Billion Word Language Modeling Benchmark

Ciprian Chelba

Ciprian Chelba and colleagues have released a language modeling benchmark and would like to bring it to the attention of the speech community. The purpose of the project is to make available a standard training and test setup for language modeling experiments.

An Overview of ASRU 2013

Tara N. Sainath and Jan (Honza) Cernocky

The Automatic Speech Recognition and Understanding Workshop (ASRU) was recently hosted in Olomouc, Czech Republic from December 8-12, 2013. Each day of the workshop focused on a specific current theme that has become popular amongst ASR researchers. In this report, we highlight the 4 days of the workshop, and touch on some papers in more detail.

Speech and Audio Highlights from MediaEval 2013

Gareth J. F. Jones and Martha Larson

MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval. While it emphasizes the 'multi' in multimedia and focuses on human and social aspects of multimedia tasks, speech and audio processing is a key component of several MediaEval tasks each year. MediaEval 2013 featured a total of 12 tasks exploring aspects of multimedia indexing, search and interaction; five of these involved significant elements of speech and audio processing, and we discuss them in this article.

Asia Information Retrieval Societies conference 2013

Rafael E. Banchs, Min Zhang, and Ming Hui Dong

AIRS 2013, the ninth edition of the Asia Information Retrieval Societies conference, took place from 9th to 11th December 2013 in Singapore. The conference was attended by a total of 85 participants from more than 20 countries. The technical program comprised 45 papers, of which 27 were presented orally and 18 as posters.

From the SLTC Chair

Douglas O'Shaughnessy

SLTC Newsletter, February 2014

Welcome to the first SLTC Newsletter of 2014, under the new management of Florian Metze. We heartily thank Dilek Hakkani-Tur for her excellent editorship, and her team (William Campbell, Patrick Nguyen, Haizhou Li) for their great work. We look forward to continued excellence this year from William Campbell, Patrick Nguyen, and Haizhou Li, who remain as editors on the new team.

Last September, we elected 17 members to replace those whose terms expired in December, as well as a new Vice Chair for the committee, Bhuvana Ramabhadran; she will take over the reins as Chair in 2015. We also say a fond thank you to all departing SLTC members: John Hansen, Masami Akamine, Abeer Alwan, Antonio Bonafonte, Honza Cernocky, Eric Fosler-Lussier, Pascale Fung, Dilek Hakkani-Tur, Qi Li, Hermann Ney, and Frank Soong. It will be difficult to replace all these excellent past members, but the new committee looks forward to the challenges. It is also not too soon to start thinking about this autumn’s election to renew the SLTC membership, so please encourage colleagues to submit nominations this summer. Details will be forthcoming in the next SLTC newsletter.

Our technical committee this year will have the following subcommittees (and members): Language Processing (Geoffrey Zweig, Sanjeev P. Khudanpur, Haizhou Li), Electronic Newsletter (William M. Campbell, Patrick Nguyen, Haizhou Li, Florian Metze, Svetlana Stoyanchev), Fellows (Rainer Martin, Malcolm Slaney, John Hershey), Workshops (Nick Campbell, George Saon, Israel Cohen), EDICS (Alexandros Potamianos, Yifan Gong, Shinji Watanabe), Policies and Procedures (Tiago Falk, Frank Seide), Education (Takayuki Arai, Christine Shadle, Kay Berkling, Tom Bäckström), Nominations and Awards (Peder Olsen, Julia Hirschberg, Satoshi Nakamura, Pedro A. Torres-Carrasquillo), Communications (Korin Richmond, Heiga Zen, Tomoki Toda), Industry (Ananth Sankar, Dong Yu, Hagen Soltau), External Relations (Junichi Yamagishi, Gernot Kubin, Maurizio Omologo), Student Awards (Deep Sen, Mike Seltzer, Mark Hasegawa-Johnson, Svetlana Stoyanchev), Member Election (Larry Heck, Fabrice Lefevre, Bhiksha Raj, Andreas Stolcke), Meeting (Najim Dehak, Nicholas Evans, Panayiotis Georgiou), ICASSP 2014 Area Chairs (Björn Schuller, Bowen Zhou, Tim Fingscheidt, Michiel Bacchiani, Haizhou Li, Karen Livescu), HLT-ACL board liaison (Julia Hirschberg). If you have any needs in these areas, please contact one of these subcommittee members.

At this time, we look forward eagerly to ICASSP in Florence, May 4-9. Our speech and language areas received 694 paper submissions, of which 50% were accepted. We thank all 497 reviewers for their 2800 reviews. The overall quality of the submissions was very high, which at times made acceptance decisions very difficult. Among the tutorials at ICASSP is “Deep learning for natural language processing and related applications” (Xiaodong He, Jianfeng Gao, and Li Deng of Microsoft Research). As these speakers note, deep learning techniques have enjoyed tremendous success in the speech and language processing community in recent years. The tutorial focuses on deep learning approaches to problems in language or text processing, with particular emphasis on important applications including spoken language understanding (SLU), machine translation (MT), and semantic information retrieval (IR) from text.

To give just a sampling of what we will see at ICASSP this May, four sessions will examine deep neural networks (DNNs) as acoustic models for automatic speech recognition (ASR): for example, bootstrapping DNN training without Gaussian Mixture Models, generating a stacked bottleneck feature representation for low-resource ASR, and replacing stochastic gradient descent with second-order stochastic optimization. While DNNs can extract high-level features from speech for ASR tasks, these features can take many forms, and some papers will explore how effective different DNN features are, including vectors extracted from both the output and hidden layers of the DNN. Context-dependent acoustic modelling for DNNs suffers from data sparsity; papers at ICASSP will address this by using decision-tree state clusters as training targets.
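
As a rough illustration of what "DNN features" taken from hidden and output layers can mean, here is a minimal sketch. It assumes a toy feed-forward network with random (untrained) weights, hypothetical layer sizes, and a hypothetical helper name `forward_with_features`; it is not the setup of any ICASSP paper. Each input frame yields both output-layer posteriors and the activations of a narrow "bottleneck" hidden layer, which can serve as a compact feature vector.

```python
import numpy as np

def forward_with_features(frames, weights, biases, bottleneck_index):
    """Run a feed-forward net; return (output posteriors, bottleneck activations).

    frames: (num_frames, input_dim) array of acoustic feature vectors.
    weights/biases: per-layer parameters. Hidden layers use a sigmoid; the
    last layer uses a softmax over (hypothetical) state targets.
    bottleneck_index: index of the hidden layer whose activations are kept.
    """
    h = frames
    bottleneck = None
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = h @ W + b
        if i == len(weights) - 1:
            # Output layer: numerically stable softmax over the targets.
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            h = e / e.sum(axis=1, keepdims=True)
        else:
            h = 1.0 / (1.0 + np.exp(-z))  # sigmoid hidden layer
            if i == bottleneck_index:
                bottleneck = h  # keep the narrow layer's activations as features
    return h, bottleneck

rng = np.random.default_rng(0)
# Hypothetical sizes: 39-dim input, a 12-unit bottleneck between two wide
# layers, and 100 outputs standing in for clustered context-dependent states.
sizes = [39, 256, 12, 256, 100]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

frames = rng.normal(size=(5, 39))
posteriors, bn_features = forward_with_features(frames, weights, biases, bottleneck_index=1)
print(posteriors.shape, bn_features.shape)
```

Here the bottleneck activations have shape (5, 12) and the posteriors (5, 100). In a real system the network would first be trained on labeled speech, and which layer's output makes the better feature is exactly the kind of question the papers above investigate.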

Other ASR papers at ICASSP will deal with convolutional neural networks, the Deep Scattering Spectrum, and stacked bottleneck neural networks, as well as time-frequency masking to improve noise-robust ASR (in which regions of the spectrogram dominated by noise are attenuated). Recent years have seen an increasing emphasis on rapid development of ASR systems with limited resources, to reduce the need for in-domain data. Discriminative models, such as support vector machines (SVMs), have been successfully applied to ASR and will be examined further at ICASSP.
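
The masking idea mentioned in parentheses can be sketched as a simple binary mask. This is a generic illustration, not the method of any particular ICASSP paper. It assumes a power spectrogram of noisy speech, a noise estimate taken from the first few frames (assumed to contain no speech), and hypothetical parameter values. Bins whose estimated local SNR falls below a threshold are treated as noise-dominated and attenuated to a floor.

```python
import numpy as np

def binary_mask(noisy_power, noise_frames=5, snr_threshold_db=0.0, floor=0.1):
    """Attenuate time-frequency bins that appear to be dominated by noise.

    noisy_power: (num_frames, num_bins) power spectrogram of the noisy signal.
    noise_frames: leading frames assumed to be noise-only (an assumption).
    snr_threshold_db: local SNR below which a bin counts as noise-dominated.
    floor: gain applied to the power of noise-dominated bins (0 removes them).
    """
    # Per-frequency noise power estimate from the leading frames.
    noise_power = noisy_power[:noise_frames].mean(axis=0)
    # Rough clean-speech power via power subtraction, clipped to stay positive.
    speech_power = np.maximum(noisy_power - noise_power, 1e-12)
    local_snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_power, 1e-12))
    # Keep bins at or above the threshold; attenuate the rest to the floor.
    mask = np.where(local_snr_db >= snr_threshold_db, 1.0, floor)
    return mask * noisy_power, mask

noisy = np.ones((10, 4))    # stationary noise with unit power in every bin
noisy[6:9, 2] += 100.0      # a burst of "speech" energy in one frequency band
enhanced, mask = binary_mask(noisy)
print(mask[:, 2])
```

In this toy input only the three burst bins keep a gain of 1.0; every other bin is scaled down to the floor. Practical systems usually estimate the mask far more carefully (and may use soft rather than binary gains), but the principle of attenuating noise-dominated regions is the same.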

Other upcoming ICASSPs are scheduled for Brisbane (2015), Shanghai (2016), New Orleans (2017), and Seoul (2018). We are also looking forward to the next IEEE Spoken Language Technology (SLT) Workshop, to be held at Harvey’s Lake Tahoe Hotel in Lake Tahoe, Nevada (Dec. 7-10, 2014).

Anyone wishing to help organize the 2015 ASRU (IEEE Automatic Speech Recognition and Understanding) workshop is advised to contact the Workshops Subcommittee; bids are due by April 25th. This biennial workshop will follow up on the recent successful ASRU held in December 2013 in the Czech Republic.

In closing, I hope you will consider participating in ICASSP and SLT this year. We look forward to meeting friends and colleagues in beautiful Florence and Tahoe.

Best wishes,

Douglas O'Shaughnessy

Douglas O'Shaughnessy is the Chair of the Speech and Language Processing Technical Committee.

Call for Proposals - ASRU 2015

SPS-SLTC Workshop Sub-Committee: Nick Campbell, George Saon, Geoffrey Zweig


Following on the tremendous success of ASRU 2013, the SPS-SLTC invites proposals to host the Automatic Speech Recognition and Understanding Workshop in 2015. Past ASRU workshops have fostered a collegiate atmosphere through a thoughtful selection of venues, thus offering a unique opportunity for researchers to interact.

The proposal should include the information outlined below.