Speech and Language Processing Technical Committee Newsletter

November 2012

Welcome to the Winter 2012 edition of the IEEE Speech and Language Processing Technical Committee's Newsletter! In this issue we are focusing on news from recent conferences (such as MLSLP and SANE) and workshops (such as Telluride and NIST TRECVID). This issue of the newsletter includes 9 articles from 9 guest contributors, and our own staff reporters and editors. Thank you all for your contributions!

We believe the newsletter is an ideal forum for updates, reports, announcements and editorials which don't fit well with traditional journals. We welcome your contributions, as well as calls for papers, job announcements, comments and suggestions. You can submit job postings here, and reach us at speechnewseds [at] listserv (dot) ieee [dot] org.

We'd like to recruit more reporters: if you are still a PhD student or graduated recently and interested in contributing to our newsletter, please email us (speechnewseds [at] listserv (dot) ieee [dot] org) with applications. The workload includes helping with the reviews of submissions and writing occasional reports for the Newsletter. Finally, to subscribe to the Newsletter, send an email with the command "subscribe speechnewsdist" in the message body to listserv [at] listserv (dot) ieee [dot] org.

Dilek Hakkani-Tür, Editor-in-chief
William Campbell, Editor
Patrick Nguyen, Editor
Martin Russell, Editor


From the SLTC and IEEE

From the IEEE SLTC chair

John Hansen


Speech and Language Processing for Educational Applications

Klaus Zechner

This article provides a brief overview of the history, current state-of-the-art, and anticipated future trends in the areas of speech and language technology for educational applications. It also provides some examples of seminal applications in the field.


MLSLP 2012 brings together speech, natural language processing, and machine learning researchers

Karen Livescu

The 2nd Symposium on Machine Learning in Speech and Language Processing (MLSLP) was recently held as a satellite workshop of Interspeech 2012 in Portland on September 14.


Increasing Popularity of Speech and Audio Event Recognition in Unconstrained Multimedia Data

Case Study: NIST TRECVID Multimedia Event Detection Evaluations

Murat Akbacak

Due to the popularity of online user-submitted videos, multimedia content analysis and event modeling for the purposes of event detection and retrieval are getting more and more attention from the speech and audio processing communities. As the amount of online multimedia data is increasing every day, and the users' search needs are changing from simple content search (e.g., find me today's Giants videos) to more sophisticated searches (e.g., find me this week's Giants home-run video snippets), speech and audio components are becoming more important as they convey information that is complementary to, and richer than, image/video content. In this article, we will talk about the increasing popularity of speech recognition and audio event recognition technologies for multimedia content analysis. We will also present the challenges as well as the ongoing research efforts in these two fields by using the Multimedia Event Detection (MED) track of the NIST TRECVID evaluations as our case study.


A Glimpse of IEEE SLT 2012

Ruhi Sarikaya and Yang Liu

The fourth biennial IEEE SLT workshop will be held in Miami, Dec 2-5, 2012. Preparation for the workshop is currently underway, and the conference program has been finalized. The accepted papers cover a wide range of topics in spoken language technology, ranging from speech recognition to various language understanding applications.

Pay Attention, Please:
Attention at the Telluride Neuromorphic Cognition Workshop

Malcolm Slaney

The role of attention is growing in importance as speech recognition moves into more challenging environments. This article briefly describes recent projects on attention at the Telluride Neuromorphic Cognition Workshop. These projects have studied different parts of attention in a short, focused, working workshop, using EEG signals to "listen" to a subject's brain and decode which of two speech signals s/he was attending.

The 10th Information Technology Society Conference on Speech Communication

David Suendermann

This article provides a short review of the 10th Information Technology Society Conference on Speech Communication held in Braunschweig, Germany, September 26 to 28. Organized by a primarily German scientific committee, the conference has grown very selective, with a good number of major contributors in the field (e.g. the keynote speaker Steve Young and at least five more SLTC members involved as organizers, chairs, etc.). Sessions spanned a wide range of domains such as Spoken Language Processing, Speech Information Retrieval, Robust Speech Recognition, and Automotive Speech Applications, and were hosted at the Braunschweig University of Technology, well-known for Carl Friedrich Gauss, former professor at this university.


SANE Conference Overview

Tara N. Sainath

The Speech and Audio in the Northeast (SANE) Conference was held on October 24, 2012 at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA. The goal of this meeting was to gather researchers and students in speech and audio from the northeastern part of the American continent.


Unfamiliar applications of some familiar techniques

Martin Russell, Chris Baber, Manish Parekh and Emilie Jean-Baptiste

This article considers applications in other domains of techniques that are familiar in the context of speech and language processing. The focus is the EU CogWatch project, although there are many other examples. Do techniques such as hidden Markov models (HMMs) lend themselves naturally to these new domains, or is it just an instance of Maslow's hammer, or perhaps (apologies to Maslow) a variant that should be referred to as Maslow's HMMer - "I suppose it is tempting, if the only tool you have is an HMM, to treat everything as if it were speech recognition"? This article briefly explores this issue and argues that this is not the case, that these methods are appropriate because of the nature of the problems, that these new applications can benefit from the experience and investment of the speech and language research community, and that, conversely, challenges in these new areas might give new insights into difficult speech and language processing problems.


Grounding and Levels of Understanding in Human-Computer Dialogue

Matthew Marge

Spoken dialogue system researchers have adopted theoretical models of human conversation; this article describes one theory of human-human communication and its adaptation to human-computer communication.



From the SLTC Chair

John H.L. Hansen

SLTC Newsletter, November 2012

Welcome to the last installment of the SLTC Newsletter for 2012. The next IEEE ICASSP conference is approaching, with due dates for papers now set for Nov. 30th (see this link: http://www.icassp2013.com/). This promises to be a very engaging ICASSP meeting, especially for researchers, students, and engineers/scientists in the speech and language processing domain. The country of Canada has a rich diversity of indigenous/Inuit languages, with over 30 languages, especially in the Pacific Northwest of Canada, spoken by native peoples, while English and French are the first languages of 58% and 23% of Canadians respectively. Canadian federal law requires all government services to be bilingual. Also, while all government proceedings are in English and French, all proceedings are also simultaneously translated into a number of the native Inuit languages. I hope you consider submitting a paper to ICASSP-2013 (Due Date is Nov. 30, 2012), and/or consider attending this conference, which promises to be as technically enriching as past ICASSP meetings.

ICASSP-2013 5-Page Format

Another aspect which I would like to raise in this installment is the notion of citations and connecting one's research contributions to past work. The organizers of the upcoming IEEE ICASSP-2013 have proposed and are experimenting with a new 5 page format for the conference papers. Since the first ICASSP conference in 1976 in Philadelphia (nearly 40 years ago!), the paper format has always been a four page format. For those of you who are old enough to remember some of the earlier ICASSPs (a summary of which is included below), we submitted extended abstracts with initial results, and TCs met at IEEE Headquarters in New Jersey to make decisions after an initial review process.

If accepted, authors were sent large paper mattes which were used to lay out the text and figures of the paper (something like what is shown above). Planning the space "real-estate" of your paper was a core task! Authors were required to print out hard copies of their text, which were then pasted onto the paper mattes and, when completed, mailed to IEEE, where photo offsets were produced for the final hard copy proceedings.

As such, there has been a strong history and motivation to not expand from this 4 page option, primarily because of the known number of papers and available page count for printing the proceedings. However, with advancements in page layout and electronic media and dissemination, the motivation for maintaining the 4 page format is being questioned.

A sample of ICASSP Conference Logos over the years!

One overriding reason for this is a slow and steady decline in referencing past work. The IEEE Signal Processing Society has made a concerted effort to transition past journal papers into electronic format; however, the quality of some of these is not at the level currently seen in today's digital media. For whatever reason, there is a growing trend in ICASSP papers for authors to sacrifice reference/citation space in order to include more technical content, including figures/results. A probe study by IEEE ICASSP-2013 organizers showed a reduction in ICASSP paper citations over the past few years, from a total of 11020 citations in 2006 (resulting in 5.8 citations per paper) to 7664 citations in 2008 (resulting in 4.0 citations per paper). So, the new format allows only references to appear on the fifth page, and in theory allows authors to emphasize and connect their research contributions more clearly with past research publications. On behalf of the SLTC, we feel that authors should maybe focus on better space management to include the references necessary to place their contribution within the context of past research in the field. However, adding this fifth page does in fact encourage authors to include a fair and balanced set of references for the research contributions contained in the manuscript, and removes any excuse for not doing so. The SLTC membership welcomes this opportunity to see if the resulting papers will be stronger because of this change!

SLTC 2013 New Members

Next, the SLTC recently completed the election process of new SLTC members. We had 55 nominations for 17 positions, and we wish to thank all those who were nominated and participated in the election process. I would therefore like to welcome the following newly elected members for 2013:

SLTC 2012 Retiring Members

The SLTC is a completely voluntary committee which makes advancements and oversees the review process of approximately 700 ICASSP papers each year. The extensive work done by the SLTC is intended to reflect and represent the IEEE Signal Processing Society membership with interests in the speech and language processing disciplines. Without the tireless efforts of many of these volunteers, it would not be possible to effectively represent the wishes of the IEEE SPS members. As such, I wish to extend sincere thanks to the following members who are completing their three-year terms in the SLTC and will retire Dec. 31, 2012 (but will still be part of one last ICASSP review process!).