March 13, 2005


Welcome to the IEEE Signal Processing Society Speech Technical Committee (STC) newsletter.  As always, contributions of events, publications, workshops, and career information to the newsletter are welcome.  Please send them to Rick Rose.  Archives of recent STC Newsletters can be found on the STC website.

ICASSP 2005 Technical Program Preparation  (Kenneth Barner and Jean-Christophe Pesquet)

STC Awards  (Ananth Sankar)

Call for Papers for a Special Issue of the IEEE Transactions on SAP: Progress in Rich Transcription
Call for Papers for a Special Issue of the IEEE Transactions on SAP: Expressive Speech Synthesis

2005 Human Language Technology Conference on Empirical Methods in Natural Language Processing
2005 AAAI Workshop on Spoken Language Understanding
IEEE 2005 Automatic Speech Recognition and Understanding (ASRU) Workshop
National Institute of Standards and Technology 2005 Speaker Recognition Evaluation

Postdoctoral Fellow Position in Speech Recognition and Speech Modeling Available at McGill University  (Rick Rose)
Senior R&D Positions Available in Speech Technology  (Vishu Viswanathan)
Transitions: ASR Researchers Take New Positions

Links to conferences and workshops organized by date  (Rick Rose)

Summary of ICASSP 2005 Submissions from the Conference Technical Program Chairs

The following summary of the overall ICASSP 2005 technical program preparation was provided by the Technical Program Co-Chairs, Kenneth Barner and Jean-Christophe Pesquet.

The papers submitted to ICASSP were routed to the appropriate TCs for review. The TCs have worked very hard, with the help of external reviewers. To ensure that the papers were thoroughly and fairly reviewed, most submissions received three reviews this year.  The review process is a monumental task. The high degree of professionalism demonstrated by the TCs is a major factor contributing to the success of ICASSP. Much of the credit goes to the TC members and reviewers, who worked hard under the TC leadership of Mazin Rahim, Antonio Ortega, Alle-Jan van der Veen, Ananthram Swami, Petar Djuric, Alex Gershman, Max Wong, Michael Zoltowski, Tülay Adali, Michael Brandstein, Wayne Burleson, Yu Hen Hu, Eli Saber, and Huseyin Abut. Several TC chairs were also ably assisted by area coordinators, each responsible for a group of expert reviewers.

We, as Technical Program Chairs of the conference, worked closely with the TC chairs to put together the final technical program. Conference Management Services, and in particular Lance Cotton and Billene Mercer, provided the excellent infrastructure and support that enabled the technical program to come together. We want to express our special thanks to all these people, to all the contributing authors, and to the special session chairs who organized outstanding sessions on timely topics.

In this year's ICASSP Technical Program, we have organized the papers into 11 technical tracks, comprising 70 lecture and 80 poster sessions. Most of the 1430 accepted papers will be presented as posters. The choice of oral or poster presentation was made by the TCs based entirely on subject grouping.

The breakdown of submissions by technical committee is given in the following table:
Technical Committee                               Submissions
Speech Processing 571
Image & Multidimensional Signal Processing 520
Signal Processing for Communications 405
Signal Processing Theory and Methods 378
Sensor Array & Multi-channel Signal Processing 189
Machine Learning for Signal Processing 179
Audio & Electroacoustics 152
Design & Implementation of SP Systems 82
Multimedia Signal Processing 79
Industry Technology Track 60
Signal Processing Education 18
Special Sessions 104


2004 IEEE Signal Processing Awards

2004 was a very successful year for the Speech area. Four out of the seven SPS awards were won by Speech researchers.  In 2004, the Speech Technical Committee (STC) formed an Awards Subcommittee to coordinate the process of nominating candidates for the three Paper Awards and the four Major Awards. The committee members were Ramesh Gopinath, Li Deng, Alan Black, Kazunori Mano, Isabel Trancoso, and Ananth Sankar. The STC Awards Committee received a total of 23 nominations for the 7 categories from 12 individual nominators. This was followed by a vote within the STC to choose the final 7 nominations for the Speech area. The final nominations were then reviewed and revised by the Awards Subcommittee before submission to the Awards Board. Particular care was taken to highlight the contributions of the final nominees.

Speech won two of the three Paper Awards and two of the four Major Awards. There were multiple winners in some categories. The Speech winners are listed below, with the total number of winners in each category given in parentheses.

 Technical Achievement Award (2):  Prof. Steve Young

Meritorious Service Award (1):  Prof. Andreas Spanias

Young Author Paper Award (2):  G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. on Speech and Audio Processing, vol. 10, pp. 293-302, July 2002. The young author was George Tzanetakis.

Best Paper Award (4):  E. Bocchieri and B. K.-W. Mak, "Subspace distribution clustering hidden Markov model," IEEE Trans. on Speech and Audio Processing, vol. 9, pp. 264-275, March 2001.

The STC takes great pride in the achievements of our Speech winners and extends our heartiest congratulations to them.

Special Issue of
The IEEE Transactions on Speech and Audio Processing
Progress in Rich Transcription

Over the past several years, Rich Transcription has emerged as an interdisciplinary field combining automatic speech recognition, speaker identification, and natural language processing with the goal of producing richly annotated speech transcriptions that are useful both to human readers and to automated programs for indexing, retrieval, and analysis. The key problems include developing more accurate speech transcription technology; improving speaker recognition technology; developing fundamentally new techniques for annotating dialog with semantic intent; and enriching ASR output to present it in a maximally informative manner. These various goals interact with each other, and exploiting synergistic uses of the disparate forms of analysis is critical. With its focus on fundamental research in human communication, Rich Transcription is key to governmental applications in data mining, and to commercial applications such as call center automation and monitoring.

The purpose of this special issue is to present recent advances in all areas of Rich Transcription for Speech, Audio, and Spoken Language Dialog. Original, previously unpublished submissions in the following areas are encouraged:

Submission procedure:

Prospective authors should prepare manuscripts according to the Information for Authors as published in any recent issue of the Transactions and as available on the web. Note that all rules will apply with regard to submission lengths, mandatory overlength page charges, and color charges.

Manuscripts should be submitted electronically through the online IEEE manuscript submission system. When selecting a manuscript type, authors must click on "Special Issue of T-SA on Progress in Rich Transcription." Authors should follow the instructions for the IEEE Transactions on Speech and Audio Processing and indicate in the Comments to the Editor-in-Chief that the manuscript is submitted for publication in the Special Issue on Progress in Rich Transcription. A completed and signed copyright form must be faxed to 1-732-562-8905 at the time of submission, with the manuscript number indicated at the top of the page.

Submission deadline: 1 October 2005
Notification of acceptance: 1 April 2006
Final manuscript due: 31 May 2006
Tentative publication date: September 2006

Guest Editors:
Dr. Geoffrey Zweig, IBM, Yorktown Heights, NY
Dr. John Makhoul, BBN Technologies, Cambridge, MA
Dr. Barbara Peskin, ICSI, Berkeley, CA
Dr. Phil Woodland, Cambridge University, Cambridge, U.K.
Dr. Andreas Stolcke, SRI International, Menlo Park, CA

Special Issue of
The IEEE Transactions on Speech and Audio Processing
Expressive Speech Synthesis

Expressive Speech Synthesis (ESS) is a multidisciplinary research area that addresses one of the most complex problems in speech and language processing. The challenges posed by ESS have been the subject of several collaborative research projects across universities and laboratories around the world. Over the last decade ESS has benefited from advances in speech and language processing as well as from the availability of large conversational-speech databases. These advances have spurred research on the expressiveness of speech and on conveying paralinguistic information including emotion, speaker-state, and speaker-listener relationships. There have also been substantial efforts towards automating database creation and evaluating the quality of speech synthesised for a variety of tasks that require not just the transmission of information, but also the expression of affect.

The purpose of this special issue is to present recent advances in Expressive Speech Synthesis. Original, previously unpublished research is sought in all areas relevant to the field. In particular, submissions on theory and methods for the following areas are encouraged:

Submission procedure:

Prospective authors should prepare manuscripts according to the Information for Authors as published in any recent issue of the Transactions and as available on the web. Note that all rules will apply with regard to submission lengths, mandatory overlength page charges, and color charges.

Manuscripts should be submitted electronically through the online IEEE manuscript submission system. When selecting a manuscript type, authors must click on "Special Issue of T-SA on Expressive Speech Synthesis." Authors should follow the instructions for the IEEE Transactions on Speech and Audio Processing and indicate in the Comments to the Editor-in-Chief that the manuscript is submitted for publication in the Special Issue on Expressive Speech Synthesis. A completed and signed copyright form must be faxed to 1-732-562-8905 at the time of submission, with the manuscript number indicated at the top of the page.

Submission deadline: 1 June 2005
Notification of acceptance: 1 December 2005
Final manuscript due: 28 February 2006
Tentative publication date: May 2006

Guest Editors:
Dr. Nick Campbell, ATR Network Informatics Research Labs, Kyoto, Japan
Dr. Wael Hamza, IBM T.J. Watson Research Center, Yorktown Heights, USA
Dr. Harald Höge, Siemens AG Central Technology, Germany
Dr. Tao Jianhua, Pattern Recognition Laboratory, Chinese Academy of Sciences, China
Dr. Gérard Bailly, Institut de la Communication Parlée, France

HLT/EMNLP 2005 Call for Papers

Human Language Technology Conference
on Empirical Methods in Natural Language Processing

October 6-8, 2005
Vancouver, B.C., Canada

Submission deadline: June 3, 2005

HLT/EMNLP 2005 continues the conference series jointly sponsored by the Human Language Technology Advisory Board (HLT) and the Association for Computational Linguistics (ACL). This year's conference is co-sponsored by SIGDAT, the ACL's special interest group on linguistic data and corpus-based approaches to NLP, which has traditionally sponsored the Empirical Methods in Natural Language Processing (EMNLP) Conferences. The joint conference provides a unified forum for researchers across a spectrum of disciplines to present recent, high-quality, cutting-edge work, to exchange ideas, and to explore emerging new research directions. The conference especially encourages submissions that discuss synergistic combinations of language technologies (e.g., Speech with Information Retrieval, Machine Translation with Speech, Question Answering with Natural Language Processing, etc.). Particular consideration will be given to papers addressing novel learning tasks and evaluation metrics in speech, natural language processing and information retrieval, including e.g.:

We are interested in papers from academia, government, and industry on all areas of traditional interest to the HLT and SIGDAT communities, as well as aligned fields, including but not limited to:

Important Dates:

Submission deadline June 3, 2005
Notification of acceptance July 29, 2005
Submission of camera-ready papers August 12, 2005
Conference October 6-8, 2005


Submissions must describe original, completed, unpublished work, and include concrete evaluation results when appropriate. Papers being submitted to other meetings must provide this information (see submission format). In the event of multiple acceptances, authors are requested to immediately notify the HLT/EMNLP program chair and to choose which meeting to present and publish the work at as soon as possible. We cannot accept for publication or presentation work that will be (or has been) published elsewhere.

Papers must be submitted electronically in Postscript (PS) or Portable Document Format (PDF). They should follow the ACL formatting guidelines and should not exceed eight (8) pages in two-column format, including references and illustrations. Papers exceeding the maximum length may be rejected without review. Authors are encouraged to use the style files provided on the HLT/EMNLP 2005 website. We strongly prefer submissions in PS format. Any author who submits in PDF must assume the responsibility for ensuring that fonts are treated properly so that the paper will print (not just view) anywhere. (This may involve reading the manual.) DOC/RTF formats cannot be accepted.

Reviewing will be blind. No information identifying the authors should be in the paper: this includes not only the authors' names and affiliations, but also self-references that reveal authors' identities; for example, "We have previously shown (Smith 1999)" should be changed to "Smith (1999) has previously shown". Names and affiliations should be listed on a separate identification page.

Papers must be submitted electronically by 12 a.m. GMT on June 3, 2005, through the conference website. In addition, information about each paper must be provided, including:

Authors who cannot submit a file electronically should contact the program chairs before the due date to arrange alternative forms of submission.

After notifications of acceptance have been issued, authors will have the opportunity to revise their submissions in accordance with reviewers' comments. The due date for the final submission of camera-ready papers is August 12, 2005.

SLU 2005

AAAI Workshop on

Spoken Language Understanding

Held in conjunction with
The Twentieth National Conference on Artificial Intelligence - AAAI 2005

July 9 or 10, 2005, Pittsburgh, Pennsylvania


Call for Papers



Workshop Description

Natural language processing (NLP) has been one of the defining subtopics of AI since its early days. In recent times, NLP has predominantly been about text understanding and building associated resources for the purposes of information extraction, question answering, and text mining. Many of these tasks have nourished the creation and development of extensive ontologies, practical semantic representations, and novel machine learning techniques. In a spirit similar to the workshop at HLT-NAACL 2004 on this topic, our attempt is to broaden the scope of language understanding to include understanding of spoken language (SLU) in the context of applications such as speech mining and human-machine interactive spoken dialog systems. We aim to bring together techniques that address the issue of robustness of SLU to speech recognition errors, language variability and dysfluencies in speech with issues of semantic representation that provide greater flexibility and portability to a dialog model. We believe spoken language understanding is an especially attractive topic for cross-fertilization of ideas between the AI, IR, NLP, Speech, and Semantic Web communities.


Workshop Topics

We invite submissions covering the full range of topics related to Spoken Language Understanding. Topics of interest include (but are not limited to):

  • Approaches to building an SLU system
    • rule-based, data-driven, or hybrid
    • automatic adaptation across domains
  • Approaches to robustness in SLU
    • Handling uncertain and erroneous input
    • Handling dysfluencies and language variations
  • Tighter integration of Speech Recognition and SLU
    • Exploiting weighted packed representation of hypotheses
    • Exploiting prosodic and emotional cues from speech
  • Approaches to semantic representations provided by SLU
    • Combining shallow and deep representations
    • Representations permitting robust inference mechanisms
  • Tools and Data Resources
  • Issues and metrics for evaluation of SLU
  • SLU in the context of Applications
    • multilingual systems
    • multimodal systems
    • tutoring systems
    • speech mining systems
    • spoken dialog systems


Paper Submission

All submissions must be sent by e-mail with the subject line "AAAI-05 SLU Workshop paper submission". Please use the AAAI prescribed formatting instructions. Papers must be 5 to 8 pages long, including all references and figures. All papers must be submitted in either PDF (preferred) or PostScript format. If any special fonts are used, they must be included in the submission. Papers must be original and previously unpublished. Note that reviewing will NOT be blind; submissions may include the authors' names and affiliations.


Important Dates

  • April 20, 2005: Deadline for electronic submission
  • May 11, 2005: Notification of acceptance or rejection
  • May 18, 2005: Submission of camera-ready papers


Workshop Co-Chairs



Previous Workshops








Automatic Speech Recognition and Understanding Workshop


Fiesta Americana Grand Coral Beach Resort 

Cancun, Mexico

November 27 – December 1, 2005

The ninth biennial IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) will be held November 27 - December 1, 2005.  The ASRU workshops have a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding. Submissions are encouraged in all areas of human language technology, with emphasis placed on automatic speech recognition and understanding technology, speech-to-text systems, spoken dialog systems, multilingual language processing, robustness in ASR, spoken document retrieval, and speech-to-speech translation.
The workshop program will consist of invited lectures, oral and poster presentations, and panel discussions. Ample time will be allowed for informal discussions and for enjoying the impressive tropical setting.  The workshop website will be accessible by January 2005.

Prospective authors are invited to submit full-length, 4-6 page papers, including figures and references. All papers will be handled and reviewed electronically. The ASRU 2005 website will provide further details. Please note that the submission dates for papers are strict deadlines.

Special session proposals should be submitted by June 15, 2005, and must include a topical title, rationale, session outline, contact information, and a description of how the session will be organized.


May 1, 2005              Workshop registration opens
July 1, 2005             Paper submission deadline
August 15, 2005          Paper acceptance/rejection notices mailed
Sept. 15, 2005           Revised papers due and author registration deadline
Oct. 1, 2005             Hotel reservation and workshop registration deadline
Nov. 27 - Dec. 1, 2005   Workshop

General Chairs
    Jim Glass, MIT, USA
    Richard Rose, McGill University, Canada
Technical Chairs
    Michael Picheny, IBM, USA
    Renato de Mori, Avignon, France
    Richard Stern, CMU, USA
Publicity Chair
    Ruhi Sarikaya, IBM, USA
Publications Chair
    Dilek Hakkani-Tur, AT&T, USA
Local Arrangements Chair:
    Juan Nolazco, Monterrey, Mexico
Demonstrations Chair
    Anand Venkataraman, SRI, USA

Alex Acero
Srini Bangalore
Jerome Bellegarda, Apple
Mary Harper
Julia Hirschberg, Columbia University
Helen Meng, CUHK
Roberto Pieraccini, IBM
Alex Rudnicky, CMU
Stephanie Seneff, MIT
Liz Shriberg, SRI
Gokhan Tur, AT&T
Wayne Ward, Univ. of Colorado
Steve Young, Cambridge University
Eric Fosler-Lussier, Ohio State University
Sadaoki Furui, Tokyo Institute of Technology
J.L. Gauvain, LIMSI
Yuqing Gao, IBM
Hermann Ney, RWTH Aachen
Joe Picone, Mississippi State University
Abeer Alwan, UCLA
Jeff Bilmes, Univ. of Washington
Herve Bourlard, IDIAP
Dan Ellis, Columbia University
Mark Hasegawa-Johnson, UIUC
Hynek Hermansky, IDIAP
Chris Wellekens, EURECOM
Chin-Hui Lee, Georgia Tech
Shri Narayanan, USC


The 2005 National Institute of Standards and Technology

Speaker Recognition Evaluation

NIST has been coordinating Speaker Recognition Evaluations since 1996. Each evaluation begins with the announcement of the official evaluation plan, which clearly states the rules and tasks involved in the evaluation. The evaluation culminates in a follow-up workshop, where NIST reports the official results and researchers share their findings.

Brief History

Since 1996, over 40 research sites have participated in our evaluations. Each year, new researchers in industry and universities are encouraged to participate, and collaboration between universities and industry is also welcomed. The overall goals of the evaluations have always been to drive the technology forward, to measure the state of the art, and to find the most promising algorithmic approaches.

The 2005 NIST Speaker Recognition Evaluation

The 2005 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluation was designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible.

Non-LDC members are required to sign the LDC's license agreement before being granted access to SRE-05 data.

Evaluation Schedule
Posting of the official evaluation specification document
Last day to register for participation
Evaluation begins
Site submissions due at NIST
First release of results to the participants
Site workshop presentations/talks due at NIST
June 6-8, 2005 Evaluation workshop, Eastern United States

More Information

To find out more about previous evaluations and to access related publications, go to the NIST speaker identification web page.  To register to participate in future evaluations, to obtain more information about our evaluations, or to be notified of the next evaluation or developments thereof, please e-mail Dr. Alvin Martin at NIST.


Two Senior R&D Positions Available 

Speech Technologies R&D Lab 

Texas Instruments, Dallas, TX

Handset Acoustic Signal Processing: The available position involves work as part of an R&D team designing, testing, and tuning acoustic solutions for wireless handsets in support of TI's wireless business. Responsibilities also include working collaboratively with our wireless product group and providing consulting support.
Qualifications:  Ph.D. in EE, or MSEE with equivalent experience. Strong background and experience in digital signal processing, with emphasis on speech processing and its applications. Demonstrated experience over 3-5 years in design, testing, and tuning of acoustic processing functions, including acoustic echo cancellation, noise suppression, AGC, and compressor/limiter, especially as they relate to wireless handsets. Knowledge of and experience in multi-microphone speech acquisition is desirable. Demonstrated software development experience in C/Unix and Matlab. Effective oral and written communication skills. Background and experience in speech coding algorithms is a plus, as the position is in TI's speech coding R&D team.
Speech Recognition:  The available position requires a candidate with strong prior experience in automatic speech recognition research and development.  In R&D support of TI product groups, the Speech Technologies R&D laboratory designs and develops speaker-independent, speaker-dependent, and speaker-adaptive speech recognition systems for hand-held and hands-free voice input, focusing on recognition accuracy and robustness under adverse conditions as in mobile environments, small foot-print solutions, dynamic vocabulary and grammar, and multiple language recognition.

Qualifications:  The candidate must have a PhD, or MS with equivalent experience, in EE or CS; a strong background and experience in speech recognition technology; 3-5 years of experience in design and implementation of robust, high-performance speech recognition algorithms and associated application development, including APIs; an interest in and commitment to solving real-world problems and bringing algorithms to products; and demonstrated programming abilities.  Prior experience in letter-to-phoneme mapping, as would be needed for robust speaker-independent name recognition, and in dealing with multiple languages will be a plus.
Qualified candidates may send a letter and a resume by e-mail to Vishu Viswanathan.

Postdoctoral Fellowship Position in Speech Recognition

and Speech Modeling at McGill University

We are seeking a postdoctoral fellow to perform basic research as part of a three-year project in the general area of automatic speech recognition.  The position will be in the Electrical and Computer Engineering Department at McGill University in Montreal, Canada.  The candidate should have hands-on experience with speech processing systems and a strong background in statistical modeling, signal processing, and/or speech analysis.  The candidate should also be proficient with high-level programming languages for developing prototype systems and simulations.  Fluency in English is required, and the ability to work in a small team is also important.

The position is in support of a Canadian NSERC-funded project that is being conducted in association with the European 6th Framework Project DIVINES.  The overall goal of the project is to overcome deficiencies in existing acoustic feature analysis and phonetic and lexical modeling techniques.  This will be accomplished through a methodology involving the diagnosis and modeling of intrinsic variabilities in ASR under a variety of conditions.

McGill University is located in Montreal, an exciting cosmopolitan city in the Province of Quebec. Montreal is home to a number of speech recognition research and development laboratories, including the Centre de Recherche Informatique de Montréal (CRIM), ScanSoft Canada, Nuance Canada, Nu Echo, and others.  Montreal has a bilingual population with a blend of European and North American culture. Qualified applicants are invited to submit a resume together with the names and addresses of two references by email to:

Richard Rose   
McGill University
Department of Electrical and Computer Engineering
McConnell Engineering Building, Room 813
3480 University Street
Montreal, Quebec
H3A 2A7

phone: 514-398-1749
fax: 514-398-4470


ASR Researchers Take New Positions

The STC Newsletter would like to announce professors, researchers, and developers in the speech area who are taking new positions.  If you have moved recently or are in the process of moving to a new position in the near future, send your new contact information to the STC Newsletter so it can be posted in the next edition.


Links to Upcoming Conferences and Workshops

(Organized by Date)

Philadelphia, Pennsylvania, May 2005

Auditory-Visual Speech Processing (AVSP 2005)
Vancouver Island, British Columbia, Canada, July 24-27, 2005

SIGdial Workshop on Discourse and Dialog
Lisbon, Portugal , September 2-3, 2005

Sesimbra, Portugal, September 3, 2005

EUROSPEECH 2005 9th European Conference on Speech Communication and Technology
Lisbon, Portugal, September 4-8, 2005

Disfluency in Spontaneous Speech
Aix-en-Provence, September 10-12, 2005

IEEE WASPAA 2005 Workshop on Applications of Signal Processing to Audio and Acoustics
New Paltz, New York, October 16-19, 2005

SPECOM 2005 - 10th International Conf. on Speech and Computers
Patras, Greece, October 17-19, 2005

IEEE ASRU 2005 Automatic Speech Recognition and Understanding Workshop
Cancun, Mexico, November 27 - December 1, 2005

Toulouse, France May 15-19, 2006

Pittsburgh, PA, USA September 17-21, 2006

Honolulu, Hawaii, USA, 2007, April 17-20

Antwerp, Belgium, August 27-31, 2007
