April 19, 2004


STC ICASSP 2004 Paper Review Process   (Michael Picheny and Rick Rose)
ICASSP 2004 Technical Program Preparation  (Peter Kabal and Li Deng)

Fourth International Symposium on Chinese Spoken Language Processing  (Pascal Fung)
2004 HLT/NAACL Workshop on Spoken Language Understanding for Conversational Systems
NIST Rich Transcription 2004 Meeting Recognition Workshop (John Garofolo)
COST278 Workshop on Robustness Issues in Conversational Interaction (Borge Lindberg)

Ron Schafer Retires After Four Decades of Contributions in DSP  (Larry Rabiner and Rick Rose)
Post Doctoral and PhD Positions at Eurecom in France  (Chris Wellekens)

Links to conferences and workshops organized by date  (Rick Rose)

New STC Paper Review Process
for ICASSP 2004

This year, the Speech Technical Committee implemented the new ICASSP paper review process for papers submitted to ICASSP 2004. The STC received 524 paper submissions in the speech area, essentially the same number as last year. The review process was carried out by the 46 STC members along with over 100 volunteer associate reviewers, who were recruited by STC members as experts in the various technology areas covered by the STC. The 46 STC members were divided into teams of two to cover the technology areas. A set of 4-5 associate reviewers, depending on the number of papers per team, was also assigned to the technology areas. Each paper was reviewed by three reviewers, as opposed to two in previous years. Each team performed its reviews and produced a ranked list of papers with a single overall score per paper.

The process followed by each team can be summarized as follows. Both STC team members read all the papers assigned to the team, but each member formally reviewed only half of them. In parallel, the team divided the papers among the associate reviewers to ensure that each paper received a total of three reviews. After all the reviews for a paper were collected, the paper was ranked and scored by the two STC members. The ranked and scored papers were then combined into a single list. This year there was an effort to return a separate set of comments from each of the three reviewers to the authors. The goal was to provide constructive feedback, especially for those authors whose papers were rejected. In previous years, the comments from the two STC reviewers were returned to the authors. The breakdown of submissions is given in the following table:

Speech Production/Synthesis
Speech Analysis / Feature Extraction
Speech Coding
Speech Enhancement
Acoustic Modeling for ASR
Robust ASR
Spoken Language Systems
Speaker Rec / Language ID

Summary of ICASSP 2004 Submissions from the
Conference Technical Program Chairs

The following summary of the overall ICASSP 2004 technical program preparation was provided by the Technical Program Co-Chairs, Li Deng and Peter Kabal. The submitted papers were routed to the appropriate technical committee (TC) for review. The TCs have worked very hard, with the help of external reviewers, to ensure that each paper is thoroughly and fairly reviewed. The review process is a monumental task, and the high degree of professionalism of the TCs is a major factor in the success of ICASSP. Much of the credit for our technical program goes to the hard work of all TC members and reviewers (including associate reviewers in some TCs), under the leadership of the following TC Chairs: Michael Picheny, Mats Viberg, Michael Brandstein, Thrasyvoulos Pappas, John Sorensen, Alle-Jan van der Veen, Alex Gershman, Magdy Bayoumi, Tulay Adali, Thad Welch, and Jennifer Trelewicz. We, as Technical Chairs of the conference, worked closely with the TC chairs, with much of the work overlapping the holiday season. Our work was also made much easier by working closely with Conference Management Services, Lance Cotton and Billene Mercer in particular. We want to express our special thanks to all of the people above, to all the contributing authors, and to the special session chairs who organized those sessions. In this year's ICASSP Technical Program, we have organized all the accepted papers into 11 technical tracks, comprising 55 lecture and 88 poster sessions. Most of the 1324 accepted regular papers will be poster presentations. The choice of oral or poster was made by the TCs based entirely on subject grouping.

The breakdown of submissions by technical committee is given in the following table:

Technical Committee
Speech Processing
Signal Processing Theory and Methods
Signal Processing for Communications
Image & Multidimensional Signal Processing
Sensor Array & Multi-channel Signal Processing
Audio & Electroacoustics
Industry Technology Track
Design & Implementation of SP Systems
Multimedia Signal Processing
Machine Learning for Signal Processing
Signal Processing Education
Special Sessions

4th International Symposium on

Chinese Spoken Language Processing

December 15-18, 2004

Hong Kong

Preliminary Call for Papers
The 4th International Symposium on Chinese Spoken Language Processing (ISCSLP'04) will be held during December 16-18, 2004 in Hong Kong. ISCSLP is a conference for scientists, researchers, and practitioners to report and discuss the latest progress in all scientific and technological aspects of Chinese spoken language processing. The conference series has been held biennially in different Asia Pacific cities: 1998 in Singapore, 2000 in Beijing, and 2002 in Taipei. ISCSLP has become the world's largest and most comprehensive technical conference focused on Chinese spoken language processing and its applications. ISCSLP'04 will feature world-class plenary speakers, tutorials, and a number of lecture and poster sessions on the following topics:
    * Speech Production and Perception
    * Phonetics and Phonology
    * Speech Analysis
    * Speech Coding
    * Speech Enhancement
    * Speech Recognition
    * Speech Synthesis
    * Language Modeling and Spoken Language Understanding
    * Spoken Dialog Systems
    * Spoken Language Translation
    * Speaker and Language Recognition
    * Indexing, Retrieval and Authoring of Speech Signals
    * Multi-Modal Interface including Spoken Language Processing
    * Spoken Language Resources and Technology Evaluation
    * Applications of Spoken Language Processing Technology
    * Others
Hong Kong, better known as the Pearl of the Orient, is a place where East meets West. Shopping, dining, sightseeing, as well as world-class events and attractions are all conveniently available within a short distance. As the "City of Life" in Asia, with its multicultural heritage and kaleidoscopic lifestyle, Hong Kong buzzes with unique tourist attractions that are beyond compare in the region. You are cordially invited to attend ISCSLP'04 and to experience the fascination of Hong Kong that is unmatched anywhere in the world.
The working language of ISCSLP is English. Prospective authors are invited to submit full-length, four-page papers for presentation in any of the areas listed above. All ISCSLP'04 papers will be handled and reviewed electronically; details can be found on the conference web-site. Please note the following important dates and plan your schedule well in advance.
Schedule of Important Dates:
Four-page full paper submission to be received by: July 23, 2004
Notification of acceptance mailed out by: September 20, 2004
Camera-ready papers to be received by: October 8, 2004
Early registration: November 12, 2004

Call for Participation

HLT-NAACL 2004 Workshop on

Spoken Language Understanding for Conversational Systems and

Higher Level Linguistic Information for Speech Processing

Friday, May 7, 2004
Park Plaza Hotel, Boston, USA
The success of a conversational system depends on a synergistic integration of technologies such as speech recognition, spoken language understanding (SLU), dialog modeling, natural language generation, speech synthesis and user interface design. In this workshop, we address the issue of improving the robustness of the speech recognition and SLU components by exploiting higher level linguistic knowledge, meta-information and machine learning techniques.
The first part of the workshop will focus on robust SLU in conversational systems, which received much attention during the DARPA-funded ATIS program of the 1990s and, more recently, the DARPA Communicator program. In parallel to that research, a number of real-world conversational systems have been deployed to date. However, the techniques for robust SLU have branched out in many different directions. They have been influenced by many recent areas such as information extraction, question answering and machine learning. Data-driven approaches to understanding are rapidly gaining prominence. There has been a substantial increase in interest in information extraction from the NLP community, question answering in the information retrieval community, and spoken dialog systems in the speech processing community. Spoken language understanding is an especially attractive topic for cross-fertilization of ideas between the speech, IR, and NLP communities.
Going beyond SLU and dialog systems, the second part of the workshop will address the use of high-level knowledge for improved speech recognition accuracy. Challenging robustness issues in speech recognition, such as compensation for acoustic confusability resulting from noisy environments and unexpected channel and speaker mismatch, can potentially be aided by the use of linguistic information such as prosody, syntax, semantics, and pragmatics, and even high-level meta-information, such as personal information stored in a database or dialogue and pragmatic coherence constraints. However, current state-of-the-art speech recognizers do not explicitly use such information and rely mainly on information encoded in statistical N-gram language models. The papers here show the potential of high-level information not only to improve word accuracy but also to help disambiguate the recognized words, thus benefitting downstream processing and SLU in particular.
Invited Talks:
Renato De Mori, Univ Avignon, France
Sentence Interpretation using Stochastic Finite State Transducers
Roberto Pieraccini, IBM TJ Watson Research Center, USA
Spoken Language Understanding: The Research/Industry Chasm
8:45-9:00 Welcome
9:00-9:50 Invited Talk: Sentence Interpretation using Stochastic
Finite State Transducers, Renato De Mori
9:50-10:00 Break
10:00-10:30 Hybrid Statistical and Structural Semantic Modeling for
Thai Multi-Stage Spoken Language Understanding, Chai Wutiwiwatchai and
Sadaoki Furui
10:30-11:00 Interactive Machine Learning Techniques for Improving SLU
Models, Lee Begeja, Bernard Renger, David Gibbon, Zhu Liu and Behzad
11:00-11:30 Virtual Modality: a Framework for Testing and Building
Multimodal Applications, Peter Pal Boda and Edward Filisko
11:30-12:00 Automatic Call Routing with Multiple Language Models,
Qiang Huang and Stephen Cox
12:00-1:00 Lunch
1:00-1:30 Error Detection and Recovery in Spoken Dialogue Systems,
Edward Filisko and Stephanie Seneff
1:30-2:00 Robustness Issues in a Data-Driven Spoken Language
Understanding System, Yulan He and Steve Young
2:00-2:50 Invited Talk: Spoken Language Understanding: the
Research/Industry Chasm, Roberto Pieraccini
2:50-3:00 Break
3:00-3:30 Using Higher-level Linguistic Knowledge for Speech
Recognition Error Correction in a Spoken Q/A Dialog, Minwoo Jeong,
Byeongchang Kim and Gary Geunbae Lee
3:30-4:00 Speech Recognition Models of the Interdependence Among
Syntax, Prosody, and Segmental Acoustics, Mark Hasegawa-Johnson,
Jennifer Cole, Chilin Shih, Ken Chen, Aaron Cohen, Sandra Chavarria,
Heejin Kim, Taejin Yoon, Sarah Borys and Jeung-Yoon Choi
4:00-4:30 Modeling Prosodic Consistency for Automatic Speech
Recognition: Preliminary Investigations, Ernest Pusateri and James
4:30-5:00 Assigning Domains to Speech Recognition Hypotheses, Klaus
Rüggenmann and Iryna Gurevych
5:00-5:30 Context Sensing using Speech and Common Sense, Nathan Eagle
and Push Singh
Organizers:
Srinivas Bangalore, AT&T Labs - Research
Dilek Hakkani-Tür, AT&T Labs - Research
Gokhan Tur, AT&T Labs - Research
Yuqing Gao, IBM TJ Watson Research Center
Hong-Kwang Jeff Kuo, IBM TJ Watson Research Center
Andreas Stolcke, SRI & ICSI
Program Committee:
Frederic Bechet, Univ. of Avignon, France
Jerome Bellegarda, Apple Computer, USA
Jennifer Chu-Carroll, IBM TJ Watson Research Center, USA
Ciprian Chelba, Microsoft, USA
Stephen Cox, Univ. of East Anglia, UK
Sadaoki Furui, Tokyo Institute of Technology, Japan
Allen Gorin, AT&T Labs - Research, USA
Roberto Gretter, ITC-IRST, Italy
Julia Hirschberg, Columbia University, USA
Dan Jurafsky, University of Colorado, USA
Sanjeev Khudanpur, Johns Hopkins University, USA
Helen Meng, CUHK, Hong Kong
Prem Natarajan, BBN, USA
Hermann Ney, RWTH Aachen, Germany
Martha Palmer, University of Pennsylvania, USA
Barbara Peskin, ICSI, USA
Roberto Pieraccini, IBM TJ Watson Research Center, USA
Manny Rayner, NASA, USA
Brian Roark, AT&T Labs - Research, USA
Roni Rosenfeld, Carnegie Mellon University, USA
Stephanie Seneff, MIT, USA
Elizabeth Shriberg, SRI, USA
Amanda Stent, Stony Brook Univ., USA

Robust 2004: COST278 Workshop on

Robustness Issues in Conversational Interaction

August 30 and 31, 2004

University of East Anglia, Norwich, UK

A workshop on robustness issues for conversational interaction, organized by COST (European Cooperation in the field of Scientific and Technical Research) action 278, "Spoken Language Interaction in Telecommunication", will be held on August 30th and 31st, 2004 at the University of East Anglia, Norwich, UK.

The objective of this two-day workshop is to bring together researchers from both universities and industry to consider different methods of achieving robustness in conversational interaction.

The workshop is aimed at robustness against all effects that are known to degrade the performance of the individual components of a conversational interaction system.

Different approaches for compensating against these effects will form the main theme of the workshop. A broad list of topics includes (not limited to):

In addition to regular technical sessions, the workshop will include invited plenary talks on topics of related general interest.  The workshop will be divided into four sessions during the two days and will conclude with a panel discussion.

Submission and further details
Prospective authors are invited to submit four-page papers describing original work in any of the areas relevant to the workshop.
Email enquiries can be sent to
Participation in the workshop will be limited to around 50 people.

Important dates
Submission deadline: June 18th 2004
Notification of acceptance: July 9th 2004
Workshop: August 30th and 31st 2004

Rich Transcription 2004 Meeting Recognition Workshop

ICASSP 2004 in Montreal

May 17, 2004

NIST is conducting a community-wide evaluation of speech-based meeting recognition technologies in March and will hold a one-day workshop, the "Rich Transcription 2004 Meeting Recognition Workshop", on May 17 at ICASSP 2004 in Montreal. While a portion of the workshop will be devoted to discussion of the results of the evaluation, the goal of the workshop is to provide an overview of the state-of-the-art in meeting recognition technologies and discuss plans for future work and collaborations.

Huge efforts are being expended in mining information in newswire, news broadcasts, and conversational speech and in developing interfaces to metadata extracted in these domains. However, until recently, relatively little has been done to address such applications in the more challenging and equally important meeting domain.

The development of smart meeting room core technologies that can automatically recognize and extract important information from multi-media sensor inputs will provide an invaluable resource for a variety of business, academic, and governmental applications. Such metadata will provide the basis for the development of second-tier meeting applications that can automatically process, categorize, and index meetings. Third-tier applications will provide a context-aware collaborative interface between live meeting participants, remote participants, meeting archives and vast online resources.

The meeting domain has several important properties that are not found in other domains and are not currently the focus of other research programs: multiple forums and vocabularies, highly-interactive/simultaneous speech, multiple distant microphones, multiple camera views, and multi-media/multi-modal information integration.

The Rich Transcription 2004 Spring Meeting Recognition Workshop at ICASSP 2004 on May 17 in Montreal will bring together the community of researchers working in this new and challenging domain to discuss the challenges, the current state-of-the-art, and future plans and collaborations. Discussions will include the results of the March 2004 Rich Transcription Meeting Recognition Evaluation including both Speech-to-Text Transcription and Speaker Segmentation technologies, related research work in the meeting domain, related governmental programs, and future collaborations.

Workshop Participation

While RT-04 Spring Recognition Evaluation participants will have automatic slots in the workshop, researchers working in related areas (speech technologies, vision technologies, behavioral sciences, etc.) in the meeting domain will also present their work. Additionally, a certain number of non-presenters will be permitted to attend the workshop on an invited basis. Please contact us if you are interested in attending.


The RT-04 Spring Recognition Evaluation is part of the NIST Rich Transcription Evaluation series and will include both speaker segmentation and speech-to-text transcription tasks in the meeting domain. The test set will be approximately 90 minutes in length and will comprise eight meeting excerpts of approximately 11 minutes each, collected at CMU, ICSI, the LDC, and NIST.

Colloquium in Honor of Ron Schafer

Georgia Institute of Technology,  Atlanta, Georgia

Friday, October 31, 2003 GCATT Building

Last fall a colloquium in honor of Ron Schafer's retirement was held at Georgia Tech. The colloquium was hosted by Russ Mersereau. The morning featured presentations by Ron's PhD thesis advisor Al Oppenheim, his colleagues Larry Rabiner from Bell Labs and Tom Barnwell from Georgia Tech, and former student Mark Smith.
The afternoon included a panel discussion on future directions for digital signal processing that was emceed by Fred Juang.
Larry Rabiner contributed photographs taken at the colloquium:

Ron Schafer retirement
Ron Schafer and Larry Rabiner
The newly retired Ronald W. Schafer
Ron Schafer with Larry Rabiner

STC Newsletter archive photos of R. W. Schafer
and L. R. Rabiner.  Actually, the STC Newsletter
has no archive.  These were scanned from
the IEEE Transactions on Audio and Electroacoustics.
R. W. Schafer staff photo
L. R. Rabiner AT&T staff photo

Positions Available in the Speech Group at Eurecom

Postdoc Position Available at Eurecom:

The Speech group at Eurecom is looking for a postdoctoral researcher with hands-on experience in speech processing. He/she must have excellent practical skills in signal and speech analysis as well as a good knowledge of optimal classification using Bayesian criteria. He/she must be open to original solutions proposed after a rigorous analysis of the low-level phenomena in speech processing. Fluency in English is mandatory (writing, understanding, and speaking). He/she should be able to represent Eurecom at periodic meetings. Ability to work in a small team is also required.

The position is associated with the European project DIVINES, a STREP of the 6th Framework Programme. The aim of the project is to analyse why recognizers are unable to reach human recognition rates, even when semantic content is lacking. All weaknesses will be analyzed at the level of feature extraction, phone models, and lexical models. The focus will be on the intrinsic variabilities of speech in quiet and noisy environments as well as in read and spontaneous speech. The analysis will not be restricted to tests on several databases with different features and models but will go into the detailed behavior of the algorithms and models. New solutions will be proposed and tested experimentally. The project duration is 3 years.

Ph.D. Student Position Available at Eurecom

The Speech group at Eurecom is looking for a top-level PhD student who has a good knowledge of speech processing. Preference will be given to a student who worked in speech in his/her predoctoral school or worked on a speech project for his/her graduation project. He/she must have excellent practical skills in signal and speech analysis as well as a good knowledge of optimal classification using Bayesian criteria. Fluency in English is mandatory (writing, understanding, and speaking). Ability to work in a small team is also required.

Application Procedure:
-send a detailed resume (give details on your activity since your PhD graduation)
-send a copy of your thesis report (as a printed document or CDROM); DO NOT attach your thesis to an e-mail!
-send a copy of your diploma
-send the names and email addresses of two referees
-send the list of your publications

Send all materials to Professor Chris J. Wellekens, Dept of Multimedia Communications, 2229 route des Cretes, BP 193, F-06904 Sophia Antipolis Cedex, France

Contact Professor Chris Wellekens at


