Speech and Language Processing Technical Committee Newsletter

August 2014

Welcome to the Fall 2014 edition of the IEEE Speech and Language Processing Technical Committee's Newsletter! This issue includes 7 articles and announcements from 13 contributors, including our own staff reporters and editors. Thank you all for your contributions! In it you will find news about IEEE journals and recent workshops, the SLTC call for nominations, and individual contributions.

We believe the newsletter is an ideal forum for updates, reports, announcements and editorials which don't fit well with traditional journals. We welcome your contributions, as well as calls for papers, job announcements, comments and suggestions.

To subscribe to the Newsletter, send an email with the command "subscribe speechnewsdist" in the message body to listserv [at] listserv [dot] ieee [dot] org.

Florian Metze, Editor-in-chief
William Campbell, Editor
Haizhou Li, Editor
Patrick Nguyen, Editor


From the SLTC and IEEE

From the IEEE SLTC Chair

Douglas O'Shaughnessy


Articles

IEEE Spoken Language Technologies Workshop 2014

Julia Hirschberg and Agustin Gravano

The Fifth IEEE Workshop on Spoken Language Technology (SLT 2014) will be held in South Lake Tahoe, Nevada, on Dec 7-10, 2014. The main theme of the workshop will be "machine learning in spoken language technologies". SLT 2014 will include tutorials and keynote speeches on the main workshop theme and emerging areas; online panel discussions before/during the conference; miniSIGs - small discussion groups; and highlight sessions, where the 3-5 best papers will be presented orally.


IEEE Automatic Speech Recognition and Understanding Workshop 2015

Pino Di Fabbrizio and Jason D. Williams

The 2015 edition of the IEEE Automatic Speech Recognition and Understanding (ASRU 2015) Workshop will be held in Scottsdale, Arizona (USA) from Sunday, December 13 to Thursday, December 17, 2015. Located in the beautiful Sonoran Desert, Scottsdale, Arizona is an ideal location to host ASRU 2015, providing a warm climate during December, spectacular natural scenery, and a vibrant downtown with easy access to the Phoenix metropolitan area.


ASRU 2015 - Call for Challenge Tasks

Michiel Bacchiani

ASRU 2015 welcomes proposals for challenge tasks. In a challenge task, participants compete or collaborate to accomplish a common or shared task. The results of the challenge will be presented at the ASRU workshop in the form of papers reporting the achievements of the participants, individually and/or as a whole. We invite organizers to propose such challenge tasks concretely in the form of a 1-2 page proposal.


Report from the 2014 UKSpeech Conference

Korin Richmond

The third UKSpeech Conference was held recently in Edinburgh (9-10 June 2014) at the Informatics Forum, home of the School of Informatics at the University of Edinburgh.


An Overview of New Small-Footprint Technology on Mobile Devices

Tara N. Sainath

The popularity of mobile devices has resulted in increased voice interaction with these devices. In this newsletter, we will discuss some of the newest voice search technology released at Google, particularly focused on small-footprint applications. We will show that the use of deep neural network (DNN) technology has made it possible to perform tasks on voice inputs within the confines of portable devices.


Congratulations to the new IEEE Fellows

Florian Metze

Each year, the IEEE Board of Directors confers the grade of Fellow on up to one-tenth of one percent of the members. The grade of Fellow recognizes unusual distinction in IEEE’s designated fields. We congratulate the six speech and language processing colleagues who were recognized with the grade of Fellow as of 1 January 2014.


An Overview of the NIST i-Vector Machine Learning Challenge

Craig S. Greenberg, Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Tomi Kinnunen, Alvin F. Martin, Alan McCree, Mark Przybocki, and Douglas A. Reynolds

In late 2013 and in 2014 the National Institute of Standards & Technology (NIST) coordinated an open, online machine learning challenge for speaker recognition, using speech data represented as i-vectors. This challenge utilized fixed front-end processing in order to allow direct comparison of different back-ends, to encourage exploration of new ideas in machine learning for speaker recognition, and to make the field accessible to participants from outside the audio processing community.



From the SLTC Chair

Douglas O'Shaughnessy

SLTC Newsletter, August 2014

Welcome to the summer SLTC Newsletter. Allow me to draw your attention to the upcoming election to renew our Speech and Language Processing Technical Committee (SLTC). As you may know, the SLTC represents all speech and language researchers in the IEEE Signal Processing Society (SPS), and supervises the review process for speech and language papers for ICASSP and for the ASRU and SLT workshops. The SLTC also gives advice on numerous issues that concern our research. It is effectively the main authoritative entity for technical matters dealing with speech and language. It is thus important to maintain a solid representation of our community in the SLTC from academia and industry around the world.

In September each year, the SLTC is renewed by the election of 17-18 new members, each for a 3-year term. (As the areas of speech and language constitute the largest single grouping in the SPS, the SLTC has the most members of any TC.) If you know of someone who might wish to consider a position in the SLTC, or if you yourself would like to join, please contact the SLTC Election subcommittee (larry.heck@ieee.org, bhiksha@cs.cmu.edu, Fabrice.Lefevre@univ-avignon.fr, stolcke@icsi.berkeley.edu). Applying simply requires filling out a short form. In the election, current SLTC members (excluding those running for re-election to a second and final term) will vote to approve the new members from among the nominees. Anyone can nominate; nominees must be current SPS members.

We look forward to the upcoming IEEE Spoken Language Technology (SLT) Workshop, to be held at Harvey’s Lake Tahoe Hotel in scenic Lake Tahoe, Nevada (Dec. 7-10, 2014; http://www.slt2014.org/). The main theme of the workshop is machine learning in spoken language technologies. With so many recent and upcoming speech and language meetings being outside the US, this is a golden opportunity to visit a beautiful American location and keep up-to-date technically at the same time.

For next year’s ICASSP (April 19-24, 2015 in Brisbane, Australia), special session proposals and tutorial proposals are both due August 17th and paper submissions will be due October 5th.

Next year’s IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop will be held in Scottsdale, Arizona, Dec. 13-17, 2015 (this choice was recently made by the SLTC at its ICASSP-2014 meeting). Also, please note the upcoming Interspeech conference (http://interspeech2014.org/), to be held in Singapore, Sept. 14-18, 2014.

Finally, it is not too soon to start planning for SLT-2016. If you have any interest in helping to organize this important meeting, let us know. The SLTC will consider bids early next year.

In closing, please consider participating at SLT this year in Tahoe and at next year’s ICASSP in Brisbane. We look forward to meeting friends and colleagues at these exciting locations.

Best wishes,

Douglas O'Shaughnessy

Douglas O'Shaughnessy is the Chair of the Speech and Language Processing Technical Committee.


2014 Spoken Language Technology Workshop

Julia Hirschberg and Agustin Gravano

SLTC Newsletter, August 2014

December 7-10, 2014 - South Lake Tahoe, NV, USA

IEEE - IEEE Signal Processing Society

http://www.slt2014.org/ - Follow @SLT_2014

The Fifth IEEE Workshop on Spoken Language Technology (SLT 2014) will be held in South Lake Tahoe, Nevada, on Dec 7-10, 2014. The main theme of the workshop will be "machine learning in spoken language technologies". SLT 2014 will include tutorials and keynote speeches on the main workshop theme and emerging areas; online panel discussions before/during the conference; miniSIGs - small discussion groups; and highlight sessions, where the 3-5 best papers will be presented orally.

Call for Demos

SLT 2014 invites proposals for the Show & Tell Demo Session. Areas of interest include:

Traditional Topic Coverage: speech recognition and synthesis, spoken language understanding, spoken dialog systems, spoken document summarization, machine translation for speech, question answering from speech, speech data mining, spoken document retrieval, spoken language databases, speaker/language recognition, multimodal processing, human/computer interaction, assistive technologies, natural language processing, educational and healthcare applications.
Emerging Areas: large scale spoken language understanding, massive data resources for SLT, unsupervised methods in SLT, capturing and representing world knowledge in SLT, web search with SLT, SLT in social networks, multimedia applications, intelligent environments.

For information on Demo proposal submission, please visit www.slt2014.org/CallForDemos.asp.

Call for SIG Meetings

SLT 2014 will host special sessions and special interest group (SIG) meetings. Submission of proposals is encouraged in all areas of spoken language technology listed above, with emphasis on the workshop technical theme, "machine learning in spoken language technology". Organizers will be asked to write a summary report after the event to be posted on the website. Prospective organizers will have flexibility to decide on the format of the event (e.g., format can be as formal as having oral presentations, or as informal as having round-table discussions) as well as the technical level (e.g., topics can be technical or not). For more information on SIG proposal submission, please visit www.slt2014.org/SpecialSessions.asp.

South Lake Tahoe

SLT 2014 will take place in South Lake Tahoe, a small town located on the southern shore of Lake Tahoe, right on the California-Nevada state line and easily accessible by plane or car. The workshop venue will be Harvey's Lake Tahoe Casino/Hotel, located within walking distance from the town center and the Heavenly Ski Resort. 

Climate: The average temperature in South Lake Tahoe in December is between 17 and 44 °F, with an average precipitation of 2.9 inches.

Social Activities: The SLT-2014 meeting will include both a welcome reception and a banquet as part of the program and registration fee. Such events are important to encourage discussion and informal exchanges.

Local amenities: There are many possible activities for accompanying persons and for those who have extra time before or after the meeting. Some examples include: skiing, golfing, a Lake Tahoe cruise and boat tours/activities, gaming, ice-skating, shopping, biking, hiking, fishing, horseback riding, balloon rides, helicopter tours, spa services, and more. Concerts and shows are available at the venue itself (Harvey’s Lake Tahoe Casino/Hotel). Shopping locations are also close by for those interested in a souvenir of historic Lake Tahoe.

Get Involved!

In addition to submitting papers and/or proposing/organizing SIG meetings, you can get involved in workshop organization in different ways: by nominating tutorial and keynote speakers (nominations@slt2014.org), or by volunteering to be part of workshop organization (volunteers@slt2014.org). Please visit www.slt2014.org for more details.

Important Dates

Demo submission: September 10, 2014
Notification of special session proposals (1st/2nd decision): June 15 / September 19, 2014
Notification of Demo acceptance: October 10, 2014
Early registration deadline: October 17, 2014
Special Interest Group (SIG) proposal submission: November 21, 2014
Workshop Date: December 7-10, 2014


Julia Hirschberg and Agustin Gravano are Publicity Chairs for SLT 2014.


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2015

Pino Di Fabbrizio and Jason D. Williams

SLTC Newsletter, August 2014

The 2015 edition of the IEEE Automatic Speech Recognition and Understanding (ASRU 2015) Workshop will be held in Scottsdale, Arizona (USA) from Sunday, December 13 to Thursday, December 17, 2015. Located in the beautiful Sonoran Desert, Scottsdale, Arizona is an ideal location to host ASRU 2015, providing a warm climate during December, spectacular natural scenery, and a vibrant downtown with easy access to the Phoenix metropolitan area.

    Photo courtesy of Scottsdale Convention & Visitors Bureau.

Venue

The workshop venue will be located at the Firesky Resort & Spa, less than a mile away from downtown Scottsdale. This resort venue has a 4-star rating with over 200 guestrooms and excellent facilities, including two swimming pools (one with sandy beach), outdoor lounges with firepits in the evenings, fitness center, and on-site spa. Every room has an outdoor patio or balcony. Additionally, it includes 1,300 square meters (14,000 square feet) of indoor meeting space, plus exceptional adjoining outdoor space, with a lagoon and courtyard gardens.

Technical program

The technical areas of interest will include the traditional ASRU topics in automatic speech recognition and spoken language understanding.

The workshop will follow a single-track format, with submitted papers presented as posters, 5 keynote/invited talks, and 2 panels. This edition of ASRU will focus on growing the technical program where doing so benefits the community. We have started by issuing a call for challenge tasks. In addition, the program committee is actively evaluating other new additions to the program, such as tutorials, themed days, special sessions, and a hack-a-thon.

Organizing committee

The organizing committee includes general chairs, technical chairs, regional publicity chairs, a finance chair, a sponsorship chair, a publication chair, panel and invited speaker chairs, demonstration chairs, an advisory board, and local arrangements.

Pino Di Fabbrizio and Jason D. Williams are General Chairs of ASRU 2015.


ASRU 2015 - Call for Challenge Tasks

Michiel Bacchiani

SLTC Newsletter, August 2014

ASRU 2015 welcomes proposals for challenge tasks. In a challenge task, participants compete or collaborate to accomplish a common or shared task. The results of the challenge will be presented at the ASRU workshop in the form of papers reporting the achievements of the participants, individually and/or as a whole. We invite organizers to propose such challenge tasks concretely in the form of a 1-2 page proposal describing the task and how the challenge will be run.

Participants will report their achievements in the form of regular-format paper submissions to the ASRU workshop. These submissions will undergo the normal ASRU review process, but the organizers can suggest reviewers who would be particularly insightful for the challenge subject matter. Accepted papers will be organized in a special session at the workshop (in poster format, the only format used at ASRU) and will appear in the ASRU proceedings.

Given the possibly lengthy process of organizing and executing a special challenge, prospective organizers are encouraged to submit proposals as soon as possible. The ASRU technical program committee will make acceptance decisions on a rolling basis, i.e., proposals are reviewed as soon as they come in. Challenge proposals should be sent to Technical Program Co-Chair Michiel Bacchiani at michiel@google.com, and will be accepted until the end of 2014.

Michiel Bacchiani is Technical Program Co-Chair of ASRU 2015.


Report from the third UKSpeech Conference

Korin Richmond

SLTC Newsletter, August 2014

The third UKSpeech Conference was held recently in Edinburgh (9-10 June 2014) at the Informatics Forum, home of the School of Informatics at the University of Edinburgh.

The event was organised by "UKSpeech", a group which aims to serve the speech science and technology community of the United Kingdom. Since its formation in 2003, the UKSpeech group has organised multiple meetings for speech researchers in the UK, including a well-received series of one-day Workshops for Young Researchers. The event in June was the third in a series of larger two-day meetings, following successful conferences hosted at the Universities of Birmingham (2012) and Cambridge (2013). The previous two conferences had provided a fantastic opportunity to keep up-to-date with speech research around the UK and to catch up with friends, both old and new. The conference in Edinburgh was no exception, with around 110 delegates from both academia and industry meeting in the award-winning environs of the Informatics Forum to enjoy a diverse programme of oral and poster sessions, as well as some fine refreshments and good company.

Day one began with a talk presented by Heiga Zen (Google, London) on the subject of statistical parametric speech synthesis. Heiga’s talk focussed in particular on work using neural networks, which are currently re-emerging as a promising method for acoustic modelling for speech synthesis. Later, Arnab Ghoshal (Apple Siri Team) presented a tutorial on the Kaldi Speech Recognition Toolkit, providing a comprehensive and enlightening overview, drawn from his experience as one of the founding members of this valuable open source project. Day two started with an invited talk on "Social Signal Processing" by Alessandro Vinciarelli (University of Glasgow). In this talk, Alessandro advocated analysis of speech-based interaction in terms of social signals (e.g. including non-verbal cues such as facial expressions, vocalisations, gestures etc.) in order to build systems for human-computer interaction that are socially aware, and so more successful and acceptable to users. For the final oral session, Naomi Harte (Trinity College Dublin) chaired an informal discussion on the subject of an "Academic Career Path", eliciting insights and advice from experienced academic panel members for the benefit of more junior delegates: Roger Moore (University of Sheffield), Patrick Naylor (Imperial College London) and Simon King (University of Edinburgh).

The conference programme included three poster sessions spread over the two days, with around sixty poster presentations in total. Emphasis at the UKSpeech Conference lay firmly on discussion and social interaction, and with no proceedings to be published, there was explicitly no requirement for presented work to be novel and/or complete. Accordingly, we saw a mix of research which had either already appeared recently at an international conference such as ICASSP or Interspeech, or which was due to appear soon (giving us a "sneak preview"), as well as more preliminary work in progress. The rich diversity of exciting topics in the speech field was well represented in the poster sessions. For example, presented work ranged from large-vocabulary speech recognition to statistical parametric speech synthesis, from signal processing and speech enhancement to computational linguistics and lexicon induction, from interactive dialog systems to clinical and therapeutic applications. In order to maximise participation and interaction, posters were assigned randomly to the three non-themed and non-overlapping poster sessions. This worked well, with a good level of mingling spilling over into the liberally interspersed breaks for tea and lunch, at which we were provided with nice food and drinks (a vital component of any meeting!).

Overall, this was an enjoyable and stimulating conference which easily achieved its aim of supporting and promoting the cohesion of the UK speech community. We thank the organisers for their work, and look forward to the next UKSpeech Conference, anticipated in June 2015.

The organising committee for this year's UKSpeech Conference were: Cassia Valentini Botinhao (University of Edinburgh), Naomi Harte (Trinity College Dublin), Peter Jančovič (University of Birmingham) and Rogier van Dalen (University of Cambridge). Support for local arrangements was provided by Rasmus Dall (University of Edinburgh), and website support was provided by Mark Huckvale (University College London).

For more information about the UKSpeech Conferences, and UKSpeech in general, see http://www.ukspeech.org.uk.

Korin Richmond is with the University of Edinburgh.


An Overview of New Small-Footprint Technology on Mobile Devices

Tara N. Sainath

SLTC Newsletter, August 2014

Introduction

The popularity of mobile devices has resulted in increased voice interaction with these devices. In this newsletter, we will discuss some of the newest voice search technology released at Google, particularly focused on small-footprint applications. We will show that the use of deep neural network (DNN) technology has made it possible to perform tasks on voice inputs within the confines of portable devices.

Compact LVCSR on Mobile Devices

A limitation of running speech recognition on a server is that mobile network connections may be slow, intermittent, or even non-existent. The need to run recognition on the mobile device itself is therefore critical, but this requires a small-footprint system that can run in real time. To achieve this goal, [1] describes efforts in building compact acoustic and language models for mobile devices.

First, the DNN acoustic model size is reduced by using fewer output targets, more hidden layers, and fewer hidden units per layer. In addition, various decoding speedups are used, including a fixed-point representation, batched lazy computation, and frame skipping. Furthermore, memory usage during decoding is reduced by performing on-the-fly rescoring with a compressed language model.
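
To make these size trade-offs concrete, here is a small Python sketch that counts the parameters of two hypothetical fully-connected DNN acoustic models; the layer sizes and output-target counts are illustrative assumptions, not the configurations reported in [1].

    # Parameter-count sketch for fully-connected DNN acoustic models.
    # All layer sizes below are illustrative assumptions, not those of [1].

    def dnn_params(input_dim, hidden_dims, num_outputs):
        """Count the weights and biases of a feed-forward DNN."""
        dims = [input_dim] + hidden_dims + [num_outputs]
        return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

    # Server-style model: wide hidden layers, many context-dependent targets.
    server = dnn_params(440, [2048] * 4, 14000)   # ~42.2M parameters

    # Compact on-device model: more but narrower layers, far fewer targets.
    mobile = dnn_params(440, [512] * 6, 2000)     # ~2.6M parameters

    print("reduction: %.1fx" % (server / float(mobile)))   # ~16.4x
    # A fixed-point (e.g. 8-bit) weight representation would further shrink
    # the stored model by roughly 4x relative to 32-bit floats.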

Small-Footprint Keyword Spotting

To allow users to have a completely hands-free experience, Google developed a system that continuously listens for the phrase “Ok Google” in order to initiate voice input [2]. The challenge of this project was to develop an algorithm that was highly accurate, low-latency, and small enough in footprint to run on modern mobile devices.

A keyword spotting (KWS) system was developed to detect “Ok Google” occurring in the audio signal. In this work, a deep neural network was built with three output targets: “Ok”, “Google”, and a filler category. Compared to previous KWS approaches, which required Hidden Markov Models (HMMs), this approach does not require decoding, allowing for a simpler implementation with reduced runtime computation and a smaller memory footprint. In addition, the Deep KWS system improves over the commonly used HMM-based system across many different environmental conditions.
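
As a rough illustration of the posterior-handling stage of such a keyword spotter, the sketch below smooths the per-frame DNN posteriors and fires when the keywords' peak smoothed posteriors are jointly high. The window lengths, the threshold, and the exact confidence formula here are assumptions for illustration, not the settings used in [2].

    import numpy as np

    # Posterior handling for a 3-output keyword-spotting DNN whose labels are
    # ["filler", "Ok", "Google"]; window sizes and threshold are illustrative.

    def smooth(posteriors, win=30):
        """Average each label's posterior over the previous `win` frames."""
        out = np.empty_like(posteriors)
        for t in range(len(posteriors)):
            out[t] = posteriors[max(0, t - win + 1):t + 1].mean(axis=0)
        return out

    def confidence(smoothed, keyword_cols=(1, 2), span=100):
        """Geometric mean of each keyword's peak smoothed posterior."""
        window = smoothed[-span:]
        peaks = [window[:, c].max() for c in keyword_cols]
        return float(np.prod(peaks) ** (1.0 / len(peaks)))

    # frame_posteriors: (num_frames, 3) array produced by the acoustic DNN.
    frame_posteriors = np.random.dirichlet([5, 1, 1], size=200)  # stand-in data
    if confidence(smooth(frame_posteriors)) > 0.7:  # illustrative threshold
        print("hotword detected")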

Google has deployed keyword spotting technology for “Ok Google” and “Ok Glass” on Android phones, Chrome browser, iOS, Android-wear and Google Glass.

Small-Footprint Speaker Verification

[3] proposes a small-footprint solution for speaker verification. Like any speaker verification system, the first stage is to build background models across all of the training data to define speaker manifolds. This paper introduces a DNN-based background model, which maps frame-level contextual features to a speaker identity target. Next, during enrollment, a speaker model is defined from the activations of the last hidden layer, which are referred to as “d-vectors”. Finally, during evaluation, the distance between target and test d-vectors is computed to verify a particular speaker.
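
The following minimal sketch shows the enrollment and scoring side of such a pipeline under assumed conventions (length-normalized averages of last-hidden-layer activations, compared with cosine similarity); the threshold and array shapes are illustrative, not the exact procedure of [3].

    import numpy as np

    def d_vector(hidden_activations):
        """Average last-hidden-layer activations over one utterance.
        hidden_activations: (num_frames, hidden_dim) array from the DNN."""
        v = hidden_activations.mean(axis=0)
        return v / np.linalg.norm(v)   # length-normalize

    def verify(enroll_utts, test_utt, threshold=0.8):
        """Accept the claimed speaker if the test d-vector lies close to
        the average of the enrollment d-vectors (cosine similarity)."""
        model = np.mean([d_vector(u) for u in enroll_utts], axis=0)
        model /= np.linalg.norm(model)
        score = float(np.dot(model, d_vector(test_utt)))
        return score > threshold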

The proposed d-vector approach outperforms i-vectors for speaker verification, and is robust in clean conditions.

Language ID System

[4] investigates language identification (LID) using DNNs, by mapping individual voiced frames to one of the language targets.
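
A sketch of how such frame-level decisions might be pooled into an utterance-level language decision is shown below; averaging log-posteriors over voiced frames is an assumed combination rule for illustration, not necessarily the one used in [4].

    import numpy as np

    def identify_language(frame_posteriors, voiced_mask, languages):
        """frame_posteriors: (num_frames, num_languages) DNN outputs;
        voiced_mask: boolean (num_frames,) from a voice-activity detector."""
        log_post = np.log(frame_posteriors[voiced_mask] + 1e-10)
        scores = log_post.mean(axis=0)      # pool per-frame evidence
        return languages[int(np.argmax(scores))]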

The authors compare the DNN approach for LID to a commonly used generative i-vector technique. They find that the DNN approach outperforms a strong i-vector system, offering a 5-times improvement on the Google 5M LID dataset (25 languages + 9 dialects). Further improvements are obtained with LSTMs, using models just one-fifth the size of the DNNs.

Acknowledgements

Thanks to Michiel Bacchiani, Alex Gruenstein, Ignacio Lopez-Moreno, Carolina Parada, and Andrew Senior of Google Research for useful discussions in helping to prepare this article.

References

  1. X. Lei, A. Senior, A. Gruenstein and J. Sorensen, “Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices,” in Proc. Interspeech, 2013.
  2. G. Chen, C. Parada and G. Heigold, “Small-footprint Keyword Spotting Using Deep Neural Networks,” in Proc. ICASSP, 2014.
  3. E. Variani, X. Lei, E. McDermott, I. Lopez-Moreno and J. Gonzalez-Dominguez, “Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification,” in Proc. ICASSP, 2014.
  4. I. Lopez-Moreno, J. Gonzalez-Dominguez and O. Plchot, “Automatic Language Identification Using Deep Neural Networks,” in Proc. ICASSP, 2014.

Tara N. Sainath is a staff writer for the SLTC Newsletter. She is a Research Staff Member at Google, Inc. in New York. Her research interests are mainly in acoustic modeling. Email: tsainath@google.com


Six speech and language processing colleagues named IEEE Fellow in 2014

Florian Metze

SLTC Newsletter, August 2014

Each year, the IEEE Board of Directors confers the grade of Fellow on up to one-tenth of one percent of the members. The grade of Fellow recognizes unusual distinction in IEEE’s designated fields. We congratulate the six speech and language processing colleagues who were recognized with the grade of Fellow as of 1 January 2014.

Florian Metze is Editor-in-Chief of the SLTC Newsletter.


An Overview of the NIST i-Vector Machine Learning Challenge

Craig S. Greenberg, Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Tomi Kinnunen, Alvin F. Martin, Alan McCree, Mark Przybocki, and Douglas A. Reynolds

SLTC Newsletter, August 2014

Overview

In late 2013 and in 2014 the National Institute of Standards & Technology (NIST) coordinated an open, online machine learning challenge for speaker recognition, using speech data represented as i-vectors [1]. This challenge utilized fixed front-end processing in order to allow direct comparison of different back-ends, to encourage exploration of new ideas in machine learning for speaker recognition, and to make the field accessible to participants from outside the audio processing community.
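
Because the front-end was fixed, a submission was essentially a back-end that scores a test i-vector against a target speaker's enrollment i-vectors. The sketch below shows one minimal such back-end (length normalization followed by cosine scoring); it is a simplification for illustration, not NIST's official baseline system.

    import numpy as np

    def normalize(v):
        return v / np.linalg.norm(v)

    def score(enroll_ivectors, test_ivector):
        """Cosine score of a test i-vector against one target speaker.
        enroll_ivectors: list of 1-D arrays for that speaker's segments."""
        model = normalize(np.mean([normalize(v) for v in enroll_ivectors], axis=0))
        return float(np.dot(model, normalize(test_ivector)))  # higher = same speaker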

Participation

Expectations for this challenge were surpassed: 140 participants from 105 unique sites submitted a combined total of 8,192 system outputs, and several top-performing systems demonstrated impressive improvement over the provided baseline system. The participants represented 47 different countries, with the greatest numbers coming from the USA (67), China (36), Russia (21), and India (18). The challenge saw major increases in the numbers of participants and of submitted system outputs compared with the most recent NIST Speaker Recognition Evaluation: nearly twofold and two orders of magnitude, respectively. These increases suggest that the i-Vector Challenge was successful in lowering the barrier to participation.

Performance Results

At the end of the official scoring period, the baseline system ranked 105th out of 140, meaning approximately 75% of challenge participants submitted a system that outperformed the baseline. The leading system at that time demonstrated an approximate 40% relative improvement over the baseline. The challenge included cross-sex trials, and it is interesting to note that system performance on cross-sex trials was more similar to performance on same-sex trials than expected.

Workshop

The above results and more were presented at the 2014 Odyssey Speaker and Language Recognition Workshop during a special session dedicated to the i-Vector Challenge. The session included five presentations from challenge participants as well as a presentation given by NIST. After the presentations, there was a vibrant, hour-long discussion, which focused on how this challenge approach to audio technology development may be expanded in the future (with discussions continuing further at a Finnish lakeside sauna and during an excursion to Koli National Park as a part of the Odyssey workshop program).

Ongoing Efforts and Future Plans

As a result of the Odyssey Workshop special session discussion, NIST launched a 2nd phase of the i-Vector Challenge which, unlike the 1st phase, provides speaker labels for system development data. The 2nd phase began in July 2014 and will remain open through September 2014.

NIST plans to host similar challenges in the future, including an i-vector challenge for language recognition, intended to take place in late 2014 and 2015. The current i-vector challenge for speaker recognition data and scoring platform will remain available online for experimentation for the foreseeable future.

More Information

For more information about the organization of the challenge, see the challenge plan, and for more performance results, see [2]. To conduct your own i-vector machine learning challenge experiments, visit the challenge platform.

If you have comments, corrections, or additions to this article, please contact: ivector_poc@nist.gov.

Bibliography

  1. N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, "Front-End Factor Analysis for Speaker Verification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.
  2. C. S. Greenberg, D. Bansé, G. R. Doddington, D. Garcia-Romero, J. J. Godfrey, T. Kinnunen, A. F. Martin, A. McCree, M. Przybocki and D. A. Reynolds, "The NIST 2014 Speaker Recognition i-Vector Machine Learning Challenge," in Odyssey: The Speaker and Language Recognition Workshop, Joensuu, Finland, 2014.

The authors are with the National Institute of Standards and Technology (NIST), the Johns Hopkins Human Language Technology Center of Excellence, the University of Eastern Finland, and MIT Lincoln Laboratory.