
Postdoctoral Position in Pathological Speech Processing

Title: Sparse predictive models for the analysis and classification of pathological speech

Duration: from 01/11/2021 to 31/12/2022 (could be extended to an advanced position)

Required knowledge and background: solid knowledge of speech/signal processing; a good mathematical background; basics of machine learning; programming in Matlab and Python.

Application and more information: https://jobs.inria.fr/public/classic/en/offres/2021-03570

Context and objectives: During this century, there has been an ever-increasing interest in the development of objective vocal biomarkers to assist in the diagnosis and monitoring of neurodegenerative diseases and, more recently, respiratory diseases because of the Covid-19 pandemic. The literature is now relatively rich in methods for the objective analysis of dysarthria, a class of motor speech disorders [1], where most of the effort has been devoted to speech impaired by Parkinson’s disease. However, relatively few studies have addressed the challenging problem of discriminating between subgroups of Parkinsonian disorders which share similar clinical symptoms, particularly in early disease stages [2]. As for the analysis of speech impaired by respiratory diseases, the field is relatively new (with existing developments in very specialized areas) but has been attracting great attention since the beginning of the pandemic.

On the other hand, the large majority of existing methods for processing pathological speech still rely heavily on a core of feature estimators designed and optimized for healthy speech. There is thus a strong need for a framework to infer/design speech features and cues that remain robust to the perturbations caused by (classes of) disordered speech. The first and main objective of this proposal is to explore the framework of sparse modeling of speech, which allows a certain flexibility in the design and parameter estimation of the source-filter model of speech production. This exploration will be essentially based on theoretical advances developed by the GEOSTAT team, which have had a significant impact in the field of image processing, not only at the scientific level [3] but also at the technological level (www.inria.fr/fr/i2s-geostat-un-innovation-lab-en-imagerie-numerique).
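
For illustration only, the following sketch shows one simple instance of sparse speech modelling: linear prediction with a sparsity-promoting (L1) criterion on the excitation residual, solved by iteratively reweighted least squares. The frame length, model order, and solver settings are assumptions made for this example and do not describe the specific method to be developed in the project.

```python
# Minimal sketch: sparse linear prediction (L1-norm residual) for one speech
# frame, solved with iteratively reweighted least squares (IRLS). All settings
# below (order, iteration count, toy frame) are illustrative assumptions.
import numpy as np

def sparse_lp(frame, order=16, n_iter=20, eps=1e-6):
    """Fit an all-pole (source-filter) model by minimising the L1 norm of the residual."""
    n = len(frame)
    # Regression matrix of past samples: frame[t] ~ sum_k a[k] * frame[t-k-1]
    X = np.column_stack([frame[order - k - 1:n - k - 1] for k in range(order)])
    y = frame[order:]
    a = np.linalg.lstsq(X, y, rcond=None)[0]          # ordinary L2 initialisation
    for _ in range(n_iter):
        e = y - X @ a
        w = 1.0 / np.maximum(np.abs(e), eps)          # IRLS weights approximating the L1 norm
        Xw = X * w[:, None]
        a = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(order), Xw.T @ y)
    return a, y - X @ a                               # LP coefficients and (sparse) residual

# Toy usage on a synthetic voiced-like frame
t = np.arange(400)
frame = np.sin(2 * np.pi * 0.01 * t) + 0.01 * np.random.randn(400)
coeffs, residual = sparse_lp(frame)
```

Compared with the standard L2 criterion, the L1 residual tends to concentrate the excitation energy on a few samples (e.g. glottal pulses), which is one of the flexibilities that sparse modelling offers over classical linear prediction.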

The second objective of this proposal is to use the resulting representations as inputs to basic machine learning algorithms in order to design a vocal biomarker to assist in the discrimination between subgroups of Parkinsonian disorders (Parkinson’s disease, Multiple System Atrophy, Progressive Supranuclear Palsy) and in the monitoring of respiratory diseases (Covid-19, asthma, COPD).
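
As a toy illustration of this second objective (not the project's actual pipeline), the snippet below feeds placeholder per-recording feature vectors into a basic classifier with cross-validation; the feature dimensionality, toy labels, and the choice of classifier are assumptions made only for this example.

```python
# Toy sketch: basic ML classification of speech-derived feature vectors into
# Parkinsonian-disorder subgroups. Data here are random placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((90, 24))           # one feature vector per recording (placeholder)
y = rng.integers(0, 3, size=90)             # toy labels: 0 = PD, 1 = MSA, 2 = PSP

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)   # cross-validated subgroup discrimination
print(scores.mean())
```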

Both objectives benefit from a rich dataset of speech and other biosignals recently collected in the framework of two clinical studies in partnership with university hospitals in Bordeaux and Toulouse (for Parkinsonian disorders) and in Paris (for respiratory diseases).

References:

[1] J. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management. Elsevier, 2013.

[2] J. Rusz et al. Speech disorders reflect differing pathophysiology in Parkinson's disease, progressive supranuclear palsy and multiple system atrophy. Journal of Neurology, 262(4), 2015.

[3] H. Badri. Sparse and Scale-Invariant Methods in Image Processing. PhD thesis, University of Bordeaux, France, 2015.


Speech Research & Development Engineer

Digital Voice Systems, Inc. (DVSI) is seeking a qualified Speech Research & Development Engineer at our office in Westford, MA. This is a great opportunity to join our team of world-class engineers in designing high-quality voice compression technology that is implemented in hundreds of millions of telecommunication systems worldwide.

The ideal candidate will play a key role in the research and development of DVSI's next generation of digital speech compression technology, including speech analysis, speech modeling, model parameter estimation, quantization, speech synthesis, error correction and mitigation methods, as well as echo cancellation and noise reduction.

Desired Qualifications

•  Research and development experience in speech or audio

•  Knowledge of programming languages, e.g., C/C++, Matlab, etc.

•  PhD (or equivalent) in Electrical Engineering or Software Engineering with an emphasis in Signal Processing

•  U.S. Citizenship or Permanent Residency required

Compensation

•  Competitive salary

•  Benefits package

•  Excellent working environment

Company Background

Founded in 1988, Digital Voice Systems, Inc. (DVSI) is the world leader in the development of low-data-rate, high-quality speech compression products for use in digital mobile radio, satellite, and other wireless communication systems. DVSI’s patented line of Multi-Band Excitation vocoders has been successfully implemented in a full range of private and standards-based digital communication systems worldwide.


Researchers in Speech, Text and Multimodal Machine Translation @ DFKI Saarbrücken, Germany


The MT group in the MLT Lab at DFKI Saarbrücken is looking for

    senior researchers/researchers/junior researchers

in speech, text and multimodal machine translation using deep learning.

Three-year contracts with the possibility of extension. Ideal starting dates are around June/July 2021.

Key responsibilities:
- Research and development in speech, text and multimodal MT
- Scientific publications
- Co-supervision of BSc/MSc students and research assistants
- Possibility of teaching at Saarland University (UdS)
- Senior: PhD co-supervision
- Senior: Project/grant acquisition and management

Qualifications for senior researchers/researchers:
- PhD in NLP/Speech/MT/ML/CS or related
- strong scientific and publication track record in speech/text/multimodal-NLP/MT

Qualifications for junior researchers:
- MSc in CS/NLP/Speech/ML/MT or related (possibility to do a PhD at DFKI/UdS)

All:
- Strong background in machine learning and deep learning
- Strong problem solving and programming skills
- Strong communication skills in written and spoken English (German an asset, but not a requirement)

Working environment: the posts are in the “Multilinguality and Language Technology” (MLT) Lab at DFKI (the German Research Center for Artificial Intelligence, https://www.dfki.de/en/web/) in Saarbrücken, Germany. MLT is led by Prof. Josef van Genabith. It is a highly international team doing both basic and applied research.

Application: a short cover letter indicating which level (senior / researcher / junior) you are applying for, a CV, a brief summary of research interests, and contact information for three references. Please submit your application as a PDF by Friday, April 23rd, 2021 to Prof. Josef van Genabith (josef.van_genabith@dfki.de), indicating your earliest possible start date. Positions remain open until filled.

Selected publications of the MT group at MLT, 2020/21: Xu et al. Probing Word Translation in the Transformer and Trading Decoder for Encoder Layers. NAACL-HLT 2021. Chowdhury et al. Understanding Translationese in Multi-View Embedding Spaces. COLING 2020. Pal et al. The Transference Architecture for Automatic Post-Editing. COLING 2020. Ruiter et al. Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation. EMNLP 2020. Zhang et al. Translation Quality Estimation by Jointly Learning to Score and Rank. EMNLP 2020. Xu et al. Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change. ACL 2020. Xu et al. Learning Source Phrase Representations for Neural Machine Translation. ACL 2020. Xu et al. Lipschitz Constrained Parameter Initialization for Deep Transformers. ACL 2020. Herbig et al. MMPE: A Multi-Modal Interface for Post-Editing Machine Translation. ACL 2020. Herbig et al. MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering and Speech Commands for Post-Editing Machine Translation. ACL 2020. Alabi et al. Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and Twi. LREC 2020. Costa-jussà et al. Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction. Computational Linguistics (CL), Special Issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing. Xu et al. Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating. IJCAI 2020.

DFKI is one of the leading AI centers worldwide, with several sites in Germany. DFKI Saarbrücken is part of the Saarland University (UdS) Informatics Campus. UdS has exceptionally strong CS and CL schools and hosts, in addition to DFKI, a Max Planck Institute for Informatics, a Max Planck Institute for Software Systems, the Center for Bioinformatics, and the CISPA Helmholtz Center for Information Security.

Geographic environment: Saarbrücken (http://www.saarbruecken.de/en) is the capital of Saarland, one of the federal states of Germany, located right in the heart of Europe and a cultural center in the border region of Germany, France and Luxembourg. Frankfurt and Paris are less than 2 hours away by train. The cost of living is moderate in comparison with other cities in Germany and Europe.


3-year Postdoc Position (E14) "Machine Learning for Speech and Audio Processing" at Universität Hamburg, Germany

The Signal Processing (SP) research group at the Universität Hamburg in Germany is hiring a Postdoc (E13/E14) for "Machine Learning for Speech and Audio Processing".

The general focus of the Signal Processing (SP) research group is on developing novel signal processing and machine learning methods for speech and multimodal signals. Applications include speech communication devices such as hearing aids and voice-controlled assistants. The research associate will conduct research on novel signal processing and machine learning methods applied to speech and multimodal signals. Furthermore, the research associate will help establish degree programs in the data science context.

Please find the full job announcement with all details here
https://www.inf.uni-hamburg.de/en/inst/ab/sp/job-offer.html

Web: http://uhh.de/inf-sp | YouTube: https://www.youtube.com/channel/UCsC4bz4A6mdkktO_eyCDraw


PhD Position

PhD position at Inria (Nancy - Grand Est), France

(More information: https://jobs.inria.fr/public/classic/en/offres/2021-03399)

Title: Robust and Generalizable Deep Learning-based Audio-visual Speech Enhancement

The PhD thesis will be jointly supervised by Mostafa Sadeghi (Inria Starting Faculty Position) and Romain Serizel (Associate Professor, Université de Lorraine) in the MULTISPEECH Team at Inria, Nancy - Grand Est, France.

Contacts: Mostafa Sadeghi (mostafa.sadeghi@inria.fr) and Romain Serizel (romain.serizel@loria.fr)

Context: Audio-visual speech enhancement (AVSE) refers to the task of improving the intelligibility and quality of noisy speech by utilizing the complementary information of the visual modality (the speaker's lip movements) [1]. The visual modality can help distinguish target speech from background sounds, especially in highly noisy environments. Recently, owing to the great success and progress of deep neural network (DNN) architectures, AVSE has been extensively revisited. Existing DNN-based AVSE methods fall into supervised and unsupervised approaches. In the former category, a DNN is trained to map noisy speech and the associated video frames of the speaker to a clean estimate of the target speech. The unsupervised methods [2] follow a traditional maximum-likelihood approach combined with the expressive power of DNNs. Specifically, the prior distribution of clean speech is learned using deep generative models such as variational autoencoders (VAEs) and combined with a likelihood function based on, e.g., non-negative matrix factorization (NMF), to estimate the clean speech in a probabilistic way. As there is no training on noisy speech, this approach is unsupervised.
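
As a rough, illustrative sketch of the probabilistic combination described above (not the team's actual models), the snippet below combines a per-bin clean-speech variance, standing in for the output of a VAE decoder, with an NMF noise variance model; under Gaussian assumptions the posterior mean of the clean STFT reduces to a Wiener-type gain. All shapes and values are toy placeholders.

```python
# Illustrative sketch of the VAE-prior + NMF-likelihood combination:
# speech variance (placeholder for a VAE decoder output) and NMF noise
# variance yield a Wiener-type estimate of the clean STFT coefficients.
import numpy as np

F, T, K = 257, 100, 8                                   # freq bins, frames, NMF rank
rng = np.random.default_rng(0)

X = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))   # noisy STFT (toy)
sigma_s2 = np.abs(rng.standard_normal((F, T))) + 0.1    # clean-speech variance (stand-in for VAE output)
W = np.abs(rng.standard_normal((F, K)))                 # NMF noise spectral patterns
H = np.abs(rng.standard_normal((K, T)))                 # NMF noise activations
sigma_n2 = W @ H + 1e-8                                 # per-bin noise variance

wiener_gain = sigma_s2 / (sigma_s2 + sigma_n2)          # Gaussian posterior-mean gain
S_hat = wiener_gain * X                                 # clean-speech STFT estimate
```

In the actual unsupervised methods, the speech variance would be produced by a trained VAE decoder, and the VAE latent variables and NMF parameters would be estimated iteratively for each noisy utterance, which is the costly test-time procedure mentioned below.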

Supervised methods require deep networks with millions of parameters, as well as a large audio-visual dataset with diverse enough noise instances, to be robust against acoustic noise. There is also no systematic way to achieve robustness to visual noise, e.g., head movements, face occlusions, or changing illumination conditions. Unsupervised methods, on the other hand, show better generalization performance and can achieve robustness to visual noise thanks to their probabilistic nature [3]. Nevertheless, their test phase involves a computationally demanding iterative process, hindering their practical use.

Objectives: In this PhD project, we are going to bridge the gap between supervised and unsupervised approaches, benefiting from the best of both worlds. The central task of this project is to design and implement a unified AVSE framework with the following features: (1) robustness to visual noise, (2) good generalization to unseen noise environments, and (3) computational efficiency at test time. To achieve the first objective, various techniques will be investigated, including probabilistic switching (gating) mechanisms [3], face frontalization [4], and data augmentation [5]. The main idea is to adaptively lower-bound the performance by that of audio-only speech enhancement when the visual modality is not reliable. To accomplish the second objective, we will explore techniques such as acoustic scene classification combined with noise modeling inspired by unsupervised AVSE, in order to adaptively switch between different noise models during speech enhancement. Finally, concerning the third objective, lightweight inference methods, as well as efficient generative models, will be developed. We will work with the AVSpeech [6] and TCD-TIMIT [7] audio-visual speech corpora.
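
To make the "fall back to audio-only enhancement when the visual modality is unreliable" idea concrete, here is a minimal, hypothetical gated-fusion sketch in PyTorch; the layer sizes, feature names, and the sigmoid gate are assumptions made for illustration and not the architecture to be designed in the thesis.

```python
# Hypothetical sketch of a visual-reliability gate: a learned gate mixes an
# audio-visual estimate with an audio-only estimate, so that an unreliable
# video stream pushes the output towards audio-only enhancement.
import torch
import torch.nn as nn

class GatedAVFusion(nn.Module):
    def __init__(self, audio_dim=256, visual_dim=128):
        super().__init__()
        self.av_head = nn.Linear(audio_dim + visual_dim, audio_dim)        # audio-visual branch
        self.a_head = nn.Linear(audio_dim, audio_dim)                      # audio-only branch
        self.gate = nn.Sequential(nn.Linear(visual_dim, 1), nn.Sigmoid())  # visual reliability in [0, 1]

    def forward(self, audio_feat, visual_feat):
        s_av = self.av_head(torch.cat([audio_feat, visual_feat], dim=-1))
        s_a = self.a_head(audio_feat)
        alpha = self.gate(visual_feat)            # close to 0 when the video is unreliable
        return alpha * s_av + (1 - alpha) * s_a   # falls back to the audio-only estimate

# Toy usage with random features
fusion = GatedAVFusion()
out = fusion(torch.randn(4, 256), torch.randn(4, 128))
```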

References:

[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, “An overview of deep-learning based audio-visual speech enhancement and separation,” arXiv: 2008.09586, 2020.

[2] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, “Audio-visual speech enhancement using conditional variational auto-encoders,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788 –1800, 2020.

[3] M. Sadeghi and X. Alameda-Pineda, “Switching variational autoencoders for noise-agnostic audio-visual speech enhancement,” in ICASSP, 2021.

[4] Z. Kang, M. Sadeghi, R. Horaud, “Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks,” arXiv: 2010.13676, 2020.

[5] S. Cheng, P. Ma, G. Tzimiropoulos, S. Petridis, A. Bulat, J. Shen, M. Pantic, “Towards Pose-invariant Lip Reading,” in ICASSP, 2020.

[6] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, M. Rubinstein, “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation,” SIGGRAPH 2018.

[7] N. Harte and E. Gillen, “TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech,” IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603-615, May 2015.

Skills:

  • Master's degree, or equivalent, in speech/audio processing, computer vision, machine learning, or a related field,
  • Ability to work independently as well as in a team,
  • Solid programming skills (Python, PyTorch),
  • A decent level of written and spoken English.

Benefits package:

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural, and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration:

Salary: 1982€ gross/month for the 1st and 2nd years; 2085€ gross/month for the 3rd year.

Monthly salary after taxes: around 1596.05€ for the 1st and 2nd years and 1678.99€ for the 3rd year (medical insurance included).



PhD Position in Speech and Audio Signal Processing

The Signal Processing (SP) research group at the Universität Hamburg in Germany is hiring a Research Associate / PhD student in the field of Speech and Audio Signal Processing.

The general focus of the Signal Processing (SP) research group is on developing novel methods for processing speech and audio signals with applications in speech communication devices such as hearing aids, mobile telephony, and voice-controlled assistants. Typically, the performance of these devices drops drastically when interfering sources, noise, and/or reverberation are present. The goal of the candidate is to develop novel methods to enable or facilitate speech communication and voice control in such acoustically challenging scenarios. In this context, possible PhD topics include source separation, source localization, speech enhancement and multimodal signal processing. Typical methods include statistical modeling and modern machine learning methods such as deep neural networks.

Please find the full announcement here
https://www.inf.uni-hamburg.de/en/inst/ab/sp/job-offer.html


Postdoctoral Position in Mobile Sensing and Child Mental Health


Beckman Institute for Advanced Science & Technology

University of Illinois at Urbana-Champaign

Our interdisciplinary research team at the Beckman Institute for Advanced Science and Technology is developing and applying innovative tools and methods from mobile sensing, signal processing, and machine learning to gain insight into the dynamic processes underlying the emergence of disturbances in child mental health. We have engineered a wearable sensing platform that captures speech, motion, and physiological signals of infants and young children in their natural environments. We are applying data-driven machine-learning approaches and dynamic statistical modeling techniques to large-scale, naturalistic, longitudinal data sets to characterize dynamic child-parent transactions and children’s developing stress-regulatory capacities, and ultimately to identify reliable biomarkers of child mental health disturbance.

We seek outstanding candidates for a postdoctoral scholar position that combines multimodal sensing and signal processing, dynamic systems modeling, and child mental health. The ideal candidate would have expertise in one or more of the following domains related to wearable sensors:

·         signal processing of audio, motion or physiological data

·         statistical modeling of multivariate time series data

·         behavioral signal processing/non-speech vocalization processing

·         mobile health interventions including wearable sensors

·         activity recognition

·         machine learning

·         digital phenotyping

In addition to joining a highly interdisciplinary team and making contributions to high impact research on mobile sensing and child mental health, this position provides strong opportunities for professional development and mentorship by faculty team leaders, including Drs. Mark Hasegawa-Johnson, Romit Roy Choudhury, and Nancy McElwain. In collaboration with the larger team, the postdoctoral scholar will play a central role in preparing conference papers and manuscripts for publication, contributing to the preparation of future grant proposals, and assisting with further development of our mobile sensing platform for use with infants and young children.

Applicants should have a doctoral degree in computer engineering, computer science, or a field related to data analytics of wearable sensors, as well as excellent skills in programming, communication, and writing. Appointment is for at least two years, contingent on first-year performance. The position start date is negotiable.

Please send a cover letter and CV to Drs. Mark Hasegawa-Johnson (jhasegaw@illinois.edu) and Nancy McElwain (mcelwn@illinois.edu). Applications will be considered until the position is filled, with priority given to applications submitted by November 15th.


Research Engineer - Speech Technology

The Speech Technology Group of Toshiba Europe Ltd in Cambridge has an opening for an ASR researcher. We are looking for candidates with a background in signal processing, machine learning, or acoustic modelling, or with expertise in building state-of-the-art ASR systems. The candidate should have a PhD in an area of speech technology related to automatic speech recognition, machine learning, or a related field (post-doctoral/industrial experience is beneficial).

Please check for more details at:  https://careers.toshiba.eu/displayjob.aspx?jobid=132


Speech Recognition and Speech Synthesis Experts

Reykjavík University’s Language and Voice Lab (https://lvl.ru.is) is looking for experts in speech recognition and speech synthesis. At the LVL you will join a research team working on exciting developments in language technology as part of the Icelandic Language Technology Programme (https://arxiv.org/pdf/2003.09244.pdf).

Job Duties:

  • Conduct independent research in the fields of speech processing, machine learning, speech recognition/synthesis and human-computer interaction.
  • Work with a team of other experts in carrying out the Speech Recognition/Synthesis part of the Icelandic Language Technology Programme.
  • Publish and disseminate research findings in journals and present at conferences.
  • Actively take part in scientific and industrial cooperation projects.
  • Assist in supervising Bachelor’s/Master’s students.

Skills:

  • MSc/PhD degree in engineering, computer science, statistics, mathematics or similar. Good programming skills (e.g. C++ and Python) and knowledge of Linux (necessary).
  • Good knowledge of a deep learning library such as PyTorch or TensorFlow (necessary).
  • Good knowledge of KALDI (preferable). Background in language technology (preferable).
  • Good skills in writing and understanding shell scripts (preferable).

All applications must be accompanied by a good CV with information about previous jobs, education, references, etc. Applicants may optionally attach a cover letter explaining why they are the right person for the job.

The application deadline is October 4th, 2020. Applications are only accepted through RU's recruitment system - Link here. All inquiries and applications will be treated as confidential.

Further information about the job is provided by Jón Guðnason, Associate Professor (jg@ru.is), and Ester Gústavsdóttir, Director of Human Resources (esterg@ru.is).

The role of Reykjavik University is to create and disseminate knowledge to enhance the competitiveness and quality of life for individuals and society, guided by good ethics, sustainability and responsibility. Education and research at RU are based on strong ties with industry and society. We emphasize interdisciplinary collaboration, international relations and entrepreneurship.


Marie Sklodowska-Curie PhD position

Applications are invited for two Early Stage Researcher (ESR) positions under the EU Marie Sklodowska-Curie Actions (MSCA), to be hosted at the Istituto Italiano di Tecnologia (IIT). The positions are for a fixed term of 3 years, and the successful applicants are expected to register for a PhD in Translational Neurosciences and Neurotechnologies at the University of Ferrara. The two ESR positions are part of the COBRA H2020 project (G.A. N. 859588), an EU Innovative Training Network of the MSCA involving 9 partners, which will train a group of 15 researchers: the next generation of researchers able to accurately characterize and model the linguistic, cognitive and brain mechanisms deployed by human speakers in conversational interactions with human interlocutors as well as with artificial dialog systems.

Contact: alessandro.dausilio@iit.it

ESR2: When people are engaged in meaningful social interaction, they automatically and implicitly adjust their speech, vocal patterns and gestures to accommodate to others. Although these processes have been extensively explored at the behavioral level, very little is known about their neural underpinnings. Prior investigations have shown that suppression of alpha oscillations over sensorimotor regions is a possible marker of action-perception coupling during non-speech (Tognoli & Kelso, 2015) and speech-based (Mukherjee et al., 2019) interactive tasks. The project, by running dual-EEG recordings, will investigate whether behavioral speech alignment translates into identifiable brain oscillatory markers. Key objectives are (i) to develop and validate metrics to quantify phonetic accommodation during natural speech interactions and (ii) to identify electrophysiological markers of between-speaker convergence.
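
Purely as an illustration of what a simple accommodation metric could look like (the metrics themselves are what ESR2 will develop and validate), the sketch below checks whether the per-turn distance between two speakers' acoustic features shrinks over a conversation; the choice of mean F0 as the feature and the linear-trend summary are assumptions made for this example.

```python
# Illustrative convergence metric: slope of the inter-speaker distance of a
# per-turn acoustic feature (here, mean F0). A negative slope suggests that
# the speakers are converging (accommodating) over the conversation.
import numpy as np

def convergence_slope(feat_speaker_a, feat_speaker_b):
    """feat_speaker_*: per-turn feature values (e.g. mean F0 in Hz), equal length."""
    diff = np.abs(np.asarray(feat_speaker_a) - np.asarray(feat_speaker_b))
    turns = np.arange(len(diff))
    return np.polyfit(turns, diff, 1)[0]       # linear trend of the distance; < 0 means convergence

# Toy usage: speaker B drifts towards speaker A's F0 over 10 turns
f0_a = np.full(10, 120.0)
f0_b = np.linspace(180.0, 140.0, 10)
print(convergence_slope(f0_a, f0_b))           # negative value, i.e. convergence
```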

ESR10: For adults, mastering the segmental and supra-segmental aspects of a second language (L2) is particularly challenging. Although we know that such a capability is partially maintained during adulthood, we do not yet know how to facilitate effective and long-lasting L2 learning. This project is based on the hypothesis that when people engage in meaningful social interactions, they automatically and implicitly align at multiple levels (Pickering & Garrod, 2013), including the phonetic (Mukherjee et al., 2019) and facial-expression levels. ESR10 will tackle the fundamental scientific question of speech alignment in L2 and whether it drives long-lasting improvements in L2 skills. Key objectives are (i) to investigate the dynamics of alignment in L2 (English) and (ii) to quantify improvements when participants are engaged in a conversation with native speakers.

Applications must be submitted via the COBRA webpage: https://www.cobra-network.eu/
