
Speech and Language Processing

SLTC

PhD Position in Deep Cascaded Representation Learning for Speech Modelling

The LivePerson Centre for Speech and Language offers a 3-year fully funded PhD studentship, covering standard maintenance, fees and travel support, to work on cascaded deep learning structures to model speech. The Centre is connected with the Speech and Hearing (SpandH) and the Natural Language Processing (NLP) research groups in the Department of Computer Science at the University of Sheffield.

Auto-encoding is a powerful concept that allows us to compress signals and find essential representations. The concept was expanded to include context, an approach usually referred to as self-supervised learning. Applied to very large amounts of speech data, this has produced highly successful methods and models for representing speech, such as wav2vec 2.0 or WavLM, whose representations serve a wide range of downstream processes. Using these representations often requires fine-tuning to a specific task with small amounts of data. When encoding speech, it is desirable to represent a range of attributes at different temporal specificity; such attributes often reflect a hierarchy of information.
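
The compress-and-reconstruct idea behind auto-encoding can be illustrated with a toy tied-weight linear autoencoder (a NumPy sketch for intuition only; it is not any of the models named above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "speech frames": 8-dim vectors that actually live on a 2-dim subspace.
basis = rng.normal(size=(2, 8))
frames = rng.normal(size=(200, 2)) @ basis              # (200, 8)

# Tied-weight linear autoencoder: code z = x W^T, reconstruction x_hat = z W.
W = rng.normal(scale=0.1, size=(2, 8))                  # bottleneck of size 2
lr = 0.01
for _ in range(500):
    z = frames @ W.T                                    # (200, 2) codes
    err = z @ W - frames                                # reconstruction error
    # Gradient of the squared reconstruction error w.r.t. the tied weights W.
    grad = (z.T @ err + (err @ W.T).T @ frames) / len(frames)
    W -= lr * grad

mse = float(np.mean((frames @ W.T @ W - frames) ** 2))
baseline = float(np.mean(frames ** 2))
# The 2-dim code captures the essential structure: mse is far below baseline.
print(mse < 0.1 * baseline)
```

The bottleneck code plays the role of the "essential representation"; self-supervised models replace simple reconstruction with context-dependent prediction objectives.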

The aim of this PhD project is to explore the use of knowledge about natural hierarchies in speech in cascaded auto- and contextual encoder/decoder models. The objective is to describe a structured way to understand such hierarchies. The successful candidate is expected to propose methods that combine different kinds of supervision (auto, context, label) and build hierarchies of embedding extractors. These proposals should be considered in the context of data availability and complexity. All proposals are to be implemented and tested on speech data, with experiments conducted on a range of speech data sets covering different speech types and data set sizes.
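
One way to picture such a hierarchy is a cascade of encoder stages, each pooling over longer time spans so that successive embeddings move from frame-like towards phone-, word- and utterance-like rates. A schematic sketch (untrained random projections, purely illustrative of the cascading structure):

```python
import numpy as np

def encoder_stage(features, pool):
    """One cascade stage: a fixed random projection with a nonlinearity,
    then mean-pooling over `pool` consecutive steps (coarser time scale)."""
    dim = features.shape[1]
    proj = np.random.default_rng(dim).normal(scale=dim ** -0.5, size=(dim, dim))
    hidden = np.tanh(features @ proj)              # step-wise transform
    steps = (hidden.shape[0] // pool) * pool       # drop the ragged tail
    return hidden[:steps].reshape(-1, pool, dim).mean(axis=1)

# 100 frames of 16-dim features, pooled toward coarser linguistic units.
frames = np.random.default_rng(0).normal(size=(100, 16))
phones = encoder_stage(frames, pool=5)       # (20, 16)  ~phone rate
words = encoder_stage(phones, pool=4)        # (5, 16)   ~word rate
utterance = encoder_stage(words, pool=5)     # (1, 16)   ~utterance level
print(phones.shape, words.shape, utterance.shape)
```

In the project, each stage would be a trained encoder with its own form of supervision rather than a fixed pooling operation.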

The student will join a world-leading team of researchers in speech and language technology. The LivePerson Centre for Speech and Language Technology was established in 2017 with the aim of conducting research into novel methods for speech recognition and general speech processing, including end-to-end modelling, direct waveform modelling and new approaches to modelling of acoustics and language. It has recently extended its research remit to spoken and written dialogue. The Centre hosts several Research Associates, PhD researchers, graduate and undergraduate project students, Researchers and Engineers from LivePerson, and academic visitors. Full integration with SpandH brings access to a wide range of academic research and opportunities for collaboration inside and outside the University. The Centre has access to extensive dedicated computing resources (GPU, large storage) and local storage of over 60TB of raw speech data.

The successful applicant will work under the supervision of Prof. Hain, who is the Director of the LivePerson Centre and also Head of the SpandH research group. SpandH has been, and remains, involved in a large number of national and international projects funded by national bodies, EU sources and industry. Prof. Hain also leads the UKRI Centre for Doctoral Training in Speech and Language Technologies and their Applications (https://slt-cdt.ac.uk/), a collaboration between the NLP research group and SpandH. Jointly, NLP and SpandH host more than 110 active researchers in these fields. This project will start as soon as possible.

All applications must be made directly to the University of Sheffield using the Postgraduate Online Application Form. Information on what documents are required and a link to the application form can be found here - https://www.sheffield.ac.uk/postgraduate/phd/apply/applying. On your application, please name Prof. Thomas Hain as your proposed supervisor and include the title of the studentship you wish to apply for.

Your research proposal should:

  • Be no longer than 4 A4 pages, including references
  • Outline your reasons for applying for this studentship
  • Explain how you would approach the research, including details of your
    skills and experience in the topic area

This position is fully funded by LivePerson, covering all tuition fees and a stipend at
the standard UKRI rate.


PhD Position in Adaptive Deep Learning for Speech and Language

The LivePerson Centre for Speech and Language offers a 3-year fully funded PhD studentship,
covering standard maintenance, fees and travel support, to work on deep neural network adaptive
learning modules for speech and language. The Centre is connected with the Speech and Hearing
(SpandH) and the Natural Language Processing (NLP) research groups in the Department of
Computer Science at the University of Sheffield.

Domain mismatch remains a key issue for speech and language technologies; the traditional
solutions are transfer learning and adaptation. Adaptation was widely used for modelling of speech in
the context of generative models, but less so with modern neural network approaches. Such
adaptation targeted features or models and was often informed by previous model output and by
estimates of latent factors. These approaches frequently drew on observations of the human ability
to adapt and adjust to new acoustic or semantic situations. Adaptation in neural networks is model-based
and often implicit, through attention or dynamic convolution. To date, however, these methods
still fail to reproduce the rapid learning and adaptation that humans exhibit when exposed to new
contexts.

The objective of this project is to conduct research into neural network structures that are capable of
rapidly adjusting to a change in latent factors while at the same time allowing for robust control. This will
require rapid feedback mechanisms on the mismatch between the observed data and the model's
expectation. A range of strategies may be applied, through instantaneous feedback or through control
of transformational model parameters. All proposals are to be implemented and tested on speech and,
where suitable, also language data. Experiments should be conducted on a range of tasks of different
complexity in the context of different data types.
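
As a toy illustration of feedback-driven adaptation, the sketch below tracks the running mismatch between incoming features and a model's expected statistics and compensates for it on the fly (a simple bias adaptation for intuition only, not a method proposed in the project):

```python
import numpy as np

def adapt_bias(model_mean, stream, rate=0.1):
    """Subtract a running estimate of the mismatch between incoming frames
    and the model's expected feature mean -- a toy feedback mechanism."""
    shift = np.zeros_like(model_mean)
    adapted = []
    for frame in stream:
        mismatch = frame - model_mean                  # observed vs. expected
        shift = (1 - rate) * shift + rate * mismatch   # fast running estimate
        adapted.append(frame - shift)                  # compensated frame
    return np.array(adapted)

rng = np.random.default_rng(1)
clean = rng.normal(size=(200, 8))     # matched-domain frames (mean ~ 0)
shifted = clean + 3.0                 # sudden domain mismatch: a +3 offset
out = adapt_bias(model_mean=np.zeros(8), stream=shifted)
# Within a few frames the compensated stream drifts back toward mean zero.
print(float(out[-50:].mean()), float(shifted[-50:].mean()))
```

Real adaptive modules would of course adjust model parameters rather than a feature bias, but the feedback loop on observed-versus-expected mismatch is the same idea.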

The student will join a world-leading team of researchers in speech and language technology. The
LivePerson Centre for Speech and Language Technology was established in 2017 with the aim to
conduct research into novel methods for speech recognition and general speech processing, including
end-to-end modelling, direct waveform modelling and new approaches to modelling of acoustics and
language. It has recently extended its research remit to spoken and written dialogue. The Centre hosts
several Research Associates, PhD researchers, graduate and undergraduate project students,
Researchers and Engineers from LivePerson, and academic visitors. Full integration with
SpandH brings access to a wide range of academic research and opportunities for
collaboration inside and outside the University. The Centre has access to extensive dedicated
computing resources (GPU, large storage) and local storage of over 60TB of raw speech data.

The successful applicant will work under the supervision of Prof. Hain, who is the Director of the
LivePerson Centre and also Head of the SpandH research group. SpandH has been, and remains, involved
in a large number of national and international projects funded by national bodies, EU sources and
industry. Prof. Hain also leads the UKRI Centre for Doctoral Training in Speech and Language
Technologies and their Applications (https://slt-cdt.ac.uk/), a collaboration between the NLP research
group and SpandH. Jointly, NLP and SpandH host more than 110 active researchers in these fields.

This project will start as soon as possible.

How to Apply:

All applications must be made directly to the University of Sheffield using the
Postgraduate Online Application Form. Information on what documents are required
and a link to the application form can be found here -
https://www.sheffield.ac.uk/postgraduate/phd/apply/applying

On your application, please name Prof. Thomas Hain as your proposed supervisor
and include the title of the studentship you wish to apply for.
Your research proposal should:

  • Be no longer than 4 A4 pages, including references
  • Outline your reasons for applying for this studentship
  • Explain how you would approach the research, including details of your
    skills and experience in the topic area

If you have any queries, please contact phd-compsci@sheffield.ac.uk

Funding:

This position is fully funded by LivePerson, covering all tuition fees and a stipend at
the standard UKRI rate.


PhD stipend in Self-Supervised Learning for Decoding of Complex Signals

This PhD stipend is funded by the Pioneer Centre for Artificial Intelligence's Collaboratory on Signals and Decoding. The Pioneer Centre for AI is located at the University of Copenhagen, with partners at Aarhus University, Aalborg University, the Technical University of Denmark, and the IT University of Copenhagen. A cohort of PhD students will start during the fall of 2023 across the partner universities. PhD students at the Pioneer Centre for AI will have extraordinary access to computing resources and to international researchers across many disciplines within computer science and other academic areas, as well as to courses and events at the centre and meaningful collaboration with industry, the public sector, and the start-up ecosystem.

Centre website: www.aicentre.dk

To date, most successful applications of deep learning in signals and decoding are based on supervised learning. However, supervised learning is contingent on the availability of labelled data, i.e., each sample has a semantic annotation. The need for labelled data is a serious limitation to applications at scale and complicates the maintenance of real-life supervised learning systems.

The typical situation is that unlabelled data is abundant, which has given rise to paradigms such as semi-supervised and self-supervised learning (SSL). Both paradigms combine large amounts of unlabelled data with limited labelled data. While semi-supervised learning invokes generative models to learn representations that support learning with few labels, self-supervised learning is supervised learning with a supervisory signal derived from the data itself.
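
The phrase "supervisory signal derived from the data itself" can be made concrete with a minimal pretext task: predict the next sample from the current one, so the unlabelled signal supplies its own targets (a deliberately simple sketch, not a proposed method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabelled "signal": an AR(1) process, each sample depends on the previous.
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.9 * x[t - 1] + 0.1 * rng.normal()

# Self-supervision: the targets are the data itself, shifted by one step.
inputs, targets = x[:-1], x[1:]

# Closed-form least squares for a one-tap predictor x[t] ~= a * x[t-1].
a = float(inputs @ targets / (inputs @ inputs))
print(a)  # close to the generative coefficient 0.9
```

No label was ever provided, yet the model recovers the structure of the signal; modern SSL systems scale this idea up with masked or contrastive prediction over learned representations.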

The goal of this PhD study is to develop novel semi-supervised and self-supervised methods for modeling signals of various modalities (e.g., speech, audio, vision, text) and to analyse the complexity of the developed models. During the study, the PhD student will further have opportunities to do research at other units and at the headquarters of the Pioneer Centre, as well as abroad.

The PhD candidate is expected to have:

  • A Master's degree (120 ECTS points) or similar in Computer Science, Electronic Engineering, Computer Engineering, Applied Mathematics or an equivalent field.
  • Knowledge of machine learning and deep learning.
  • Hands-on experience with Python and deep learning frameworks.
  • Experience with signal processing (a plus).
  • Strong analytical and experimental skills.
  • A high level of motivation and innovation.
  • A high level of written and spoken English.

You may obtain further information from Professor Zheng-Hua Tan, Department of Electronic Systems, phone: +45 99 40 86 86, email: zt@es.aau.dk, concerning the scientific aspects of the stipend.

DEADLINE

02/04/2023

Apply online


ML NLP Post doc

Tufts University has an opening for a post-doctoral researcher to engage in a cross-cutting project focusing on the development and use of Natural Language Processing (NLP) for social science applications. Recent advances, from the now-classical word2vec approach to models built on attention and transformer networks such as BERT and GPT-3, have shown tremendous potential for natural language modeling and automated interpretation. While most applications currently focus on content generation, user interfaces (chatbots), and understanding news content, this project aims to use NLP as a major assistive technology for gaining insights into student work and understanding in STEM education systems. This presents novel challenges: to interpret, for example, whether a student is arguing from intuition or from formal principles, whether they are excited or intimidated, whether they are uncertain or confident. The data may be written work or audio-video streams of conversations, and analysis of the latter may involve video processing of gestures and tone of voice. This multi-modal analysis is the next frontier in NLP and will require novel advances in both statistical machine learning and deep learning architectures. The position provides a unique opportunity to develop and collaborate with exclusive data sets, in research designed to influence and impact classroom teaching.

Applicants must have a PhD in electrical engineering, computer science, applied mathematics, statistics, or a similar field; a background of research in the learning sciences would be helpful but is not required. The ideal candidate will have experience with, and a publication record in, one or more of the following areas: modern methods of statistical signal processing, machine learning, optimization, or data science with applications to NLP. Programming experience in MATLAB or Python is highly desirable.

The post-doc will be jointly supervised by a team of faculty in machine learning (Prof. Eric Miller, Prof. Shuchin Aeron) and by a team of faculty in the learning sciences (Prof. Julia Gouvea, Prof. David Hammer).

For more information about this position, please email Prof. Shuchin Aeron (shuchin.aeron@tufts.edu) Prof. Eric Miller (eric.miller@tufts.edu) and Prof. Bree Aldridge  (bree.aldridge@tufts.edu). Interested candidates should provide Prof. Miller with a copy of their CV, list of references, cover letter, and copies of relevant articles, theses, technical reports etc.


NIH Program Officer

Health Scientist Program Officer

The National Institute on Deafness and Other Communication Disorders (NIDCD) is recruiting a Program Director/Health Science Administrator (HSA) (GS-12/13/14) with expertise and research experience in data science and cloud computing efforts leveraging "big data" for biomedical research. Salary is commensurate with individual qualifications and professional experience. A full benefits package is available, including retirement, health insurance, life insurance, long-term care insurance, annual and sick leave, and the Thrift Savings Plan (401K equivalent). We anticipate that the vacancy announcement for an HSA Program Officer will be posted later this summer at http://jobs.nih.gov/globalrecruitment.

The successful candidate will:

Advise NIDCD-supported investigators across all research portfolios on implementing best practices from biomedical data science for data collection, storage, analysis, use, and sharing, to ensure widespread access and accelerate the discovery of insights that will improve the lives of people with communication disorders, promoting the use of existing data repositories in the cloud whenever possible.

Manage a research portfolio of data science grant awards conducted across the United States and internationally, and identify scientific gaps and opportunities in the NIDCD's mission areas. This will require organizing workshops to engage stakeholders, promoting cloud-based data sharing practices, and identifying future research opportunities.

Assist NIDCD staff in maintaining compliance with NIH data management and sharing policies. This will include participating in data science collaborations across NIH, outreach efforts to academic institutions, developing common data elements (CDEs), serving as a data science spokesperson, supporting internal activities to advance data science capabilities, and reviewing data sharing requirements.

Preferred Skills and Qualifications

  • Expertise in cloud computing platforms, data repositories, machine learning, and other activities requiring significant data science knowledge.
  • A doctoral degree in engineering or bioinformatics and experience with cloud-based computing for biomedical research.
  • Expertise with the NIDCD's research areas is not required, and individuals at early- to mid-career stages are strongly encouraged to apply.

The NIDCD is deeply committed to diversity of thought, equity, and inclusion, and encourages applications from qualified women, under-represented minorities, and individuals with disabilities. HHS, NIH, and NIDCD are equal opportunity employers.

Please contact Roger L. Miller, Ph.D., program director of neural prosthesis development and program coordinator of SBIR/STTR grant programs, with questions or interest, and check this website for updates:

Health Scientist Administrator (HSA Program Officer, Data Science) | NIDCD (nih.gov)


Research Engineer (Research Fellow) in Sound Sensing

       Location: University of Surrey, Guildford, UK

       Closing Date: Monday 08 August 2022 (23:59 BST)

       Further details: https://jobs.surrey.ac.uk/025022-R

Applications are invited for a Research Engineer (Research Fellow) in Sound Sensing, to work full-time for six months on an EPSRC-funded Fellowship project "AI for Sound" (https://ai4s.surrey.ac.uk/), to start September 2022 or as soon as possible thereafter.

The aim of the project is to undertake research in computational analysis of everyday sounds, in the context of a set of real-world use cases in assisted living in the home, smart buildings, smart cities, and the creative sector. 

The postholder will be responsible for designing and building the hardware and software to be developed in the fellowship, including sound sensor systems, open-source software libraries and datasets to be released from the project.

The postholder will be based in the Centre for Vision, Speech and Signal Processing (CVSSP) and work under the direction of PI (EPSRC Fellow) Prof Mark Plumbley.

The successful applicant is expected to have a postgraduate qualification in electronic engineering, computer science or a related subject, or equivalent professional experience; experience in software and hardware development relevant to signal processing or sensor devices; and experience in software development in topics such as audio signal processing, machine learning, deep learning, and/or sensor systems. Experience in development and deployment of hardware sensors, Internet-of-Things (IoT) devices, or audio systems, and programming experience using Python, C++, MATLAB, or other tools for signal processing, machine learning or deep learning, is desirable. Direct research experience, or experience of hardware or software development while working closely with researchers, is also desirable.

CVSSP is an International Centre of Excellence for research in Audio-Visual Machine Perception, with 180 researchers, a grant portfolio of £26M (£17.5M EPSRC), and a turnover of £7M/annum. The Centre has state-of-the-art acoustic capture and analysis facilities and a Visual Media Lab with video and audio capture facilities supporting research in real-time video and audio processing and visualisation. CVSSP has a compute facility with 120 GPUs for deep learning and >1PB of high-speed secure storage.

The University is located in Guildford, a picturesque market town with excellent schools and amenities, set in the beautiful Surrey Hills, an Area of Outstanding Natural Beauty. London is just 35 minutes away by train, while both Heathrow and Gatwick airports are readily accessible.

For more information about the post and how to apply, please visit:

       https://jobs.surrey.ac.uk/025022-R

Deadline: Monday 08 August 2022 (23:59 BST)

For informal inquiries about the position, please contact Prof Mark Plumbley (m.plumbley@surrey.ac.uk).


Postdoctoral Research Position

Postdoctoral research position:

Localisation of the Mozilla Common Voice platform for South African languages

Stellenbosch University, South Africa

A postdoc position focussing on the localisation of the Mozilla Common Voice Platform[1] in South Africa is available in the Digital Signal Processing Group of the Department of Electrical and Electronic Engineering at Stellenbosch University, South Africa. The project will involve the translation of the Common Voice interface into ten target languages as well as sourcing a minimum of 5 000 public domain (CC-0) sentences per language. The successful candidate will liaise with relevant members of the Common Voice Community and Mozilla Foundation, be responsible for coordinating the translation process and for designing and implementing quality assurance measures to verify the quality of the translations. The project will also entail identifying possible sources of appropriate public domain text as well as obtaining, cleaning and checking the data. Specific project objectives include setting up and managing the translation process, validating translations, gathering text data and developing a “Common Voice sentence preparation” protocol. In addition, the project will provide an opportunity to conduct new and original research in related areas such as machine translation and/or language modelling and topic modelling in South Africa’s official languages.
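
Sentence preparation of this kind is usually supported by automatic pre-filters before human validation. A sketch of such a filter is below; the rules and thresholds are purely hypothetical illustrations, not Mozilla's actual Common Voice criteria:

```python
import re

MAX_WORDS = 14          # short sentences are easier to read aloud
DIGITS = re.compile(r"\d")

def acceptable(sentence: str) -> bool:
    """Illustrative pre-filter for candidate CC-0 sentences."""
    words = sentence.split()
    if not (1 <= len(words) <= MAX_WORDS):
        return False
    if DIGITS.search(sentence):         # digits are read out ambiguously
        return False
    if sentence != sentence.strip() or "  " in sentence:
        return False                    # stray whitespace
    return sentence[0].isupper() and sentence[-1] in ".?!"

candidates = [
    "The weather in Stellenbosch is mild today.",
    "Call me at 082 555 0199",           # contains digits
    "ok",                                # not capitalised, no punctuation
]
print([acceptable(s) for s in candidates])
```

In practice each target language would need its own character-set and orthography checks on top of such generic rules.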

Applicants must hold a PhD (preferably obtained within the last 5 years) in the field of Computational Linguistics, Information Engineering, Computer Science, Electronic/Electrical Engineering, or other relevant disciplines. Suitable candidates must have practical and research experience with text processing and analysis, topic modelling and/or machine translation, and should have an excellent background in statistical modelling. Applicants should also have proven prior experience in text corpus compilation, have good programming skills and be able to use high-level programming languages for developing prototype systems. Finally, candidates must have excellent English writing skills and an explicit interest in scientific research and publication. The position will be available for one year, with a possible extension to a second year, depending on progress and available funds.

Applications should include:

  • A covering letter explaining why the applicant is interested in the position and what his or her most relevant qualifications are.
  • A curriculum vitae that includes a list of publications and describes research projects and conference participation.
  • The details of three contactable referees.

Applications should be sent as soon as possible by email to Prof Febe de Wet (fdw@sun.ac.za). The successful applicant will be subject to University policies and procedures. Please note that postdoctoral fellows are not appointed as employees, and their fellowships are awarded tax-free. They are therefore not eligible for employee benefits.

Interested applicants are welcome to contact me at the above e-mail address for further information regarding the project.

(Date posted: 16 March 2022)

[1] https://commonvoice.mozilla.org/en


Post-Doctoral Researcher

Post-doctoral research position:

Extremely-low-resource radio browsing for humanitarian monitoring

Stellenbosch University, South Africa

A post-doctoral research position focussing on the automatic identification of spoken keywords in multilingual environments with extremely few or even no resources using state-of-the-art architectures is available in the Digital Signal Processing Group of the Department of Electrical and Electronic Engineering at the University of Stellenbosch. This is part of an ongoing project to develop wordspotters that can be used to monitor community radio broadcasts in rural African regions as a source of early warning information during natural disasters, disease outbreaks, or other crises. This phase of the project will consider languages spoken in Mali, at least some of which are severely under-resourced and have not been the subject of speech technology research before. Specific project objectives include the development of research systems, the development of deployable systems, the development of new methods and techniques and the production of associated publishable outputs.

The position is part of a collaborative project with the United Nations Global Pulse. References to papers already produced as part of the project are listed below, and some general further information is available at http://pulselabkampala.ug/.
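
Several of the papers listed below build ASR-free keyword spotters around dynamic time warping (DTW), which finds the cheapest monotonic alignment between a keyword template and a search segment. A minimal sketch over 1-D feature trajectories (illustrative only; the project's systems operate on learned multi-dimensional features):

```python
def dtw_cost(template, segment):
    """Classic DTW: cheapest monotonic alignment between two sequences.
    Lower cost means the segment looks more like the keyword template."""
    n, m = len(template), len(segment)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - segment[j - 1])    # local distance
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)                             # length-normalised

keyword = [1.0, 3.0, 2.0]                  # template feature trajectory
hit = [1.1, 1.0, 2.9, 3.1, 2.0]            # same shape, different tempo
miss = [0.0, 0.0, 0.1, 0.0, 0.0]
print(dtw_cost(keyword, hit) < dtw_cost(keyword, miss))  # True
```

Because the alignment may stretch or compress time, the same keyword is matched regardless of speaking rate, which is what makes DTW attractive when no recogniser can be trained.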

Applicants must hold a PhD (obtained within the last 5 years) in the field of Electronic/Electrical Engineering, Information Engineering, or Computer Science, or other relevant disciplines. Suitable candidates must have practical experience with automatic speech recognition systems in general and deep neural net architectures in particular, and should have an excellent background in statistical modelling and machine learning. The candidate must also have good programming skills and be able to use high level programming languages for developing prototype systems. Finally, candidates must have excellent English writing skills and have an explicit interest in scientific research and publication.

The position will be available for one year, with a possible extension to a second year, depending on progress and available funds.

Applications should include:

  • A covering letter explaining why the applicant is interested in the position and what his or her most relevant qualifications are.

  • A curriculum vitae that includes a list of publications and describes research projects and conference participation.

  • The details of three contactable referees.

Applications should be sent as soon as possible by email to Prof Thomas Niesler (trn@sun.ac.za). The successful applicant will be subject to University policies and procedures.

Interested applicants are welcome to contact me at the above e-mail address for further information regarding the project.

References:

  1. van der Westhuizen, E; Kamper, H; Menon, R; Quinn, J; Niesler, T.R. Feature learning for efficient ASR-free keyword spotting in low-resource languages. Computer Speech and Language, vol. 71, art. 101275, doi:10.1016/j.csl.2021.101275, 2022.

  2. van der Westhuizen, E; Padhi, T; Niesler, T.R. Multilingual training set selection for ASR in under-resourced Malian languages. Proceedings of the 23rd International Conference on Speech and Computer (SPECOM), St Petersburg, Russia, 2021. [Virtual conference – COVID19].

  3. Menon, R; Kamper, H; van der Westhuizen, E; Quinn, J; Niesler, T.R. Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders. Proceedings of Interspeech, Graz, Austria, September 2019.

  4. Biswas, A; Menon, R; van der Westhuizen, E; Niesler, T.R. Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training. Proceedings of Interspeech, Graz, Austria, September 2019.

  5. Menon, R; Biswas, A; Saeb, A; Quinn, J; Niesler, T.R. Automatic Speech Recognition for Humanitarian Applications in Somali. Proceedings of SLTU, Gurugram, India, August 2018.

  6. Menon, R; Kamper, H; Yilmaz, E; Quinn, J; Niesler, T.R. ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages. Proceedings of SLTU, Gurugram, India, August 2018.

  7. Menon, R; Kamper, H; Quinn, J; Niesler, T.R. Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring. Proceedings of Interspeech, Hyderabad, India, September 2018.

  8. Saeb, A; Menon, R; Cameron, H; Kibira, W; Quinn, J; Niesler, T.R. Very low resource radio browsing for agile developmental and humanitarian monitoring. Proceedings of Interspeech, Stockholm, Sweden, August 2017.

  9. Menon, R; Saeb, A; Cameron, H; Kibira, W; Quinn, J; Niesler, T.R. Radio-browsing for Developmental Monitoring in Uganda. Proceedings of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, March 2017.


Postdoctoral Position

Title: Glottal source inverse filtering for the analysis and classification of pathological speech

Keywords: Pathological speech processing, Glottal source estimation, Inverse filtering, Machine learning, Parkinsonian disorders, Respiratory diseases

Contact and Supervisor: Khalid Daoudi (khalid.daoudi@inria.fr)

INRIA team: GEOSTAT (geostat.bordeaux.inria.fr)

Location: Bordeaux, France

Duration: 13 months (could be extended)

Starting date: between 01/04/2022 and 01/06/2022 (depending on the candidate availability)

Application: via https://recrutement.inria.fr/public/classic/en/offres/2022-04481

Salary: 2653€/month (before taxes, net salary 2132€)

Profile: PhD thesis in signal/speech processing (or a solid post-thesis experience in the field)

Required Knowledge and background: A solid knowledge in speech/signal processing; Basics of machine learning; Programming in Matlab and Python.

Scientific research context:

During this century, there has been an ever-increasing interest in the development of objective vocal biomarkers to assist in the diagnosis and monitoring of neurodegenerative diseases and, more recently, respiratory diseases because of the Covid-19 pandemic. The literature is now relatively rich in methods for the objective analysis of dysarthria, a class of motor speech disorders [1], where most of the effort has been devoted to speech impaired by Parkinson’s disease. However, relatively few studies have addressed the challenging problem of discrimination between subgroups of Parkinsonian disorders which share similar clinical symptoms, particularly in early disease stages [2]. As for the analysis of speech impaired by respiratory diseases, the field is relatively new (with existing developments in very specialized areas) but has attracted great attention since the beginning of the pandemic.

The speech production mechanism is essentially governed by five subsystems: respiratory, phonatory, articulatory, nasal and prosodic. In the framework of pathological speech, the phonatory subsystem is the most studied one, usually using sustained phonation (prolonged vowels). Phonatory measurements are generally based on perturbation and/or cepstral features. Though these features are widely used and accepted, they are limited by the fact that the produced speech can be a product of some or all of the other subsystems, which thus all contribute to the phonatory performance. An appealing way to bypass this problem is to extract the glottal source from speech in order to isolate the phonatory contribution. This framework is known as glottal source inverse filtering (GSIF) [3]. The primary objective of this proposal is to investigate GSIF methods in pathological speech impaired by dysarthria and respiratory deficit.
The second objective is to use the resulting glottal parameterizations as inputs to basic machine learning algorithms in order to assist in the discrimination between subgroups of Parkinsonian disorders (Parkinson’s disease, Multiple-System Atrophy, Progressive Supranuclear Palsy) and in the monitoring of respiratory diseases (Covid-19, Asthma, COPD). Both objectives benefit from a rich dataset of speech and other biosignals recently collected in the framework of two clinical studies in partnership with university hospitals in Bordeaux and Toulouse (for Parkinsonian disorders) and in Paris (for respiratory diseases).

Work description:

GSIF consists of building a model to filter out the effect of the vocal tract and lip radiation from the recorded speech signal. This difficult problem, even in the case of healthy speech, becomes more challenging in the case of pathological speech. We will first investigate time-domain methods for the parameterization of the glottal excitation using glottal opening and closure instants. This implies the development of a robust technique to estimate these critical time-instants from dysarthric speech. We will then explore the alternative approach of learning a parametric model of the entire glottal flow. Finally, we will investigate frequency-domain methods to determine relationships between different spectral measures and the glottal source. These algorithmic developments will be evaluated and validated using a rich set of biosignals obtained from patients with Parkinsonian disorders and from healthy controls. The biosignals are electroglottography and aerodynamic measurements of oral and nasal airflow as well as intra-oral and sub-glottic pressure. After the GSIF analysis of dysarthric speech, we will study the adaptation/generalization to speech impaired by respiratory deficits. These developments will be evaluated using manual annotations, by an expert phonetician, of speech signals obtained from patients with respiratory deficit and from healthy controls.

The second aspect of the work consists of applying machine learning algorithms (LDA, logistic regression, decision trees, SVM…) using standard tools (such as Scikit-Learn). The goal here will be to study the discriminative power of the resulting speech features/measures and their complementarity with other features related to different speech subsystems. The ultimate goal is to conceive robust algorithms to assist, first, in the discrimination between Parkinsonian disorders and, second, in the monitoring of respiratory deficit.
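
The inverse-filtering idea at the heart of GSIF can be illustrated with plain linear prediction: fit an all-pole vocal-tract model to the speech signal and filter its effect out, leaving a residual that approximates the glottal excitation. A bare-bones NumPy sketch on a synthetic signal (textbook LPC, not the methods to be developed in the project):

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin."""
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        a[1:i] = a[1:i] + k * a[1:i][::-1]   # Levinson recursion update
        a[i] = k
        err *= 1.0 - k * k
    return a                                  # inverse filter A(z)

# Synthetic "speech": glottal pulses every 80 samples driving a resonant
# 2-pole "vocal tract": x[n] = pulse[n] + 1.2 x[n-1] - 0.7 x[n-2].
pulses = np.zeros(400)
pulses[::80] = 1.0
x = np.zeros(400)
for n in range(400):
    x[n] = pulses[n]
    if n >= 1:
        x[n] += 1.2 * x[n - 1]
    if n >= 2:
        x[n] -= 0.7 * x[n - 2]

a = lpc(x, order=2)                       # close to the true [1, -1.2, 0.7]
residual = np.convolve(x, a)[: len(x)]    # inverse filtering recovers pulses
print(np.round(a, 2))
```

On real pathological speech the excitation is far from an ideal impulse train and the all-pole assumption is violated, which is precisely why robust GSIF remains a research problem.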

Work synergy:

- The postdoc will interact closely with an engineer who is developing an open-source software architecture dedicated to pathological speech processing. The validated algorithms will be implemented in this architecture by the engineer, under the co-supervision of the postdoc.

- Given the multidisciplinary nature of the proposal, the postdoc will interact with the clinicians participating in the two clinical studies.

References:

[1] J. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management. Elsevier, 2013.

[2] J. Rusz et al. Speech disorders reflect differing pathophysiology in Parkinson's disease, progressive supranuclear palsy and multiple system atrophy. Journal of Neurology, 262(4), 2015.

[3] P. Alku. Glottal inverse filtering analysis of human voice production – A review of estimation and parameterization methods of the glottal excitation and their applications. Sadhana – Academy Proceedings in Engineering Sciences, Vol. 36, Part 5, pp. 623–650, 2011.


Assistant or Associate Professor in Speech and Language Technology (tenure track), Aalto University

Aalto University is a community of bold thinkers where science and art meet technology and business. We are committed to identifying and solving grand societal challenges and building an innovative future. Aalto has six schools with nearly 11 000 students and a staff of more than 4000, of whom 400 are professors. Our main campus is located in the Helsinki Metropolitan area, in Espoo, Finland. Diversity is part of who we are, and we actively work to ensure our community’s diversity and inclusiveness. This is why we warmly encourage qualified candidates from all backgrounds to join our community.

The School of Electrical Engineering promotes high-quality science, technology and innovations for the benefit of Finnish society and all of humankind. In our research environment, the natural sciences, engineering and information technology intertwine to form smart systems and innovations that save energy and promote wellbeing. With our research, we seek to respond to the many challenges posed by sustainable development, and the results are applied, for example, in mobile devices, electrical networks, health care and satellites. A special strength of the School is linking research with the Finnish and international business sector. We have around 2000 students in total and award around 250 master’s and 50 doctoral degrees annually. Our personnel consists of 700 people, including over 70 professors.

The Department of Signal Processing and Acoustics at the Aalto University School of Electrical Engineering invites applications for the position:

ASSISTANT OR ASSOCIATE PROFESSOR IN SPEECH AND LANGUAGE TECHNOLOGY (tenure track)

The professorship is open to highly qualified applicants who have demonstrated outstanding machine learning (ML) research in speech and language technology. Experts from all areas of speech and language technology are eligible to apply, but candidates must have demonstrated strong expertise in ML. We prioritize fields with high potential for research collaboration with the Department’s current groups in speech and language technology. We expect a strong track record of publications and achievements in ML applied to speech and language technology; excellent teaching skills; and the motivation and competence to start and lead new, highly ambitious and multidisciplinary research projects aiming at significant scientific results and impact. All applicants must have a doctorate in speech and language technology (or in a related area of engineering) and a fluent command of English. The teaching responsibilities of the professorship include lectures in ML and in speech and language technology, and supervision of theses at the bachelor’s, master’s and postgraduate levels.

Your experience and ambition

We are looking for applicants with

  • A proven track record of, and passion for, carrying out high-quality ML research in speech and language technology and publishing in the top venues of the field
  • A doctorate in speech and language technology (or in a related area of engineering)
  • Potential to acquire research funding and build up your own research group
  • Motivation to teach both bachelor’s- and master’s-level engineering students

We offer

  • A tenure track position with promotion to a tenured position based on merits
  • A competitive benefits package including access to health care
  • Start-up funding and grant writing support to help you establish your own research group
  • Excellent collaboration possibilities within the university

A great future awaits in one of the happiest, cleanest and safest countries in the world, with a comprehensive social security system and free education up to university level.

Aalto tenure track and contract terms

This professorship is a tenure track position and will be filled at the assistant or associate professor level with a fixed-term contract until the tenure review. The salary is based on the Aalto University salary system, but you may also state your own salary request. The University provides a research start-up fund, and we actively assist researchers in applying for available scientific research funding. Getting tenure and advancing on the Aalto tenure track is based on an evaluation of your achievements and merits against the Aalto tenure track criteria. Please see the details about the tenure track path at Aalto at https://www.aalto.fi/en/tenure-track/tenure-track-career-path and the evaluation criteria at https://www.aalto.fi/services/tenure-track-evaluation-criteria.

Scientific environment

You will join the Department of Signal Processing and Acoustics in the Aalto University School of Electrical Engineering. Speech and language technology is one of the Department’s four research focus areas and the area has currently three professors. Professor Paavo Alku leads a group in speech communication technology focusing on modelling of speech production and on speech-based biomarking of health. Professor Mikko Kurimo leads a group in speech recognition and language modelling focusing on machine learning models, representations and applications for conversational speech. Associate Professor Tom Bäckström leads a group in speech interaction technology focusing on speech transmission, extraction and privacy. These groups are also very well connected to the recently founded Finnish Centre of Artificial Intelligence (FCAI), which is a large collaboration effort for professors in machine learning and speech and language technology in both Aalto University and University of Helsinki.

Ready to apply?

Please submit your application no later than 3 April 2022 through our recruiting system:

Open positions - Workday (myworkdayjobs.com)

and choose “Apply”. Please include the following PDF documents in English: 1) a cover letter and curriculum vitae (with contact information and your ORCID and/or ResearcherID), 2) a list of publications (in which the five most significant publications are highlighted and your role in them described), 3) a research statement describing past research and plans for future research, 4) a teaching portfolio describing teaching experience and plans for teaching, and 5) a list of references and any reference letters. General instructions for applicants, including language requirements and guidelines for compiling the teaching portfolio and CV, are available at https://www.aalto.fi/tenure-track/interested-in-joining-our-tenure-track.

More information

If you wish to hear more about the position or us, please contact Professor Paavo Alku paavo.alku(at)aalto.fi +358405009867 or Professor Mikko Kurimo mikko.kurimo(at)aalto.fi +358503476221. In case you have questions related to the recruitment process, please contact HR Coordinator Alina Järvinen alina.jarvinen(at)aalto.fi.

About Aalto University, Helsinki and Finnish society

At Aalto, high-quality research, art, education and entrepreneurship are promoted hand in hand. Disciplinary excellence is combined with multidisciplinary activities, engaging both students and the local innovation ecosystem. Our main campus is quickly transforming into an open collaboration hub that encourages encounters between students, researchers, industry, startups and other partners. Aalto University was founded in 2010, when three leading Finnish universities (Helsinki University of Technology, the Helsinki School of Economics and the University of Art and Design Helsinki) were merged to strengthen Finland’s innovative capability. The greater Helsinki region is a world-class information technology complex, attracting leading scientists and researchers in various fields of electrical engineering. As a living and working environment, Finland consistently ranks high in quality of life, and Helsinki, the capital of Finland, is regularly ranked as one of the most livable cities in the world.

Finns are proud to say that we have one of the best education systems in the world. The Nordic values of equality and co-operation are deeply rooted in our society. We are one of the world’s top countries in happiness, clean air and nature, and press freedom, and we consider the many voices in our society a strength. With high investments in R&D, a strong innovation culture, open data and an advanced state of digitalization, we are a nation of innovation and entrepreneurship. Gender equality, flexibility and low hierarchies are at the core of our Nordic working environment. With four seasons, clean air and thousands of lakes, we are nature-loving people who take good care of our unique environment. For more information about living in Finland, see Aalto’s pages for international staff: https://www.aalto.fi/en/careers-at-aalto/for-international-staff.
