Machine Learning for Signal Processing

MLSP

PhD Position

PhD position at Inria (Nancy - Grand Est), France

(More information: https://jobs.inria.fr/public/classic/en/offres/2021-03399)

Title: Robust and Generalizable Deep Learning-based Audio-visual Speech Enhancement

The PhD thesis will be jointly supervised by Mostafa Sadeghi (Inria Starting Faculty Position) and Romain Serizel (Associate Professor, Université de Lorraine) in the MULTISPEECH Team at Inria, Nancy - Grand Est, France.

Contacts: Mostafa Sadeghi (mostafa.sadeghi@inria.fr) and Romain Serizel (romain.serizel@loria.fr)

Context: Audio-visual speech enhancement (AVSE) refers to the task of improving the intelligibility and quality of noisy speech by exploiting the complementary information of the visual modality (lip movements of the speaker) [1]. The visual modality can help distinguish target speech from background sounds, especially in highly noisy environments. Recently, owing to the success and progress of deep neural network (DNN) architectures, AVSE has been extensively revisited. Existing DNN-based AVSE methods are categorized into supervised and unsupervised approaches. In the former category, a DNN is trained to map noisy speech and the associated video frames of the speaker into a clean estimate of the target speech. The unsupervised methods [2] follow a traditional maximum likelihood-based approach combined with the expressive power of DNNs. Specifically, the prior distribution of clean speech is learned using deep generative models such as variational autoencoders (VAEs) and combined with a likelihood function based on, e.g., non-negative matrix factorization (NMF), to estimate the clean speech in a probabilistic way. As there is no training on noisy speech, this approach is unsupervised.

Supervised methods require deep networks with millions of parameters, as well as a large audio-visual dataset with sufficiently diverse noise instances, to be robust against acoustic noise. There is also no systematic way to achieve robustness to visual noise, e.g., head movements, face occlusions, and changing illumination conditions. Unsupervised methods, on the other hand, generalize better and can achieve robustness to visual noise thanks to their probabilistic nature [3]. Nevertheless, their test phase involves a computationally demanding iterative process, which hinders their practical use.

Objectives: In this PhD project, we aim to bridge the gap between supervised and unsupervised approaches, benefiting from the best of both worlds. The central task of this project is to design and implement a unified AVSE framework with the following features: (1) robustness to visual noise, (2) good generalization to unseen noise environments, and (3) computational efficiency at test time. To achieve the first objective, various techniques will be investigated, including probabilistic switching (gating) mechanisms [3], face frontalization [4], and data augmentation [5]. The main idea is to adaptively lower-bound the performance by that of audio-only speech enhancement when the visual modality is not reliable. To accomplish the second objective, we will explore techniques such as acoustic scene classification combined with noise modeling inspired by unsupervised AVSE, in order to adaptively switch between different noise models during speech enhancement. Finally, concerning the third objective, lightweight inference methods, as well as efficient generative models, will be developed. We will work with the AVSpeech [6] and TCD-TIMIT [7] audio-visual speech corpora.

References:

[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, “An overview of deep-learning based audio-visual speech enhancement and separation,” arXiv: 2008.09586, 2020.

[2] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, “Audio-visual speech enhancement using conditional variational auto-encoders,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788–1800, 2020.

[3] M. Sadeghi and X. Alameda-Pineda, “Switching variational autoencoders for noise-agnostic audio-visual speech enhancement,” in ICASSP, 2021.

[4] Z. Kang, M. Sadeghi, and R. Horaud, “Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks,” arXiv: 2010.13676, 2020.

[5] S. Cheng, P. Ma, G. Tzimiropoulos, S. Petridis, A. Bulat, J. Shen, and M. Pantic, “Towards Pose-invariant Lip Reading,” in ICASSP, 2020.

[6] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, and M. Rubinstein, “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation,” in SIGGRAPH, 2018.

[7] N. Harte and E. Gillen, “TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech,” IEEE Transactions on Multimedia, vol. 17, no. 5, pp. 603–615, May 2015.

Skills:

  • Master's degree, or equivalent, in the field of speech/audio processing, computer vision, machine learning, or in a related field,
  • Ability to work independently as well as in a team,
  • Solid programming skills (Python, PyTorch),
  • A decent level of written and spoken English.

Benefits package:

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural, and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration:

Salary: 1982€ gross/month for the 1st and 2nd years; 2085€ gross/month for the 3rd year.

Monthly salary after taxes: around 1596.05€ for the 1st and 2nd years and 1678.99€ for the 3rd year (medical insurance included).

Post-doc in Biomedical Image Analysis

Job offer: Post-doc in biomedical image analysis

We are seeking candidates who are interested in developing deep learning algorithms for improving the registration/alignment of our 2D/3D MRI and microscopy images. 

Job description

The selected candidate will join the interdisciplinary Brain/MINDS project which aims at studying the neural networks controlling higher brain functions in the marmoset, to gain new insights into information processing and diseases of the human brain.

As a member of the project, the selected candidate will contribute to the development and implementation of image processing and image analysis techniques with a focus on brain image data, specifically of marmosets. We are seeking candidates who enjoy developing deep learning algorithms for 2D/3D image registration/alignment.

The work will be done in a highly interdisciplinary research group consisting of scientists from the neuroscience and medical research fields. The emphasis is on developing cutting-edge technologies that improve on the current state of the art and on publishing high-impact work in top-tier journals in order to build a substantial resume and strong international collaborations.

Experience in biomedical image analysis is an advantage but not a requirement. This job may be a great opportunity to apply knowledge and expertise from the computer vision and/or image processing field to new problems in the biomedical field.

Location

Wako-City (Kanto district, 2-1 Hirosawa, Wako, Saitama 351-0198). RIKEN is located in very close proximity to the northern part of Tokyo. Map: http://www.riken.jp/en/access/wako-map/.

The RIKEN campus is quite large and offers cafeterias, coffee shops, and a convenience store. From the nearest train station, it is only a 12 min train ride to Ikebukuro Station (Tokyo). Ikebukuro Station is a hub that connects to many famous places in Tokyo, including Shinjuku (9 min train ride), Shibuya (18 min train ride) and Akihabara (19 min train ride). Many people prefer to avoid crowded streets and trains in their daily life and live in close proximity to RIKEN. However, those who prefer living close to the sightseeing, nightlife and entertainment spots in Tokyo benefit from commuting out of the city in the morning and returning in the evening (significantly less crowded than the other way around).

Qualifications

The candidate should have or be expecting to receive a Ph.D., by the time of employment, in related fields and have

  • relevant research skills and experience in developing deep learning techniques for the analysis/processing of images, demonstrated by high-quality publications

  • expertise in biological/medical/neural image processing, image registration, computer vision, machine learning, optimization, or similar fields (an advantage)

  • good English communication skills

  • proficiency in a programming language (such as C++/Python/JS)

  • proficiency in TensorFlow, PyTorch or a similar DL library

  • good communication skills and ability to cooperate

Application & Employment

RIKEN employees enjoy the benefits of a generous vacation and leave package. An overview of benefits can be found here: working-at-riken. For application details, please refer to the official job posting URL: https://cbs.riken.jp/en/careers/20210226_w20294_h.skibbe_r.html.

Postdoctoral Position in Machine Learning-based Image Analysis (MGH and Harvard Medical School)

Closing date: open until the positions are filled.

The Gordon Center for Medical Imaging (GCMI) in the Department of Radiology at Massachusetts General Hospital (MGH) and Harvard Medical School (HMS) in Boston, MA has an immediate opening at the postdoctoral level to work on research projects related to PET and MR image analysis, image restoration and image reconstruction.

The projects, funded by the NIH, aim to use machine learning-based image analysis to improve diagnosis and progression tracking for Alzheimer’s disease (AD) and neuroendocrine tumors (NETs). The projects are carried out in cooperation with the Harvard Aging Brain Study (HABS) and MD Anderson Cancer Center (MDACC).

Qualifications:

Applicants should have earned a Ph.D. in engineering, statistics/mathematics, physics, neuroscience, or a related field. Strong analytical, quantitative and programming skills as well as proficiency in machine learning are essential. Prior experience with PET/MR image analysis is not required. The successful candidate will have joint appointments at MGH and HMS.

Environment:

The Department of Radiology offers extensive core research facilities, including a new digital time-of-flight PET/CT scanner, brain and whole body PET/MRI scanners, and small animal PET/SPECT/CT systems. Studies on these imaging systems are supported by the MGH Gordon PET Core’s fully equipped blood/radiometabolite processing laboratory, on-site cyclotron, dedicated research radiochemistry laboratories, and a world class GMP radiopharmacy. The Gordon Center maintains a large-scale computing facility for image analysis, network training, tomographic reconstruction, Monte Carlo simulation, and other computationally intensive research applications. The successful applicant can interact collaboratively with a large (140+), growing research group in diverse areas of imaging technology and applications.

The position is available as of February 2021 and the start date is flexible. If interested, please send your curriculum vitae, a cover letter describing your background and research interests, and contact information of three references to Dr. Kuang Gong (kgong AT mgh.harvard.edu).

Assistant Professor in Structured Data Science

Many applications generate large data sets from which information needs to be extracted. The emerging field of structured data science extends signal processing to data science.

Requirements

The opening for an Assistant Professor is intended to further develop this area. A background in statistical signal processing/modelling and the ability to apply this to data science/machine learning are required. In general, we are looking for candidates with a strong signal processing background that complements the expertise already present in the CAS group at TU Delft. Experience with biomedical signal and image processing applications is of interest. The candidate will also be involved in teaching and will, for example, develop new courses on structured data science and machine learning for Electrical Engineering students.

While this position is defined as a tenure-track Assistant Professor position, excellently qualified but more senior researchers are also invited to apply. For this position, preference will be given to female applicants.

Candidates should have (1) a PhD degree in Electrical Engineering or a closely related discipline, with outstanding academic credentials, (2) several years of working experience as a Postdoctoral Researcher in an academic institution, and (3) the ambition to be a future scientific leader in the mentioned area.

Applications

To apply, please e-mail a detailed CV including a list of publications, a cover letter describing your motivation and suitability for the position, a research statement, and a teaching statement. Send your application material via the TU Delft recruitment page. When applying for this position, please refer to vacancy number TUD00629.

Further information

For further information, contact Prof. Alle-Jan van der Veen, chair of the Circuits and Systems group at TU Delft.

Junior Professor in Reinforcement Learning

Open faculty position at KU Leuven, Belgium: junior professor in reinforcement learning 

KU Leuven's Faculty of Engineering Science has a fixed-term academic vacancy (5 years, part-time 95%) in the area of Reinforcement Learning. The successful candidate will teach in the Master of Artificial Intelligence program, conduct research with a focus on reinforcement learning, and supervise students in the Master and PhD programs. The candidate will be embedded in the DTAI section of the Department of Computer Science. More information is available at https://www.kuleuven.be/personeel/jobsite/jobs/55711607?hl=en&lang=en

KU Leuven is committed to creating a diverse environment. It explicitly encourages candidates from groups that are currently underrepresented at the university to submit their applications. 

Post-Doc Position in AI-based Face Recognition Explainability

Face recognition has become a key technology in our society, frequently used in multiple applications, while creating an impact in terms of privacy. As face recognition solutions based on artificial intelligence (AI) are becoming popular, it is critical to fully understand and explain how these technologies work in order to make them more effective and accepted by society. In this project, we focus on the analysis of the influencing factors relevant for the final decision of an AI-based face recognition system as an essential step to understand and improve the underlying processes involved. The scientific approach pursued in the project is designed in such a way that it will be applicable to other use cases such as object detection and pattern recognition tasks in a wider set of applications. Thanks to the interdisciplinary nature of the consortium, the outcomes of the XAIface project will affect many fields. Its goals can be summarized as follows: (i) develop clear legal guidelines on the use and design of AI-based face recognition following the privacy-by-design approach; (ii) disentangle demographic information (age, gender, ethnicity) from the overall face representation in order to understand the impact of such traits on face recognition and to develop demographic-free face recognition; (iii) address fairness and non-discrimination issues by following the idea of de-biasing during training; (iv) optimize the tradeoff between interpretability and performance; (v) create tools that will allow assessment and measurement of performance and explanation of decisions of AI-based face recognition systems; (vi) analyze the impact of image coding to better understand how future AI-based coding solutions may differ from a recognition explainability point of view.

This project includes several international teams and will last for 3 years. The working place will be at Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal.

Research grant: The research grant is associated with a yearly renewable contract (up to 3 years) that includes an experimental period of 6 months. The research grant consists of a tax-free stipend of 1616€ per month. Candidates must fulfill the following conditions:

  • PhD in computer science, electrical and computer engineering or other relevant area, awarded in the past three years.
  • Preference will be given to candidates knowledgeable in machine learning, computer vision, multimedia signal processing and face recognition.
  • Strong motivation to perform research, to participate in a rich and stimulating international project, and to advance the state of the art through the publication of results in peer-reviewed international conferences and journals.
  • Fluent in English and with good skills in technical writing and presenting.
  • Good programming skills (Python, C/C++) are required.

The selected candidate will work in a team led by Prof. Fernando Pereira and Prof. João Ascenso (see http://www.img.lx.it.pt/Staff.html for details). The candidate will join a team of staff and PhD students where intense research and development activities in the multimedia signal processing and machine learning fields are carried out.

To apply, please submit your application by sending an email to Prof. Fernando Pereira and Prof. João Ascenso at fp@lx.it.pt and joao.ascenso@lx.it.pt with the following documents:

  1. Detailed curriculum vitae with transcripts
  2. Motivation letter (research statement) explaining your interest in the position
  3. Recommendation letter(s)

Applications will be accepted until suitable candidates are found, but no later than 15 February 2021. Selected candidates will be interviewed. For any clarifications, please contact Prof. Fernando Pereira and Prof. João Ascenso.

Signal Processing Engineer

Who we are looking for:
An experienced signal processing engineer who is creative, innovative, thrives on technical challenges, and is comfortable merging concepts from different technical disciplines.

Experience (required):
• 5 years of signal processing experience (analysis, modification, and synthesis)
• strong background in emphasizing and detecting components in signals
• strong data analysis/data science abilities
• strong programming abilities

Experience (desired):
• research and development (R & D)
• ASIC (application-specific integrated circuit) design
• sensor design or integration

Education:
BS Electrical Engineering or related field

Location:
This job is a laboratory research position, so it is in-person, on-site in Tucson, AZ at Tech Parks Arizona.

What you will get:
• Opportunity to be part of the founding team, launching new instrumentation that will change how the world is seen
• Chance to be part of a close-knit, knowledgeable, and collaborative team
• Additional experience in research and development (R&D)
• Competitive compensation package that includes: competitive salary, health benefits, retirement, paid time off (PTO), stock options, and relocation assistance (for out-of-state applicants)

PhD Position - Signal Processing for Hearing Assistive Devices

A Marie Skłodowska-Curie PhD Fellowship in audio signal processing for hearing assistive devices

Research position at Oticon

A PhD research position is available in Oticon A/S and Dept. Electronic Systems, Aalborg University in the frame of the H2020 MSCA European Training Network “Service-Oriented Ubiquitous Network-Driven Sound (SOUNDS)” under the supervision of Prof. Jesper Jensen.

The PhD student will be fully embedded in the SOUNDS research and training network and will carry out applied research in the interdisciplinary field of signal processing, room acoustics, auditory perception, communication networks and machine learning.

The research will be executed at Oticon - a world-leading hearing aid manufacturer - and will involve several research visits to internationally renowned research labs in Europe.

A fully-funded 3-year research position in the frame of a 3-year doctoral program (resulting in a PhD degree awarded by Aalborg University) is offered.

The topic of the PhD project is: "Distributed sound processing for hearing aid applications"

The “SOUNDS” project

The SOUNDS European Training Network (ETN) revolves around a new and promising paradigm coined as Service-Oriented, Ubiquitous, Network-Driven Sound. Inspired by the ubiquity of mobile and wearable devices capable of capturing, processing, and reproducing sound, the SOUNDS ETN aims to bring audio technology to a new level by exploiting network-enabled cooperation between devices.

We envision the next generation of audio devices to be capable of providing enhanced hearing assistance, creating immersive audio experience, enabling advanced voice control and much more, by seamlessly exchanging signals and parameter settings, and spatially analyzing and reproducing sound jointly with other nearby audio devices and infrastructure.

Moreover, such functionality should be self-organizing, flexible, and scalable, requiring minimal user interaction for adapting to changes in the environment or network. It is anticipated that this paradigm will eventually result in an entirely new way of designing and using audio technology, by considering audio as a service enabled through shared infrastructure, rather than as a device-specific functionality limited by the capabilities and constraints of a single user device.

The ideal profile

Attaining this paradigm shift in audio technology not only requires additional research but also calls for a new generation of qualified researchers with a transdisciplinary and international scientific profile, strong collaborative research and research management skills, and the intersectoral expertise needed to carry research results from academia to industry.

Candidates must

  • hold a Master's degree in Electrical Engineering, Computer Science, or Engineering Acoustics (or equivalent),
  • have a solid mathematical background (e.g. in matrix algebra, stochastic processes, etc.),
  • have taken specialized courses in at least one of the following disciplines: digital signal processing, acoustics, audio signal processing, machine learning,
  • have experience with scientific computing in Matlab, Python, or similar,
  • have excellent proficiency in the English language, as well as good communication skills, both oral and written.

Candidates who are in the final phase of their Master's program are equally encouraged to apply, and should mention their expected graduation date.

Candidates must satisfy the eligibility conditions for MSCA Early Stage Researchers, i.e., they must have obtained their Master's degree in the past 4 years and must not have resided or carried out their main activity (work, studies, etc.) in Denmark for more than 12 months in the past 3 years.

Why join us?

It is believed that the SOUNDS ETN will offer the best possible framework for achieving these goals, by organizing advanced interdisciplinary research training, developing solid transferable skills, and providing intersectoral and international experience in a network of qualified and complementary industrial and academic institutions.

We offer:

  • A high-level and exciting international research environment.
  • A strong involvement in a European research project with high international visibility.
  • A PhD title from one of Europe's top universities (after 3 years of successful research).
  • A thorough scientific education in the frame of a doctoral training program.
  • The possibility to participate in local as well as international courses, workshops and conferences.
  • The possibility to perform research visits to internationally renowned research labs in Europe.

The SOUNDS ETN strongly values research integrity, actively supports open access and reproducible research, and strives for diversity and gender balance in its entire research and training program. The SOUNDS ETN adheres to The European Charter for Researchers and The Code of Conduct for the Recruitment of Researchers.

Interested?

Then please send 1) your motivation letter with a statement of skills and research interests (max 1 page), 2) academic CV, including transcripts and possibly GRE/TOEFL results, 3) relevant diplomas, and 4) names and contact information of 1-2 references to us using the link https://www.oticon.global/about/jobs/careers/open-positions/job-details2?id=14404. We hope to hear from you as soon as possible and no later than Jan. 3, 2021.

For questions please contact Prof. Jesper Jensen at jesj@demant.com.

We look forward to welcoming you.

PhD Students, Master's Students and Post-Docs

The Signal Acquisition, Modeling, Processing and Learning (SAMPL) lab headed by Prof. Yonina Eldar at the Weizmann Institute of Science is recruiting PhD, MSc and post-doctoral students for cutting-edge research applying machine learning and deep networks to clinical problems in collaboration with leading hospitals in Israel and abroad.

Candidates with a strong algorithmic background are invited to send their CV and 3 recommendation letters to yonina.eldar@weizmann.ac.il.
