News and Resources for Members of the IEEE Signal Processing Society
Title: An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning
Date: 25 October 2022
Time: 10:00 AM Eastern (New York time)
Duration: Approximately 1 Hour
Presenters: Dr. Berrak Sisman, Dr. Simon King, Dr. Junichi Yamagishi, Dr. Haizhou Li
Based on the IEEE Xplore® article: An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning
Published: IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2020, available in IEEE Xplore®
Download article: The original article is available for download.
Voice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to the general technical field of speech synthesis, which converts text to speech or changes the properties of speech, such as voice identity, emotion, and accent. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this talk, we provide a comprehensive overview of state-of-the-art voice conversion techniques and their performance evaluation methods, from statistical approaches to deep learning, and discuss their promise and limitations. We will also present the recent Voice Conversion Challenges (VCC) and the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.
Dr. Berrak Sisman (Member, IEEE) received the Ph.D. degree in electrical and computer engineering from the National University of Singapore in 2020, fully funded by the A*STAR Graduate Academy under the Singapore International Graduate Award (SINGA).
She is currently a tenure-track Assistant Professor in the Department of Electrical and Computer Engineering of the Erik Jonsson School at the University of Texas at Dallas, United States. Prior to joining UT Dallas, she was a faculty member at the Singapore University of Technology and Design (2020-2022). She was a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at the Centre for Speech Technology Research (CSTR), University of Edinburgh (2019). She was a visiting researcher at the RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning, signal processing, emotion, speech synthesis, and voice conversion.
Dr. Sisman has served as an Area Chair at INTERSPEECH 2021, INTERSPEECH 2022, and IEEE SLT 2022, and as the Publication Chair at ICASSP 2022. She has been elected as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term from January 2022 to December 2024. She plays leadership roles in conference organization and is active in technical committees. She has served as the General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA).
Dr. Simon King (Fellow, IEEE) received the M.A. (Cantab) and M.Phil. degrees from the University of Cambridge, Cambridge, U.K., and the Ph.D. degree from the University of Edinburgh, Edinburgh, U.K.
Since 1993, he has been with the Centre for Speech Technology Research, University of Edinburgh, where he is currently Professor of Speech Processing and the Director of the Centre. His research interests include speech synthesis, recognition, and signal processing, and he has approximately 230 publications across these areas.
Prof. King has served on the ISCA SynSIG Board and currently co-organizes the Blizzard Challenge. He has previously served on the IEEE SLTC and as an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing, and is currently an Associate Editor of Computer Speech and Language.
Dr. Junichi Yamagishi (SM'13) received the Ph.D. degree from the Tokyo Institute of Technology (Tokyo Tech), Tokyo, Japan, in 2006.
From 2007 to 2013, he was a research fellow in the Centre for Speech Technology Research (CSTR) at the University of Edinburgh, UK. He was appointed Associate Professor at the National Institute of Informatics, Japan, in 2013. He is currently a Professor at the National Institute of Informatics (NII), Japan. His research topics include speech processing, machine learning, signal processing, biometrics, digital media cloning, and media forensics.
Dr. Yamagishi previously served as a co-organizer of the biennial ASVspoof Challenge and the biennial Voice Conversion Challenge. He also served as a member of the IEEE Speech and Language Technical Committee (2013-2019), an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2014-2017), and a chairperson of ISCA SynSIG (2017-2021). He is currently a PI of the JST-CREST and ANR-supported VoicePersona project and a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Dr. Haizhou Li (Fellow, IEEE) received the B.Sc., M.Sc., and Ph.D. degrees in electrical and electronic engineering from South China University of Technology, Guangzhou, China, in 1984, 1987, and 1990, respectively.
He is currently a Professor at the School of Data Science, the Chinese University of Hong Kong, Shenzhen, China, and the Department of Electrical and Computer Engineering, National University of Singapore (NUS). His research interests include automatic speech recognition, speaker and language recognition, and natural language processing. Prior to joining NUS, he taught at the University of Hong Kong (1988-1990) and South China University of Technology (1990-1994). He was a Visiting Professor at CRIN in France (1994-1995), Research Manager at the Apple-ISS Research Centre (1996-1998), Research Director in Lernout and Hauspie Asia Pacific (1999-2001), Vice President in InfoTalk Corporation, Ltd. (2001-2003), and the Principal Scientist and Department Head of Human Language Technology in the Institute for Infocomm Research, Singapore (2003-2016).
Dr. Li served as the Editor-in-Chief of IEEE/ACM Transactions on Audio, Speech and Language Processing (2015-2018) and as a Member of the Editorial Board of Computer Speech and Language (2012-2018). He was an elected Member of the IEEE Speech and Language Processing Technical Committee (2013-2015), the President of the International Speech Communication Association (2015-2017), the President of the Asia Pacific Signal and Information Processing Association (2015-2016), and the President of the Asian Federation of Natural Language Processing (2017-2018). He was the General Chair of ACL 2012, INTERSPEECH 2014, and ASRU 2019. Dr. Li is a Fellow of the IEEE and the ISCA. He was a recipient of the National Infocomm Award 2002 and the President’s Technology Award 2013 in Singapore. He was named one of the two Nokia Visiting Professors in 2009 by the Nokia Foundation, University of Bremen Excellence Chair Professor in 2019, and Fellow of the Academy of Engineering Singapore in 2022.