The speech signal is complex and carries a tremendous quantity of diverse information. The first step in extracting this information is to define an efficient representation that can model as much of it as possible and facilitate the extraction process. The I-vector representation is a statistical, data-driven approach to feature extraction that provides an elegant framework for speech classification and identification in general. This representation became the state of the art in several speech processing tasks and has recently been integrated with deep learning methods. This talk will present a variety of applications of the I-vector representation to speech and audio tasks, including speaker profiling, speaker diarization, and speaker health analysis. We will also show how this representation can be used to model and visualize the information present in the hidden layers of deep neural networks.
Computers have been changing the lives of blind people. Voice synthesis technology has improved their educational environment and job opportunities by allowing them to access online services. Now, new AI technologies are reaching the point where computers can help in sensing, recognizing, and understanding the real world we live in. I will first introduce the concept of a cognitive assistant for the blind, which will help blind and visually impaired people explore their surroundings and enjoy the city environment by compensating for their missing visual sense with the power of integrated AI technologies. I will then introduce the latest technologies, including an accurate indoor navigation system and a personal object recognition system, followed by a discussion of the role of the blind: how we can accelerate the advancement of AI technologies.