Pricing
SPS Members: $0.00
IEEE Members: $11.00
Non-members: $15.00

A person’s speech is strongly conditioned by their own articulators and the language(s) they speak. Rendering speech across speakers or across languages from a source speaker’s data, collected in that speaker’s native language, is therefore both academically challenging and desirable for technology and applications. The quality of the rendered speech is assessed along three dimensions: naturalness, intelligibility, and similarity to the source speaker. Usually, the three criteria cannot all be met when rendering is done in both cross-speaker and cross-language ways. We analyze the key factors of rendering quality objectively in both the acoustic and phonetic domains. Monolingual speech databases recorded by different speakers, as well as bilingual databases recorded by the same speaker(s), are used. Measures in the acoustic and phonetic spaces are adopted to quantify naturalness, intelligibility, and the speaker’s timbre objectively. Our “trajectory tiling” algorithm-based cross-lingual TTS is used as the baseline system for comparison. To equalize speaker differences automatically, a speaker-independently trained DNN-based ASR acoustic model is used. Kullback-Leibler divergence is proposed to statistically measure the phonetic similarity between any two given speech segments, whether from different speakers or different languages, in order to select good rendering candidates. Demos of voice conversion, speaker-adaptive TTS, and cross-lingual TTS will be shown across speakers, across languages, or both. The implications of this research for low-resource speech research, speaker adaptation, the “average speaker’s voice,” accented/dialectal speech processing, speech-to-speech translation, audio-visual TTS, etc. will be discussed.
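The abstract proposes Kullback-Leibler divergence over phonetic representations as a similarity measure for selecting rendering candidates. The talk does not specify the exact computation; the following is a minimal sketch assuming each speech frame is represented by a discrete phone-posterior vector (e.g., from a speaker-independent DNN acoustic model), and using a symmetrized KLD as the distance, since plain KLD is asymmetric. All function and variable names here are illustrative, not from the talk.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions such as phone posteriors.

    A small epsilon is added and the vectors are renormalized to
    avoid log(0) and division by zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_kld(p, q):
    """Symmetrized KLD: KL(p||q) + KL(q||p).

    Usable as a phonetic distance between two frames or segments,
    regardless of which speaker or language each comes from.
    """
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical phone-posterior vectors for two speech frames:
frame_a = [0.70, 0.20, 0.10]
frame_b = [0.60, 0.30, 0.10]

# Lower values indicate more similar phonetic content, so a
# candidate selector would prefer segments minimizing this distance.
distance = symmetric_kld(frame_a, frame_b)
```

For segment-level comparison, such frame-wise distances would typically be accumulated along an alignment of the two segments; the symmetrization ensures the measure does not depend on which segment is treated as the reference.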
Duration: 1:01:08
Crossing Speaker and Language Barriers in Speech Processing