Automatic Speaker Recognition and Diarization in Co-Channel Speech

December 2018

This study investigates various aspects of multi-speaker interference and its impact on speaker recognition. Single-channel multi-speaker speech signals (also known as co-channel speech) comprise a significant portion of speech processing data. Examples of co-channel signals are recordings of multiple speakers in meetings, conversations, debates, etc. The nuisances of co-channel speech are two-fold: 1) overlapped speech, and 2) non-overlapping speaker interference. In overlap, the direct effect of adding two stochastically similar, non-stationary signals disrupts the performance of speech processing systems, which were originally developed for single-speaker audio. For example, in speaker recognition, identifying speakers in overlapped segments is more difficult than in single-speaker signals. Analyses in this study show that introducing overlapped speech increases speaker recognition error rates by an order of magnitude. Beyond this direct impact, overlap has a secondary effect: one speaker forces the other to change his or her speech characteristics. Different forms of co-channel data are investigated in this study. In scenarios where the focus is on overlap, independent cross-talk is used. Independent cross-talk refers to the summation of independent audio signals from different speakers to simulate overlap. The alternative form of data used in this study is real conversation recordings. Although conversations contain both overlapped and non-overlapped speech, independent cross-talk is a better source of overlap, because overlaps in typical conversations are scarce and non-uniform. Independent cross-talk is obtained from the GRID corpus, which was used in the speech separation challenge as a source of overlapped speech. Real conversations are obtained from the Switchboard telephone conversation corpus.
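As an illustration of independent cross-talk as defined above (summing independent single-speaker signals to simulate overlap), the following is a minimal sketch and not the procedure used in the study. The function names, the `sir_db` target-to-interferer ratio parameter, and the synthetic sine-wave stand-ins for speech are all assumptions made for this example; a real experiment would load utterances from a corpus such as GRID.

```python
import math

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def mix_cross_talk(target, interferer, sir_db=0.0):
    """Sum two independent single-speaker signals to simulate overlap.

    The interferer is scaled so that the target-to-interferer power
    ratio equals sir_db (a hypothetical parameter, not from the study).
    Returns the mixture and the scaled interferer.
    """
    # Truncate both signals to the shorter length so they fully overlap.
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]

    p_t, p_i = power(target), power(interferer)
    if p_t == 0 or p_i == 0:
        raise ValueError("both signals must have non-zero power")

    # Gain that brings the interferer to the requested relative level.
    gain = math.sqrt(p_t / (p_i * 10 ** (sir_db / 10)))
    scaled = [gain * x for x in interferer]

    # Co-channel signal: sample-wise sum of the two sources.
    mixture = [a + b for a, b in zip(target, scaled)]
    return mixture, scaled

# Synthetic stand-ins for two speakers (one second at 8 kHz).
fs = 8000
speaker_a = [math.sin(2 * math.pi * 120 * t / fs) for t in range(fs)]
speaker_b = [0.3 * math.sin(2 * math.pi * 210 * t / fs) for t in range(fs)]

mixture, scaled_b = mix_cross_talk(speaker_a, speaker_b, sir_db=6.0)
```

Because the two sources are mixed artificially, every sample of the mixture is overlapped and the relative level is controlled, which is the uniformity that real conversations lack.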
Other real conversational data used throughout this study include the AMI meeting corpus, Prof-lifelog, and UTDrive data. These datasets provide a perspective on environmental noise and co-channel interference in day-to-day speech. Categorizing the datasets in this way allows for a meticulous analysis of different aspects of co-channel speech. The analysis of overlaps focuses mainly on overlap detection techniques. This study proposes two overlap detection methods: 1) Pyknogram-based, and 2) Gammatone sub-band frequency modulation (GSFM). Both methods take advantage of the harmonic structure of speech to detect overlaps. Pyknograms do so by enhancing speech harmonics and evaluating their dynamics across time, while GSFM magnifies the presence of multiple harmonics in different sub-bands. The other advancements proposed in this study use back-end modeling techniques to compensate for co-channel speech in real conversational data. These techniques are presented to reduce the impact of interfering speech in speaker-dependent models. Several methods are investigated, each of which proposes a different modification to the popular probabilistic linear discriminant analysis (PLDA) used in state-of-the-art speaker recognition systems. In addition to model compensation techniques, this study presents CRSS-SpkrDiar, a speaker diarization research platform aimed at tackling conversational co-channel speech data. CRSS-SpkrDiar was developed during this study to facilitate end-to-end co-channel speech analysis. Taken collectively, the speech analysis, proposed features, and algorithmic advancements developed in this study all contribute to an improved understanding of, and measurable performance gains in, speech/speaker technology for the co-channel speech problem.
