Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings

TASLP Volume 29 | 2021

By:

Nauman Dawalatabad; Srikanth Madikeri; C. Chandra Sekhar; Hema A. Murthy

Speaker diarization is an important and topical problem, and is especially useful as a preprocessing step for conversational speech applications. The objective of this article is two-fold: (i) to initialize segments so that speaker information is distributed uniformly across them, and (ii) to incorporate speaker discriminative features within the unsupervised diarization framework. In the first part of the work, a varying length segment initialization technique is proposed for the Information Bottleneck (IB) based speaker diarization system, using phoneme rate as side information. This initialization distributes speaker information uniformly across the segments and provides a better starting point for IB based clustering. In the second part of the work, we present a Two-Pass Information Bottleneck (TPIB) based speaker diarization system that incorporates speaker discriminative features during diarization. The TPIB based system improves over the baseline IB based system. During the first pass of the TPIB system, a coarse segmentation is performed using IB based clustering. The resulting alignments are used to generate speaker discriminative features with a shallow feed-forward neural network and linear discriminant analysis. These discriminative features are used in the second pass to obtain the final speaker boundaries. In the final part of the paper, variable segment initialization is combined with the TPIB framework. This combines the advantages of better segment initialization and speaker discriminative features, which results in an additional improvement in performance. An evaluation on standard meeting datasets shows significant absolute improvements of 3.9% and 4.7% on the NIST and AMI datasets, respectively.
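The idea of varying length segment initialization can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not spell out. It assumes one plausible reading: phoneme onset times are available (for example, from a phoneme recognizer), and segment boundaries are placed so that every segment holds roughly the same number of phonemes. The function name `equal_phoneme_segments` and its parameters are hypothetical.

```python
from typing import List, Tuple


def equal_phoneme_segments(
    phoneme_starts: List[float],
    total_duration: float,
    phonemes_per_segment: int,
) -> List[Tuple[float, float]]:
    """Split [0, total_duration] into segments holding about the same
    number of phonemes each (an assumed reading of phoneme-rate-based
    initialization, not the published algorithm)."""
    if phonemes_per_segment <= 0:
        raise ValueError("phonemes_per_segment must be positive")

    starts = sorted(phoneme_starts)
    boundaries = [0.0]
    # Place a boundary at the onset of every N-th phoneme.
    for i in range(phonemes_per_segment, len(starts), phonemes_per_segment):
        boundaries.append(starts[i])
    boundaries.append(total_duration)

    # Pair consecutive boundaries into (start, end) segments.
    return list(zip(boundaries[:-1], boundaries[1:]))


# Fast speech in the first second, slow speech afterwards (made-up onsets).
onsets = [0.0, 0.1, 0.2, 0.3, 1.0, 2.0, 3.0, 4.0]
segments = equal_phoneme_segments(onsets, total_duration=5.0, phonemes_per_segment=4)
print(segments)
```

Under this assumption, stretches of fast speech yield short segments and stretches of slow speech yield long ones, so each initial segment carries a comparable amount of phonetic content, unlike fixed-length initialization.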

Given an audio signal, speaker diarization involves answering the question of “Who spoke When?” [1]. A speaker diarization system annotates audio with relative speaker labels. The task involves estimating the number of speakers and assigning speech segments to different speakers. Speaker diarization has been used in various domains, such as telephone conversations, broadcast news, and meetings [1]. Diarization of conversational audio meetings is considered to be a challenging task owing to the spontaneity in the conversation. Diarization systems are often used as front-ends in applications that include automatic speech recognition, spoken keyword spotting, and speaker recognition [2].

The Information Bottleneck (IB) based approach to speaker diarization has shown competitive performance on meeting recordings [3]–[5]. Owing to its non-parametric nature, IB based diarization has a very low Real Time Factor (RTF) [3], [6]. RTF is the time a system takes to process 1 second of speech data. Since diarization systems are mostly used as a pre-processing stage in conversational speech applications, a low RTF is desirable.
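As a quick illustration of the RTF definition above, the ratio of processing time to audio duration can be computed as follows. The function name `real_time_factor` and the numbers in the example are made up for illustration and are not results from the article.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time per second of audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: a 600 s meeting diarized in 30 s of compute time.
rtf = real_time_factor(processing_seconds=30.0, audio_seconds=600.0)
print(f"RTF = {rtf:.3f}")
```

An RTF below 1 means the system runs faster than real time, which is what makes it practical as a front-end for downstream speech applications.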
