Associated SPS Event: IEEE ICASSP 2022 Grand Challenge
Recent developments in speech signal processing, such as speech recognition and speaker diarization, have inspired numerous applications of speech technologies. The meeting scenario is one of the most valuable and, at the same time, most challenging scenarios for speech technologies, because meetings involve free speaking styles and complex acoustic conditions such as overlapping speech, an unknown number of speakers, far-field signals in large conference rooms, noise, and reverberation.
However, the lack of large, publicly available real meeting data has been a major obstacle to advancing the field. Because meeting transcription involves numerous related processing components, more information has to be carefully collected and labelled, such as speaker identity, speech context, and onset/offset times. All of this information requires precise annotation, which is expensive and time-consuming. Although several relevant datasets have been released, most of them suffer from various limitations, ranging from corpus setup (e.g., corpus size, number of speakers, variety of spatial locations relative to the microphone arrays, and collection conditions) to corpus content (e.g., recording quality, accented speech, and speaking style). Moreover, almost all publicly available meeting corpora are collected in English, and the differences among languages limit the development of Mandarin meeting transcription.
