Spoken Language Understanding (SLU) is a critical component of conversational voice assistants: it converts user utterances into a structured format for task execution. SLU systems typically consist of an automatic speech recognition (ASR) component that converts audio to text and a natural language understanding (NLU) component that converts text to a tree-like structure. Recently, however, end-to-end (E2E) SLU systems have attracted increasing interest as a way to improve quality, model efficiency, and data efficiency. In this task, participants are asked to use the Spoken Task Oriented Parsing (STOP) dataset, a multi-domain compositional spoken language understanding dataset, to explore E2E SLU along three axes: (1) quality, (2) on-device, and (3) low-resource and domain scaling. Five winners will be selected from this challenge based on different criteria and invited to submit a 2-page paper to ICASSP 2023.
Visit the challenge website for more details!