Spoken Language Understanding (SLU) is a critical component of conversational voice assistants: it converts user utterances into a structured format for task execution. SLU systems typically consist of an automatic speech recognition (ASR) component that converts audio to text and a natural language understanding (NLU) component that converts text to a tree-like structure. Recently, however, end-to-end (E2E) SLU systems have attracted growing interest as a way to improve quality, model efficiency, and data efficiency.