Applying spoken language processing (SLP) technologies to meeting transcripts is crucial for distilling, organizing, and prioritizing information. Meeting transcripts pose two key challenges for SLP tasks. First, they exhibit a wide variety of spoken language phenomena, which leads to dramatic performance degradation of SLP models. Second, they are usually long-form documents of several thousand words or more, which poses a great challenge to mainstream Transformer-based models, whose self-attention has computational complexity quadratic in the input length.