
Education Center

Foundational Speech Models and their Efficient Training with NVIDIA NeMo (video)

Pricing

SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

The intersection of speech and language models offers unique opportunities and challenges. This talk provides a comprehensive walkthrough of speech-language model research from NVIDIA NeMo. We cover several types of models, such as the attention-based encoder-decoder Canary-1B and LLM-based architectures such as SALM and BESTOW. In particular, we highlight the challenges in the training and inference efficiency of such models and propose robust solutions based on 2D bucketing and the batch size OOMptimizer. Finally, we highlight the difficulty of preserving text-domain capabilities during speech-augmented training and present several possible solutions: EMMeTT, VoiceTextBlender, and Canary-Qwen-2.5B.
Duration: 1:43:31

IEEE SPS Education Center FAQs

The IEEE SPS Education Center is your hub for educational resources in signal processing. It offers a variety of materials tailored for students and professionals alike. You can explore content based on your specific interests and skill levels.

Select the program and click on the external link to the IEEE SPS Resource Center.

Educational credits in the form of professional development hours (PDHs) or continuing education units (CEUs) are available for select educational programs.