Historical Audio Search and Preservation: Finding Waldo Within the Fearless Steps Apollo 11 Naturalistic Audio Corpus

You are here

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

Historical Audio Search and Preservation: Finding Waldo Within the Fearless Steps Apollo 11 Naturalistic Audio Corpus

By: 
Meena M. Chandra Shekar; John H.L. Hansen

Apollo 11 was the first manned space mission to successfully bring astronauts to the Moon and return them safely. As part of NASA’s goal in assessing team and mission success, all voice communications within mission control, astronauts, and support staff were captured using a multichannel analog system, which until recently had never been made available. More than 400 personnel served as mission specialists/support who communicated across 30 audio loops, resulting in 9,000+ h of data. It is essential to identify each speaker’s role during Apollo and analyze group communication to achieve a common goal. Manual annotation is costly, so this makes it necessary to determine robust speaker identification and tracking methods. In this study, a subset of 100 h derived from the collective 9,000 h of the Fearless Steps (FSteps) Apollo 11 audio data were investigated, corresponding to three critical mission phases: liftoff, lunar landing, and lunar walk. A speaker recognition assessment is performed on 140 speakers from a collective set of 183 NASA mission specialists who participated, based on sufficient training data obtained from 5 (out of 30) mission channels. We observe that SincNet performs the best in terms of accuracy and F score achieving 78.6% accuracy. Speaker models trained on specific phases are also compared with each other to determine if stress, g-force/atmospheric pressure, acoustic environments, etc., impact the robustness of the models. Higher performance was obtained using i-vector and x-vector systems for phases with limited data, such as liftoff and lunar walk. When provided with a sufficient amount of data (lunar landing phase), SincNet was shown to perform the best. This represents one of the first investigations on speaker recognition for massively large team-based communications involving naturalistic communication data. In addition, we use the concept of “Where’s Waldo?” to identify key speakers of interest (SOIs) and track them over the complete FSteps audio corpus. This additional task provides an opportunity for the research community to transition the FSteps collection as an educational resource while also serving as a tribute to the “heroes behind the heroes of Apollo.”

SPS Social Media

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel