
IEEE Signal Processing Cup 2026

IEEE Signal Processing Cup at IEEE ICASSP 2026
AV Zoom: Real-Time Audio-Visual Zooming on Smartphones

IEEE ICASSP 2026 Website | 4-8 May 2026 | 2026 SP Cup Official Document

[Sponsored by MathWorks and the IEEE Signal Processing Society]

Introduction

This challenge aims to inspire young minds to tackle practical technical problems. Smartphones are now indispensable, offering features that simplify daily tasks—from entertainment to health tracking. One exciting development is audio zooming, which allows smartphones to focus on specific sounds while reducing background noise. This feature is especially useful in noisy environments like public gatherings, railway stations, and stadiums. While this technology has appeared in high-end models, it remains limited or ineffective in many devices.

When capturing images with a smartphone, one typically focuses on the scene and takes a shot. If specific details in the scene need emphasis, optical zooming in the camera provides a solution. Nowadays, optical zooming is also available while shooting videos. However, regardless of where the camera is pointed or how far it is zoomed in, audio is still captured from the entire scene. This creates a mismatch between the captured video and audio, resulting in an unnatural experience. While the camera has an optical field of view, it lacks an auditory field of view. Achieving synchronization between what is seen and what is heard is crucial for enhancing user experience. This concept, known as "audio-visual zooming," integrates visual zoom capabilities with enhanced audio capture, enabling synchronized focus on both visual and auditory details (see Figure 1). This technology has the potential to revolutionize applications where precise audiovisual alignment is essential, such as photography, cinematography, and more.
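One way to make the idea of an "auditory field of view" concrete is to tie it to the camera's field of view at the current zoom level. The sketch below is only an illustration under a simple pinhole-camera assumption; the function names and the 80-degree base field of view are hypothetical and do not come from the official document.

```python
import math


def zoomed_fov_deg(base_fov_deg, zoom):
    """Horizontal field of view after zooming, under a pinhole-camera model.

    Zooming by `zoom` scales the focal length by the same factor, so the
    tangent of the half-angle shrinks by 1/zoom.
    """
    if zoom < 1.0:
        raise ValueError("zoom factor must be >= 1")
    half_angle = math.radians(base_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half_angle) / zoom))


def source_in_view(source_azimuth_deg, camera_azimuth_deg, fov_deg):
    """True if a sound source direction falls inside the visible sector."""
    return abs(source_azimuth_deg - camera_azimuth_deg) <= fov_deg / 2.0


# Assumed example: an 80-degree lens zoomed 2x sees roughly 45.5 degrees.
fov_2x = zoomed_fov_deg(80.0, 2.0)
talker_visible = source_in_view(15.0, 0.0, fov_2x)   # inside the zoomed view
traffic_visible = source_in_view(35.0, 0.0, fov_2x)  # outside the zoomed view
```

In this picture, an audio zoom would aim to keep sources such as the talker and attenuate sources such as the traffic, so that the audible sector narrows together with the visible one.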

The main issue is how to accurately locate and track audio-visual targets in a dynamic environment using low-power hardware. The goal is to zoom intelligently on sounds and visuals of interest, such as a person speaking in a noisy outdoor scene. This problem matters because most current solutions for audio-visual analysis require heavy computation and centralized processing, which makes them unsuitable for real-time use in remote, low-power, or privacy-sensitive situations. Processing at the edge in real time removes the dependence on a remote server. Current systems integrate sound source localization with visual tracking poorly on low-resource devices, perform unreliably in real time, and offer constrained capabilities under tight compute budgets. Most systems either focus only on video or require cloud processing to achieve acceptable accuracy.
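As a rough illustration of the localization step mentioned above, the following sketch estimates the direction of a sound source from the time difference of arrival between two microphones. It uses plain time-domain cross-correlation (practical systems often prefer weighted variants such as GCC-PHAT), and every name and value in it, including the 0.15 m microphone spacing and the 48 kHz sample rate, is an assumption made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_lag(ref, other, max_lag):
    """Lag in samples that maximizes the cross-correlation of two signals.

    A positive result means `other` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_angle_deg(lag, sample_rate_hz, mic_spacing_m):
    """Far-field arrival angle from broadside for a two-microphone pair."""
    tau = lag / sample_rate_hz
    # Clamp so that noise or rounding cannot push asin() out of its domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.asin(s))


# Synthetic test signal: white noise reaching the second mic 3 samples later.
rng = random.Random(0)
mic_a = [rng.uniform(-1.0, 1.0) for _ in range(2048)]
mic_b = [0.0] * 3 + mic_a[:-3]

# The largest physically possible lag is spacing / speed of sound (~21 samples).
lag = estimate_lag(mic_a, mic_b, max_lag=21)
angle = lag_to_angle_deg(lag, sample_rate_hz=48000, mic_spacing_m=0.15)
```

With these assumed values, the 3-sample delay corresponds to a source roughly 8 degrees off broadside. The brute-force search is quadratic in cost and is meant only to show the principle; a real-time implementation would typically compute the correlation with an FFT on short frames.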

The audio zooming problem can be viewed as 'spatial filtering' in the array signal processing context. Many contemporary spatial filtering methods are available, although most were developed for applications other than audio zooming (some useful references are provided in section 2.3 of the 2026 SP Cup Official Document). In general, the resolution of spatial filtering techniques such as beamforming is a function of the number of sensors (in this case, microphones). Higher resolution typically requires more microphones; however, due to space constraints, smartphones usually include only two or three microphones. There is no requirement that this problem must be addressed solely through beamforming. Since both audio and visual zooming are involved, creative and hybrid approaches are encouraged, whether based on classical signal processing, artificial intelligence (machine learning (ML), deep learning, or TinyML), or novel combinations of both. The focus of this challenge is to design a real-time audio-visual zooming system. This includes designing a microphone array configuration, developing processing algorithms, and building a mobile application for Android or iOS. The solution must also include real-time implementation and evaluation similar to the example presented in Figure 2.
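The link between microphone count and spatial resolution can be illustrated with the simplest spatial filter, a narrowband delay-and-sum beamformer on a uniform linear array. This is a sketch under far-field and free-field assumptions, not a recommended solution for the challenge; the function names, the 0.15 m and 0.02 m spacings, and the 2 kHz test frequency are all hypothetical choices.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def das_response(num_mics, spacing_m, freq_hz, steer_deg, arrival_deg):
    """Normalized magnitude response (0..1) of a delay-and-sum beamformer.

    Uniform linear array, far-field plane wave, angles measured from broadside.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    # Phase step between adjacent mics, relative to the steering direction.
    delta = k * spacing_m * (math.sin(math.radians(arrival_deg))
                             - math.sin(math.radians(steer_deg)))
    total = sum(cmath.exp(1j * m * delta) for m in range(num_mics))
    return abs(total) / num_mics


def half_power_beamwidth_deg(num_mics, spacing_m, freq_hz, step_deg=0.1):
    """Width of the broadside main lobe at the -3 dB points, found by scanning."""
    threshold = 1.0 / math.sqrt(2.0)
    steps = int(round(90.0 / step_deg))
    for i in range(steps + 1):
        angle = i * step_deg
        if das_response(num_mics, spacing_m, freq_hz, 0.0, angle) < threshold:
            return 2.0 * angle
    return 180.0  # response never drops by 3 dB: no useful selectivity


# Assumed geometries, evaluated at 2 kHz.
bw_two_mics = half_power_beamwidth_deg(2, 0.15, 2000.0)
bw_three_mics = half_power_beamwidth_deg(3, 0.15, 2000.0)
bw_close_pair = half_power_beamwidth_deg(2, 0.02, 2000.0)
```

Under these assumptions the three-microphone array gives a narrower main lobe than the two-microphone array with the same spacing, while the closely spaced pair shows no 3 dB drop at all at 2 kHz. Note that the 0.15 m spacing exceeds half a wavelength at this frequency, so the wider arrays also produce grating lobes. This trade-off is one reason approaches beyond plain beamforming may be worth exploring.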

The system should integrate the following components:

  • Real-time audio zooming using microphone arrays to focus on specific sound sources.
  • Visual alignment with the chosen sound source, ensuring that what is heard matches what is seen.
  • All components should be optimized for edge devices (smartphones), with emphasis on low power, low latency, and fully on-device operation.

This challenge has three phases.

Full technical details, dataset(s), evaluation metrics, and all other pertinent information about the competition are located in the "2026 SP Cup Official Document" (above).
 

Important Dates

  • Challenge Announcement/Registration Starts: 21 October 2025
  • Team Registration Deadline: 10 November 2025 - Registration Link
  • Phase 1 Team Work Submission Deadline: 11 December 2025
  • Announcement of the Phase 1 Results: 21 December 2025
  • Phase 2 Team Work Submission Deadline: 15 February 2026
  • Announcement of 3 Finalist Teams: 02 March 2026
  • Presentation of final results at ICASSP 2026: 4-8 May 2026

 

Registration and Important Resources

Official SP Cup Team Registration

  • All teams MUST be registered through the official competition registration system before the deadline in order to be considered as a participating team. Teams must meet all eligibility requirements at the time of team registration as well as throughout the competition.
  • All team members for each team MUST agree to the SPS Student Terms and Conditions and submit a completed agreement form here before the team registration deadline.
  • Register your team for the 2026 SP Cup before the Team Registration Deadline above, and submit your work before the corresponding submission deadline above, at the following link: [Register your team HERE]

 

Finalist Teams

Grand Prize

Team: "SuperZooooom" (28667)
Wuhan University
Supervisor: Gongping Huang
Tutor: Yujie Zhu
Undergraduate Students: Gengyou Liu, Yanxin Tian, Yongyi Deng, Zhixiang Tang

First Runner-Up

Team: "Nyquist" (28742)
Bangladesh University of Engineering and Technology
Supervisor: Mohammad Ariful Haque
Tutor: Aye Thein Maung
Undergraduate Students: Fariha Anjum Oshin, Mahafuza Maisha, Md Abu Saleh Akib, Md. Nagib Mahfuz, MD. Symria Raihan, Riajul Karim Chowdhury, Sumayea Sultana, Wahi Farhan Hoque, Zarifa Tabassum

Second Runner-Up

Team: "Barn Owl" (28731)
Aviation and Aerospace University, Bangladesh (AAUB) & Khulna University of Engineering & Technology (KUET)
Supervisor: Md. Sakir Hossain
Undergraduate Students: Aliul Hassan Olee, Md. Nayeem, Muhtasim Redwan

 

Complimentary MATLAB License

MathWorks, Inc. continues to support the IEEE SP Cup. Participating students are encouraged to download the complimentary MathWorks Student Competitions Software for use in the competition.

Instructions on how to apply for the complimentary MATLAB License can be found in the following Dropbox folder:

Dropbox Folder: SP Cup - Complimentary MATLAB License (MathWorks)

 

Contacts

Competition Organizers (technical, competition-specific inquiries): Dr. Ashok Chandrasekaran (alternate email: ieeespsavzoom@gmail.com)

SPS Staff (Terms & Conditions, Travel Grants, Prizes): Jaqueline Rash, SPS Membership Program and Events Manager

SPS Student Services Committee: Lucas Thomaz, Chair

Questions and general inquiries regarding the competition should be sent to sp-competitions@listserv.ieee.org.

 

Sponsors

This competition is sponsored by the IEEE Signal Processing Society and MathWorks:
