
IEEE TMM Special Section on Multimodal Video Compression and Reconstruction: Theory, Algorithms, and Applications

Jul 30

In recent years, the rapid advancement of multi-source sensing technologies has led to an explosive growth of multimodal video data, including RGB, depth, thermal, LiDAR, hyperspectral, and medical imaging modalities. By capturing complementary information from different physical and semantic perspectives, multimodal data enables comprehensive and fine-grained video representations for critical applications such as autonomous driving, remote sensing monitoring, and medical diagnosis. However, the massive scale and high dimensionality of multimodal video data have placed substantial pressure on existing storage, transmission, and computational infrastructures, increasingly becoming a key bottleneck that hinders large-scale deployment and real-world applications. This situation highlights an urgent need for efficient compression and high-fidelity reconstruction techniques.

Compared with traditional single-modality video compression and reconstruction, multimodal scenarios require joint modeling of inter-modal correlations and redundancies, while simultaneously addressing practical challenges such as temporal asynchrony, spatial misalignment, and modality-specific noise and distortions. The acquisition and processing of multimodal video data are often influenced by diverse sensor characteristics and environmental conditions, which can lead to varying degrees of degradation across modalities. Although conventional video coding methods and deep learning-based approaches, including CNNs, Transformers, and diffusion models, have achieved notable progress in reconstruction quality, existing solutions remain insufficient for multimodal settings. In particular, under strict bandwidth, computational, and latency constraints, effectively exploiting cross-modal redundancy to achieve both high compression efficiency and faithful reconstruction remains an open problem, calling for immediate and systematic investigation.

This special section is dedicated to showcasing state-of-the-art advances in multimodal video compression and reconstruction. It aims to highlight new achievements and developments while addressing significant open issues and promising directions in theory, algorithms, and applications within this field.

Topics of interest for this special section include, but are not limited to:

Theoretical Foundations and Models

  • Rate-distortion theory for multimodal video coding
  • Cross-modal priors and hybrid physics-data-driven models
  • Theoretical analysis of generative and multimodal networks for video reconstruction

Algorithms and Techniques  

  • Learning-based video codecs that leverage multimodal information for semantic-aware compression and efficient rate allocation
  • Distributed and edge-aware compression frameworks that utilize multimodal information to achieve low-latency and bandwidth-efficient coding
  • Representation learning methods that unify multimodal information to support scalable video compression and reconstruction
  • Generative and diffusion-based models for restoring high-quality video from compressed, degraded, or incomplete multimodal information
  • Multimodal-guided approaches for super-resolution, frame interpolation, inpainting, and content repair with improved perceptual fidelity

Applications and Systems

  • Telemedicine and healthcare video analytics under bandwidth constraints
  • Immersive XR/VR streaming with multimodal compression and reconstruction
  • Autonomous systems and robotics with sensor–video fusion for robust perception
  • Benchmarks, datasets, and metrics for multimodal compression and reconstruction

Submission Guidelines

Prospective authors should carefully review the scope of the special section and submit their manuscripts via the IEEE Author Portal submission system.

Guest Editors:

  • Prof. Zhiyuan Zha (Lead Guest Editor), Jilin University, China
  • Prof. Bihan Wen, Nanyang Technological University, Singapore
  • Dr. Ding Liu, Meta, USA
  • Prof. Shirin Jalali, Rutgers University, USA
  • Prof. Giuseppe Valenzise, Université Paris-Saclay, France

Important dates:

  • Open for submissions: May 01, 2026
  • Manuscript submission due: July 30, 2026
  • First review completed: September 30, 2026
  • Revised manuscript due: October 30, 2026
  • Second review completed: November 30, 2026
  • Final manuscript due: December 15, 2026