
IEEE TMM Special Section on Multimodal Video Compression and Reconstruction: Theory, Algorithms, and Applications

Jul 30

In recent years, the rapid advancement of multi-source sensing technologies has led to an explosive growth of multimodal video data, including RGB, depth, thermal, LiDAR, hyperspectral, and medical imaging modalities. By capturing complementary information from different physical and semantic perspectives, multimodal data enables comprehensive and fine-grained video representations for critical applications such as autonomous driving, remote sensing monitoring, and medical diagnosis. However, the massive scale and high dimensionality of multimodal video data have placed substantial pressure on existing storage, transmission, and computational infrastructures, increasingly becoming a key bottleneck that hinders large-scale deployment and real-world applications. This situation highlights an urgent need for efficient compression and high-fidelity reconstruction techniques.

Compared with traditional single-modality video compression and reconstruction, multimodal scenarios require joint modeling of inter-modal correlations and redundancies, while simultaneously addressing practical challenges such as temporal asynchrony, spatial misalignment, and modality-specific noise and distortions. The acquisition and processing of multimodal video data are often influenced by diverse sensor characteristics and environmental conditions, which can lead to varying degrees of degradation across modalities. Although conventional video coding methods and deep learning-based approaches, including CNNs, Transformers, and diffusion models, have achieved notable progress in reconstruction quality, existing solutions remain insufficient for multimodal settings. In particular, under strict bandwidth, computational, and latency constraints, effectively exploiting cross-modal redundancy to achieve both high compression efficiency and faithful reconstruction remains an open problem, calling for immediate and systematic investigation.

This special section is dedicated to showcasing state-of-the-art advances in multimodal video compression and reconstruction. It aims to highlight new achievements and developments while addressing significant open issues and promising directions in theory, algorithms, and applications within this field.

Topics of interest for this special section include, but are not limited to:

Theoretical Foundations and Models

  • Rate-distortion theory for multimodal video coding
  • Cross-modal priors and hybrid physics-data-driven models
  • Theoretical analysis of generative and multimodal networks for video reconstruction

Algorithms and Techniques  

  • Learning-based video codecs that leverage multimodal information for semantic-aware compression and efficient rate allocation
  • Distributed and edge-aware compression frameworks that utilize multimodal information to achieve low-latency and bandwidth-efficient coding
  • Representation learning methods that unify multimodal information to support scalable video compression and reconstruction
  • Generative and diffusion-based models for restoring high-quality video from compressed, degraded, or incomplete multimodal information
  • Multimodal-guided approaches for super-resolution, frame interpolation, inpainting, and content repair with improved perceptual fidelity

Applications and Systems

  • Telemedicine and healthcare video analytics under bandwidth constraints
  • Immersive XR/VR streaming with multimodal compression and reconstruction
  • Autonomous systems and robotics with sensor–video fusion for robust perception
  • Benchmarks, datasets, and metrics for multimodal compression and reconstruction

Submission Guidelines

Prospective authors should carefully review the scope of the special section and submit their manuscripts via the IEEE Author Portal submission system.

Guest Editors:

  • Prof. Zhiyuan Zha (Lead Guest Editor), Jilin University, China
  • Prof. Bihan Wen, Nanyang Technological University, Singapore
  • Dr. Ding Liu, Meta, USA
  • Prof. Shirin Jalali, Rutgers University, USA
  • Prof. Giuseppe Valenzise, Université Paris-Saclay, France

Important dates:

  • Open for submissions: May 01, 2026
  • Manuscript submission due: July 30, 2026
  • First review completed: September 30, 2026
  • Revised manuscript due: October 30, 2026
  • Second review completed: November 30, 2026
  • Final manuscript due: December 15, 2026