IEEE TMM Special Issue on Large Multi-modal Models for Dynamic Visual Scene Understanding

Manuscript Due: 25 August 2024
Publication Date: TBD


Breakthroughs in large models, such as ChatGPT in language processing and large vision models (e.g., ViT and SAM), have demonstrated remarkable versatility and prowess across a wide range of fields and tasks. Despite their success, these single-modal models have limitations in meeting the broader requirements of daily-life applications, especially in the pursuit of artificial general intelligence. This has spurred researchers in the multimedia community to explore Large Multi-Modal Models (LMMs), exemplified by models like CLIP, to enhance multi-modal understanding. More recent LMMs, such as Gemini (Google) and Sora (OpenAI), have demonstrated a powerful ability to understand or generate realistic and imaginative videos.

While LMMs have garnered widespread attention, they face numerous challenges in dealing with dynamic visual scenes. These include integrating and aligning data from multiple modalities such as video, music, and 3D data; addressing domain shifts; handling noisy data and label noise; and discovering novel objects or patterns. Additionally, for comprehending video scenes, infusing temporal consistency and coherence into LMMs remains a significant challenge. Moreover, there is a critical need for research on parameter-efficient fine-tuning of LMMs for diverse dynamic scene tasks.

This special issue aims to provide a platform for researchers to share their latest advances in the theory of Large Multi-modal Models for dynamic visual scene understanding. We also encourage submissions that explore the potential of LMMs for improving accessibility, diversity, and inclusivity in visual scene understanding. 

We invite original and high-quality papers including but not limited to:

  1. New LMM algorithms/models for dynamic visual scene understanding;
  2. Text-video/audio-video synthesis/generation and other multimedia algorithms;
  3. Applications of LMMs in various industries and daily-life settings, such as advertising and entertainment;
  4. Dynamic video scene analysis with fundamental multi-modal AI models, such as video segmentation and video understanding;
  5. Dynamic scene graph generation, knowledge discovery, and reasoning;
  6. Training and adaptation of LMMs, such as lightweight or compressed LMMs;
  7. Open-world visual scene perception with large multi-modal models;
  8. 2D/3D visual scene parsing with LMMs.

Submission Guidelines

Prospective authors should prepare their manuscripts following the IEEE TMM guidelines and submit a PDF version of the complete manuscript according to the following schedule:

Important Dates

  • Submission Deadline: 25 August 2024
  • First Review: 15 October 2024
  • Revisions Due: 15 November 2024
  • Second Review: 15 December 2024
  • Final Decision: 31 January 2025

Guest Editors