Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

TIP Volume: 28 Issue: 4

TIP Featured Articles

By: Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang

Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts, and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, we propose a novel two-branch deep neural network architecture comprising a very deep main network branch and a companion feature fusion network branch that fuses the multi-scale features computed by the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. To tackle the second issue, we add a label quantity prediction auxiliary task alongside the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art.
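
As a rough illustration of the second issue, the sketch below shows one way a predicted label count could be combined with per-label scores to decide how many labels to output. It is not the authors' implementation: the abstract does not say how the two predictions are combined, so the top-k selection rule, the function name `select_labels`, the `min_labels` parameter, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], predicted_count: float,
                  min_labels: int = 1) -> List[str]:
    """Keep the k highest-scoring labels, where k comes from a predicted label count.

    Assumption: the label quantity prediction is a real number that is
    rounded and used as k in a top-k selection. The paper may do otherwise.
    """
    if not scores:
        return []
    # Round the (hypothetical) count prediction to an integer and clamp it
    # to the range [min_labels, number of candidate labels].
    k = int(round(predicted_count))
    k = max(min_labels, min(k, len(scores)))
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Made-up scores for a single image, covering object, scene, and abstract concepts.
scores = {"beach": 0.91, "sky": 0.85, "dog": 0.40, "happiness": 0.62}
labels = select_labels(scores, predicted_count=2.6)
print(labels)
```

With a fixed k or a fixed score threshold, every image would receive labels by the same rule. Here the cut-off varies per image, which is the motivation the abstract gives for predicting the label quantity explicitly.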
