State-Aware Compositional Learning Toward Unbiased Training for Scene Graph Generation


TIP Volume 32 | 2023


By:

Tao He; Lianli Gao; Jingkuan Song; Yuan-Fang Li

How to avoid biased predictions is an important and active research question in scene graph generation (SGG). Current state-of-the-art methods employ debiasing techniques such as resampling and causality analysis. However, the role of intrinsic cues in the features that cause biased training has remained under-explored. In this paper, for the first time, we make the surprising observation that object identity information, in the form of object label embeddings (e.g., GloVe), is principally responsible for biased predictions. We empirically observe that, even without any visual features, a number of recent SGG models can produce comparable or even better results solely from object label embeddings. Motivated by this insight, we propose to leverage a conditional variational auto-encoder to decouple the entangled visual features into two meaningful components: the object’s intrinsic identity features and the extrinsic, relation-dependent state features. We further develop two compositional learning strategies, on the relation and object levels, to mitigate the data scarcity issue of rare relations. On the two benchmark datasets Visual Genome and GQA, we conduct extensive experiments in three scenarios, i.e., conventional, few-shot and zero-shot SGG. Results consistently demonstrate that our proposed Decomposition and Composition (DeC) method effectively alleviates biases in relation prediction. Moreover, DeC is model-free, and it significantly improves the performance of recent SGG models, establishing new state-of-the-art performance.

Scene graph generation (SGG), or visual relation detection (VRD), is a pivotal step towards scene understanding of visual content. SGG has received enormous research interest in recent years [1], [2], [3], [4], [5], [6] as it can provide a detailed structured representation of an image for conducting high-level visual reasoning, making it useful for many downstream tasks such as visual captioning [7], [8], [9], visual question answering [10], [11], human-object interaction detection [12] and 3D scene understanding [13], [14]. In general, SGG aims to produce an object-based relation graph, namely a scene graph (SG), that contains a grounded visual representation of an image. Specifically, an SG can be represented as a set of relation triples, each of the form <subject, PREDICATE, object>.
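
The triple representation described above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class names (`SceneObject`, `Relation`, `SceneGraph`), the bounding-box format, and the example labels and coordinates are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """A grounded object: a class label plus a bounding box (hypothetical layout)."""
    label: str
    box: Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


@dataclass(frozen=True)
class Relation:
    """A directed edge between two objects, referenced by index."""
    subject_idx: int
    predicate: str
    object_idx: int


@dataclass
class SceneGraph:
    """A scene graph as a list of objects and a list of relations over them."""
    objects: List[SceneObject]
    relations: List[Relation]

    def triples(self) -> List[Tuple[str, str, str]]:
        # Resolve each relation into a <subject, PREDICATE, object> label triple.
        return [
            (self.objects[r.subject_idx].label,
             r.predicate,
             self.objects[r.object_idx].label)
            for r in self.relations
        ]


# Made-up example image content, for illustration only.
graph = SceneGraph(
    objects=[
        SceneObject("man", (10.0, 20.0, 110.0, 300.0)),
        SceneObject("horse", (60.0, 120.0, 320.0, 340.0)),
    ],
    relations=[Relation(subject_idx=0, predicate="RIDING", object_idx=1)],
)

print(graph.triples())
```

Storing relations as index pairs rather than label pairs keeps each triple tied to specific detected boxes, which is what makes the representation grounded when an image contains several objects of the same class.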

