Adaptive Multimodal Graph Integration Network for Multimodal Sentiment Analysis

By: Peizhu Gong; Jin Liu; Xiliang Zhang; Xingye Li; Lai Wei; Huihua He

Most current models for multimodal sequence analysis disregard both the imbalanced contributions of individual modal representations, caused by their varying information densities, and the inherent multi-relational interactions across distinct modalities. Consequently, they may form a biased understanding of the intricate interplay among modalities, limiting prediction accuracy and effectiveness. To address these issues, we propose the Adaptive Multimodal Graph Integration Network (AMGIN) for multimodal sentiment analysis. Concretely, AMGIN transforms multimodal sequences into graph structures and discriminatively fuses complex intra- and inter-modal correlations by incorporating multiple edge types. To accurately modulate each modality's contribution, we present the Adaptive Modality Adjustment Mechanism (AMAM), which consists of two primary components: a modality refinement loss (MR Loss) that selectively optimizes the parameters of unimodal branches through backpropagation according to the relative confidence of each modality, and a modality-confidence gated module (MCGM) that adaptively filters out noise according to the deviation of modality-specific representations from the shared semantic distribution. Furthermore, we introduce a feature reconstruction loss as an additional constraint to prevent excessive modification of the original representations. To verify the effectiveness of the proposed approach, we conduct extensive experiments on three benchmark sentiment analysis datasets, IEMOCAP, CMU-MOSEI, and CMU-MOSI, and additionally evaluate multimodal humor detection on the UR-FUNNY dataset. The experimental results demonstrate the superiority of AMGIN over state-of-the-art methods.
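The abstract gives no implementation details, but the gating idea behind MCGM can be sketched: attenuate each modality's features in proportion to how far they deviate from a shared semantic representation. The PyTorch sketch below is only an illustration of that idea; the class name, layer shapes, and the concrete gating function are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ModalityConfidenceGate(nn.Module):
    """Hypothetical sketch of a modality-confidence gated module (MCGM).

    Features that deviate strongly from the shared semantics receive a
    lower gate value and are attenuated; features consistent with the
    shared distribution pass through largely unchanged.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Maps [modality features, deviation from shared semantics]
        # to a per-feature gate in (0, 1). Assumed architecture.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, modality: torch.Tensor, shared: torch.Tensor) -> torch.Tensor:
        # modality, shared: (batch, dim)
        deviation = modality - shared
        g = self.gate(torch.cat([modality, deviation], dim=-1))
        # Suppress features whose deviation signals noise; keep the rest.
        return g * modality


if __name__ == "__main__":
    mcgm = ModalityConfidenceGate(dim=128)
    text_feat = torch.randn(4, 128)   # one unimodal branch's representation
    shared_sem = torch.randn(4, 128)  # stand-in for the shared semantic distribution
    filtered = mcgm(text_feat, shared_sem)
    print(filtered.shape)  # torch.Size([4, 128])
```

A feature reconstruction loss, as described in the abstract, could then penalize the distance between the gated and original features so the gate cannot modify representations excessively.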
