LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering

You are here

IEEE Transactions on Multimedia

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering

By: 
Dizhan Xue; Shengsheng Qian; Quan Fang; Changsheng Xu

Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal reasoning task consisting of answering the visual question and generating multimodal explanations for the reasoning processes. Unlike traditional Visual Question Answering (VQA) task that only aims at predicting answers for visual questions, EVQA also aims to generate user-friendly explanations to improve the explainability and credibility of reasoning models. To date, existing methods for VQA and EVQA ignore the prompt in the question and enforce the model to predict the probabilities of all answers. Moreover, existing EVQA methods ignore the complex relationships among question words, visual regions, and explanation tokens. Therefore, in this work, we propose a Logic Integrated Neural Inference Network (LININ) to restrict the range of candidate answers based on first-order-logic (FOL) and capture cross-modal relationships to generate rational explanations. Firstly, we design a FOL-based question analysis program to fetch a small number of candidate answers. Secondly, we utilize a multimodal transformer encoder to extract visual and question features, and conduct the prediction on candidate answers. Finally, we design a multimodal explanation transformer to construct cross-modal relationships and generate rational explanations. Comprehensive experiments on benchmark datasets demonstrate the superiority of LININ compared with the state-of-the-art methods for EVQA.

SPS Social Media

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel