Recallable Question Answering-Based Re-Ranking Considering Semantic Region for Cross-Modal Retrieval

Rintaro Yanagi; Ren Togo; Takahiro Ogawa; Miki Haseyama

Question answering (QA)-based re-ranking methods for cross-modal retrieval have recently been proposed to further narrow down similar candidate images. Conventional QA-based re-ranking methods present questions to users by analyzing the candidate images, and the initial retrieval results are re-ranked based on the user's feedback. However, focusing only on performance improvement makes it difficult to elicit the user's retrieval intention efficiently. To realize more useful QA-based re-ranking, the user interaction through which the retrieval intention is elicited must be considered. In this paper, we propose a QA-based re-ranking method that considers two important factors for eliciting the user's retrieval intention: query-image relevance and recallability. Considering query-image relevance lets the method focus only on the candidate images related to the provided query text, while focusing on recallability lets users easily answer the provided question. With these procedures, our method can efficiently and effectively elicit the user's retrieval intention. Experimental results on Microsoft Common Objects in Context and a computationally constructed dataset that includes similar candidate images show that our method can improve the performance of both cross-modal retrieval methods and QA-based re-ranking methods.


Multimedia information, especially images, has become commonplace with the recent spread of wearable cameras and smartphones. We frequently record our lives as images, and opportunities to share these images have been increasing [1]. At the same time, manually managing and finding images on personal devices now takes considerable effort [2]. To address this situation, cross-modal retrieval methods that use text as a query have recently been proposed as an effective approach to image retrieval [3][4][5][6][7][8]. Since we use text in our daily lives, using it as the query is convenient and has a wide range of applications [9]. Specifically, cross-modal retrieval methods embed the provided text query and each candidate image in a shared space, and the embedded features are used to retrieve the relevant images. Conventional methods have improved image retrieval performance mainly by refining these embedding procedures.
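
The shared-space retrieval step described above can be illustrated with a minimal sketch. It assumes that the text query and the candidate images have already been embedded into the same space, and it ranks images by cosine similarity, which is a common choice but is an assumption here, not something stated in the paper. The function names, file names, and embedding values are all hypothetical and hand-written for illustration; this is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(query_embedding, image_embeddings):
    """Return (image name, score) pairs sorted from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in practice these would come from trained
# text and image encoders that map into one shared space.
query_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "dog_on_beach.jpg": [0.8, 0.2, 0.1],
    "city_at_night.jpg": [0.0, 0.1, 0.9],
    "dog_in_park.jpg": [0.7, 0.0, 0.3],
}

ranking = rank_images(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values, the two dog images score close to each other and well above the unrelated image. Near-ties of this kind among similar candidates are the situation that QA-based re-ranking is meant to resolve.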
