Scene-Text Visual Question Answering (STVQA) is a comprehensive task that requires reading and understanding the text in images to answer questions. Existing methods that explore the vision-language relationships among questions, images, and scene text have achieved impressive results. However, these studies rely heavily on auxiliary modules, such as external OCR systems and object detection networks, which makes the question-answering process cumbersome and highly dependent on those modules. In addition, these approaches treat OCR text as textual content only and neglect learning its visual features. To alleviate these problems, we propose a novel end-to-end dual-stream multi-loss training approach called DSTA. Our model first integrates a text spotter into multimodal learning to incorporate both the textual and the visual features of OCR text. We further propose a dual-stream multi-loss training strategy that improves multimodal understanding during question-answering training. In addition, we design OCR Contrastive Learning (OCL) to enhance vision-language understanding by exploring the multimodal features of OCR text in depth. Experiments show that DSTA outperforms previous state-of-the-art methods on two STVQA benchmarks without any extra training data.
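The abstract does not give the exact form of the OCL objective or of the multi-loss combination. As a rough illustration only, the sketch below assumes a generic symmetric InfoNCE-style contrastive loss between the visual and textual features of the same OCR tokens, added to a question-answering loss with a fixed weight. The function names (`info_nce`, `combined_loss`), the temperature, and the weighting scheme are all hypothetical and are not taken from DSTA.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(visual, textual, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed stand-in for OCL, not the paper's loss).

    Row i of `visual` and row i of `textual` are treated as a positive pair
    (the same OCR token); every other row in the batch acts as a negative.
    """
    v = l2_normalize(visual)
    t = l2_normalize(textual)
    logits = v @ t.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(v))                 # positives lie on the diagonal
    loss_v2t = -log_softmax(logits)[idx, idx].mean()
    loss_t2v = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)


def combined_loss(qa_loss, contrastive_loss, weight=0.5):
    """Hypothetical multi-loss objective: QA loss plus a weighted contrastive term."""
    return qa_loss + weight * contrastive_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ocr_visual = rng.normal(size=(8, 16))   # placeholder visual OCR features
    ocr_text = ocr_visual + 0.1 * rng.normal(size=(8, 16))  # noisy paired text features
    ocl = info_nce(ocr_visual, ocr_text)
    print("contrastive loss:", ocl)
    print("total loss:", combined_loss(qa_loss=1.0, contrastive_loss=ocl))
```

In this sketch, correctly paired features yield a lower loss than mismatched ones, which is the general property a contrastive term relies on to align the two views of OCR text.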