The generalization performance of conventional pattern classifiers often depends on the amount of training data, which must be labeled through costly human effort. Publicly available web resources are growing explosively, making abundant and cheap web data easy to obtain. Yet web data are usually less reliable than human-labeled data. In this paper, we explore the use of web text data to aid image classification. Rather than collecting auxiliary web data in advance, we retrieve web text directly using a reverse image search engine. We develop a novel text modeling method, the semantic matching neural network (SMNN), that learns semantic features from the text associated with web images. The SMNN text features are more reliable and more widely applicable than text features obtained with other methods. The SMNN text features and convolutional neural network (CNN) visual features are merged into a shared representation, which learns to capture the correlations between the two modalities. Experimental results on the benchmark UIUC-Sports, Scene-15, Caltech-256, and Pascal VOC-2012 data sets show that the visual and text modalities, drawn from different sources, are remarkably complementary and that fusing them yields a substantial performance improvement.
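As a rough illustration of merging two modalities into a shared representation, the sketch below concatenates L2-normalized visual and text feature vectors, passes them through one shared hidden layer, and applies a softmax classifier. It is not the SMNN or the fusion architecture from the paper, whose details are not given here. The feature dimensions, layer sizes, random weights, and the function names `l2_normalize` and `fuse_and_classify` are all assumptions chosen for illustration, and the random arrays stand in for real CNN and text features.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither modality dominates by magnitude."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse_and_classify(visual, text, w_shared, b_shared, w_out, b_out):
    """Concatenate both modalities, map them to a shared layer, then classify.

    Hypothetical fusion sketch: one ReLU layer over the concatenated features,
    followed by a softmax output layer.
    """
    joint = np.concatenate([l2_normalize(visual), l2_normalize(text)], axis=1)
    shared = np.maximum(0.0, joint @ w_shared + b_shared)  # shared representation
    logits = shared @ w_out + b_out
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return shared, probs


# Assumed sizes: 4 images, 512-d visual features, 128-d text features,
# a 64-d shared layer, and 8 classes.
rng = np.random.default_rng(0)
n, d_vis, d_txt, d_shared, n_classes = 4, 512, 128, 64, 8

visual = rng.normal(size=(n, d_vis))  # stand-in for CNN visual features
text = rng.normal(size=(n, d_txt))    # stand-in for learned text features

# Untrained, randomly initialized weights (placeholders for learned parameters).
w_shared = rng.normal(scale=0.1, size=(d_vis + d_txt, d_shared))
b_shared = np.zeros(d_shared)
w_out = rng.normal(scale=0.1, size=(d_shared, n_classes))
b_out = np.zeros(n_classes)

shared, probs = fuse_and_classify(visual, text, w_shared, b_shared, w_out, b_out)
print(shared.shape, probs.shape)
```

In a trained system the weights would be learned jointly, so that the shared layer can pick up correlations between the two modalities rather than treating them independently.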