The prevailing use of both images and text to express opinions on the web leads to the need for multimodal sentiment recognition. Some commonly used social media data containing short text and few images, such as tweets and product reviews, have been well studied. However, it is still challenging to predict the readers’ sentiment after reading online news articles, since news articles often have more complicated structures, e.g., longer text and more images.