Skip to main content

TMM Volume 23 | 2021

Semantic-Driven Interpretable Deep Multi-Modal Hashing for Large-Scale Multimedia Retrieval

Multi-modal hashing focuses on fusing different modalities and exploring the complementarity of heterogeneous multi-modal data for compact hash learning. However, existing multi-modal hashing methods still suffer from several problems, including: 1) Almost all existing methods generate unexplainable hash codes. They roughly assume that the contribution of each hash code bit to the retrieval results is the same, ignoring the discriminative information embedded in hash learning and semantic similarity in hash retrieval.

Read more

On Reliable Multi-View Affinity Learning for Subspace Clustering

In multi-view subspace clustering, the low-rankness of the stacked self-representation tensor is widely accepted to capture the high-order cross-view correlation. However, using the nuclear norm as a convex surrogate of the rank function, the self-representation tensor exhibits strong connectivity with dense coefficients. When noise exists in the data, the generated affinity matrix may be unreliable for subspace clustering as it retains the connections across inter-cluster samples due to the lack of sparsity.

Read more

LD-MAN: Layout-Driven Multimodal Attention Network for Online News Sentiment Recognition

The prevailing use of both images and text to express opinions on the web leads to the need for multimodal sentiment recognition. Some commonly used social media data containing short text and few images, such as tweets and product reviews, have been well studied. However, it is still challenging to predict the readers’ sentiment after reading online news articles, since news articles often have more complicated structures, e.g., longer text and more images.

Read more

Dense Video Captioning Using Graph-Based Sentence Summarization

Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Despite promising results were achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. To address this problem, we propose a graph-based partition-and-summarization (GPaS) framework for dense video captioning within two stages.

Read more

A New Image Compression Algorithm Based on Non-Uniform Partition and U-System

JPEG lossy image compression is a still image compression algorithm model that is currently widely used in major network media. However, it is unsatisfactory in the quality of compressed images at low bit rates. The objective of this paper is to improve the quality of compressed images and suppress blocking artifacts by improving the JPEG image compression model at low bit rates.

Read more

Adversarial Learning for Personalized Tag Recommendation

We have recently seen great progress in image classification due to the success of deep convolutional neural networks and the availability of large-scale datasets. Most of the existing work focuses on single-label image classification. However, there are usually multiple tags associated with an image. The existing works on multi-label classification are mainly based on lab curated labels.

Read more

A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment

The mnemonic descent method (MDM) algorithm is the first end-to-end recurrent convolutional system for high-accuracy face alignment. However, the heavy computational complexity and high memory access demands make it difficult to satisfy the requirements of real-time applications. To address this problem, an improved MDM (I-MDM) algorithm is proposed for efficient hardware implementation based on several hardware-oriented optimizations.

Read more