Skip to main content

TMM Articles

<p>TMM Articles</p>

Content-Aware Tunable Selective Encryption for HEVC Using Sine-Modular Chaotification Model

Existing High Efficiency Video Coding (HEVC) selective encryption algorithms only consider the encoding characteristics of syntax elements to keep format compliance, but ignore the semantic features of video content, which may lead to unnecessary computational and bit rate costs. To tackle this problem, we present a content-aware tunable selective encryption (CATSE) scheme for HEVC. First, a deep hashing network is adopted to retrieve groups of pictures (GOPs) containing sensitive objects.

Read more

LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering

Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal reasoning task consisting of answering the visual question and generating multimodal explanations for the reasoning processes. Unlike traditional Visual Question Answering (VQA) task that only aims at predicting answers for visual questions, EVQA also aims to generate user-friendly explanations to improve the explainability and credibility of reasoning models.

Read more

JPEG Image Encryption With DC Rotation and Undivided RSV-Based AC Group Permutation

Existing JPEG encryption approaches pose a security risk due to the difficulty in changing all block-feature values while considering format compatibility and file size expansion. To address these concerns, this paper introduces a novel JPEG image encryption scheme. First, the security of sketch information against chosen-plaintext attacks is improved by increasing the change rate of block-feature values.

Read more

JPEG Image Encryption With DC Rotation and Undivided RSV-Based AC Group Permutation

Existing JPEG encryption approaches pose a security risk due to the difficulty in changing all block-feature values while considering format compatibility and file size expansion. To address these concerns, this paper introduces a novel JPEG image encryption scheme. First, the security of sketch information against chosen-plaintext attacks is improved by increasing the change rate of block-feature values.

Read more

End-to-End Instance-Level Human Parsing by Segmenting Persons

Instance-level human parsing is aimed at separately partitioning the human body into different semantic parts for each individual, which remains a challenging task due to human appearance/pose variation, occlusion and complex backgrounds. Most state-of-the-art methods follow the “parsing-by-detection” paradigm, which relies on a trained detector to localize persons and then sequentially performs single-person parsing for each person. However, this paradigm is closely related to the detector, and the runtime is proportional to the number of persons in an image.

Read more

Orientation-Aware Pedestrian Attribute Recognition Based on Graph Convolution Network

Pedestrian attribute recognition (PAR) aims to generate a structured description of pedestrians and plays an important role in surveillance. Current work focusing on 2D images can achieve decent performance when there is no variation in the captured pedestrian orientation. However, the performance of these works cannot be maintained in scenarios when the orientation of pedestrians is ignored. 

Read more

ATZSL: Defensive Zero-Shot Recognition in the Presence of Adversaries

Zero-shot learning (ZSL) has received extensive attention recently especially in areas of fine-grained object recognition, retrieval, and image captioning. Due to the complete lack of training samples and high requirement of defense transferability, the ZSL model learned is particularly vulnerable against adversarial attacks. Recent work also showed adversarially robust generalization requires more data.

Read more

Federated Adversarial Domain Hallucination for Privacy-Preserving Domain Generalization

Domain generalization aims to reduce the vulnerability of deep neural networks in the out-of-domain distribution scenario. With the recent and increasing data privacy concerns, federated domain generalization, where multiple domains are distributed on different local clients, has become an important research problem and brings new challenges for learning domain-invariant information from separated domains. 

Read more

Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching

Image-text matching, as a fundamental cross-modal task, bridges the gap between vision and language. The core is to accurately learn semantic alignment to find relevant shared semantics in image and text. Existing methods typically attend to all fragments with word-region similarity greater than empirical threshold zero as relevant shared semantics, e.g. , via a ReLU operation that forces the negative to zero and maintains the positive.

Read more