State-Aware Compositional Learning Toward Unbiased Training for Scene Graph Generation

How to avoid biased predictions is an important and active research question in scene graph generation (SGG). Current state-of-the-art methods employ debiasing techniques such as resampling and causality analysis. However, the role of intrinsic cues in the features causing biased training has remained under-explored. In this paper, for the first time, we make the surprising observation that object identity information, in the form of object label embeddings (e.g. GloVe), is principally responsible for biased predictions.

Rethinking Sampling Strategies for Unsupervised Person Re-Identification

Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on the framework design and loss function, this paper shows that sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important factor causing poor performance, and enhancing statistical stability can rectify this problem. 

Sequential Order-Aware Coding-Based Robust Subspace Clustering for Human Action Recognition in Untrimmed Videos

Human action recognition (HAR) is one of the most important tasks in video analysis. Since video clips distributed on networks are usually untrimmed, a given untrimmed video must be accurately segmented into a set of action segments for HAR. As an unsupervised temporal segmentation technology, subspace clustering learns the codes from each video to construct an affinity graph, and then cuts the affinity graph to cluster the video into a set of action segments.
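The codes-to-graph-to-cut pipeline described above can be illustrated with a minimal sketch. It assumes the codes are already given as a square coefficient matrix (one row per frame), builds a symmetric affinity from them, and replaces the graph cut with a much simpler stand-in: dropping weak edges and taking connected components. The function names and the `threshold` parameter are hypothetical, and this is not the method proposed in the paper.

```python
def affinity_from_codes(codes):
    """Build a symmetric affinity matrix W = |C| + |C|^T from a code matrix C.

    codes[i][j] is assumed to say how strongly frame j helps represent frame i.
    """
    n = len(codes)
    return [[abs(codes[i][j]) + abs(codes[j][i]) for j in range(n)]
            for i in range(n)]


def cut_graph(affinity, threshold):
    """Group frames by removing edges at or below `threshold`.

    This is a simplified stand-in for a spectral or normalized cut:
    the remaining connected components are returned as cluster labels.
    """
    n = len(affinity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # frame already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and affinity[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy codes for four frames: frames 0-1 and frames 2-3 represent each other.
codes = [
    [0.0, 0.9, 0.05, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.0, 0.05, 0.0, 0.7],
    [0.0, 0.0, 0.9, 0.0],
]
segments = cut_graph(affinity_from_codes(codes), threshold=0.5)
```

In this toy input the weak cross-links between the two pairs of frames fall below the threshold, so the frames separate into two segments.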

Fuzzy Semantics for Arbitrary-Shaped Scene Text Detection

To robustly detect arbitrary-shaped scene texts, bottom-up methods are widely explored for their flexibility. Due to the highly homogeneous texture and cluttered distribution of scene texts, it is nontrivial for segmentation-based methods to discover the separatrixes between adjacent instances. To effectively separate nearby texts, many methods adopt the seed expansion strategy that segments shrunken text regions as seed areas, and then iteratively expands the seed areas into intact text regions.
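The seed expansion strategy mentioned above can be sketched as a breadth-first region growing step. The sketch assumes a binary text mask and a map of already-labeled seed pixels are available as nested lists; `expand_seeds` and its arguments are illustrative names, and the code shows the general strategy rather than the approach of this particular paper.

```python
from collections import deque


def expand_seeds(text_mask, seed_labels):
    """Grow labeled seed areas into the full text mask, one pixel ring at a time.

    text_mask: 2D list of 0/1, where 1 marks a pixel inside some text region.
    seed_labels: 2D list of ints, 0 for unlabeled and k > 0 for seed instance k.
    Returns a new 2D list of instance labels; seed_labels is not modified.
    """
    rows, cols = len(text_mask), len(text_mask[0])
    labels = [row[:] for row in seed_labels]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if labels[r][c] > 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            # Claim a neighbor only if it is text and not yet owned by a seed.
            if inside and text_mask[nr][nc] == 1 and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                queue.append((nr, nc))
    return labels


# Two seeds at opposite ends of one connected strip of text pixels.
mask = [[1, 1, 1, 1, 1, 1]]
seeds = [[1, 0, 0, 0, 0, 2]]
print(expand_seeds(mask, seeds))  # [[1, 1, 1, 2, 2, 2]]
```

Because all seeds expand in lockstep, each contested pixel goes to whichever seed reaches it first, which is how two adjacent instances sharing one connected mask end up separated.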

EEFED: Personalized Federated Learning of Execution&Evaluation Dual Network for CPS Intrusion Detection

In the modern interconnected world, intelligent networks and computing technologies are increasingly being incorporated in industrial systems. However, this adoption of advanced technology has resulted in increased cyber threats to cyber-physical systems. Existing intrusion detection systems are continually challenged by constantly evolving cyber threats. Machine learning algorithms have been applied for intrusion detection. In these techniques, a classification model is trained by learning cyber behavior patterns.

Robust Moving Target Defence Against False Data Injection Attacks in Power Grids

Recently, moving target defence (MTD) has been proposed to thwart false data injection (FDI) attacks in power system state estimation by proactively triggering the distributed flexible AC transmission system (D-FACTS) devices. One of the key challenges for MTD in power grids is to design its real-time implementation with performance guarantees against unknown attacks.

Decouple and Resolve: Transformer-Based Models for Online Anomaly Detection From Weakly Labeled Videos

As one of the vital topics in intelligent surveillance, weakly supervised online video anomaly detection (WS-OVAD) aims to identify the ongoing anomalous events moment-to-moment in streaming videos, trained with only video-level annotations. Previous studies tended to utilize a unified single-stage framework, which struggled to simultaneously address the issues of online constraints and weakly supervised settings. To solve this dilemma, in this paper, we propose a two-stage-based framework, namely “decouple and resolve” (DAR), which consists of two modules, i.e., temporal proposal producer (TPP) and online anomaly localizer (OAL).

Practical Public Template Attacks on CRYSTALS-Dilithium With Randomness Leakages

Side-channel security has become a significant concern in the NIST post-quantum cryptography standardization process. The lattice-based CRYSTALS-Dilithium (abbr. Dilithium) became the primary signature standard algorithm recommended by NIST for most use cases in July 2022 due to its excellent performance in security and efficiency. Compared to Dilithium’s rich theoretical security analysis results, the side-channel security of its physical implementations needs to be further explored.

Spatial-Angular Versatile Convolution for Light Field Reconstruction

Spatial-angular separable convolution (SAS-conv) has been widely used for efficient and effective 4D light field (LF) feature embedding in different tasks. It mimics a 4D convolution by alternately operating on 2D spatial slices and 2D angular slices. In this paper, we argue that, despite its global intensity modeling capabilities, SAS-conv can only embed local geometry information into the features, resulting in inferior performance in the regions with textures and occlusions. Because the epipolar lines are highly related to the scene depth, we introduce the concept of spatial-angular correlated convolution (SAC-conv).
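To make the idea of alternating between spatial and angular slices concrete, the following is a minimal single-channel sketch of one spatial pass followed by one angular pass. It assumes a light field stored as a nested list indexed `lf[u][v][x][y]` (view indices first, pixel indices second), fixed rather than learned kernels, and zero padding. The helper names are hypothetical; this sketches SAS-conv as the abstract describes it, not the proposed SAC-conv.

```python
def conv2d_same(img, kernel):
    """Zero-padded 2D cross-correlation that keeps the input size."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * img[y][x]
            out[i][j] = acc
    return out


def sas_conv(lf, spatial_kernel, angular_kernel):
    """One spatial pass then one angular pass over a 4D light field lf[u][v][x][y]."""
    U, V = len(lf), len(lf[0])
    X, Y = len(lf[0][0]), len(lf[0][0][0])
    # Spatial step: filter each view (fixed u, v) over its own pixel grid.
    spatial = [[conv2d_same(lf[u][v], spatial_kernel) for v in range(V)]
               for u in range(U)]
    # Angular step: for each pixel (fixed x, y), filter across the grid of views.
    out = [[[[0.0] * Y for _ in range(X)] for _ in range(V)] for _ in range(U)]
    for x in range(X):
        for y in range(Y):
            angular_slice = [[spatial[u][v][x][y] for v in range(V)]
                             for u in range(U)]
            filtered = conv2d_same(angular_slice, angular_kernel)
            for u in range(U):
                for v in range(V):
                    out[u][v][x][y] = filtered[u][v]
    return out
```

Two 2D passes touch far fewer weights than a full 4D kernel, which is the efficiency argument for the separable form. Each pass, however, only sees one pair of dimensions at a time.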

Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features

This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, and a multiplicative attention layer and a fully connected layer for each assessment metric prediction. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs to combine rich acoustic information to obtain more accurate assessments.