TIP Articles

Compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content can represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. However, HDR also implies an increase in data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content.

Remote Photoplethysmography (rPPG) has been attracting increasing attention due to its potential in a wide range of application scenarios such as physical training, clinical monitoring, and face anti-spoofing. On top of conventional solutions, deep-learning approaches have started to dominate rPPG estimation and achieve top-level performance.

Cross-component prediction is an important intra-prediction tool in modern video coders. Existing prediction methods that exploit cross-component correlation include the cross-component linear model and its multi-model extension. These models are designed for camera-captured content. For screen content coding, where videos exhibit different signal characteristics, a cross-component prediction model tailored to those characteristics is desirable.
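The cross-component linear model mentioned above predicts a chroma block from co-located reconstructed luma samples via a linear relation fit to neighboring reference samples. The sketch below is an illustrative least-squares version, not the exact parameter derivation used in any particular coder; `cclm_predict` and its argument names are hypothetical.

```python
import numpy as np

def cclm_predict(luma_ref, chroma_ref, luma_block):
    """Cross-component linear model sketch: predict chroma as
    C = alpha * Y + beta, fitting alpha and beta by least squares
    on neighboring reference samples (illustrative simplification)."""
    y = np.asarray(luma_ref, dtype=float)
    c = np.asarray(chroma_ref, dtype=float)
    var_y = np.var(y)
    if var_y == 0.0:
        alpha, beta = 0.0, c.mean()       # flat luma: fall back to mean chroma
    else:
        alpha = np.cov(y, c, bias=True)[0, 1] / var_y
        beta = c.mean() - alpha * y.mean()
    return alpha * np.asarray(luma_block, dtype=float) + beta
```

A multi-model variant would partition the reference samples (e.g. by a luma threshold) and fit one such linear model per partition.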

Graph Convolutional Networks (GCNs), which typically follow a neural message-passing framework to model dependencies among skeletal joints, have achieved great success in the skeleton-based human motion prediction task. Nevertheless, how to construct a graph from a skeleton sequence and how to perform message passing on the graph remain open problems, which severely affect the performance of GCNs.
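The message-passing step referred to above can be sketched with the standard GCN propagation rule: each joint aggregates the features of its neighbors through a normalized adjacency matrix, then applies a shared linear transform. This is a generic sketch of that rule, not any particular paper's architecture.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN message-passing step over skeletal joints:
    aggregate neighbor features with a symmetrically normalized
    adjacency (self-loops added), apply shared weights W, then ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized adjacency
    return np.maximum(A_norm @ X @ W, 0.0)    # aggregate, transform, ReLU
```

Here `A` encodes the chosen skeleton graph (e.g. bone connectivity); the open problems the abstract raises concern precisely how `A` is built from the sequence and how this aggregation is performed.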

As an important yet challenging task in Earth observation, change detection (CD) is undergoing a technological revolution, given the broadening application of deep learning. Nevertheless, existing deep learning-based CD methods still suffer from two salient issues: 1) incomplete temporal modeling, and 2) space-time coupling. In view of these issues, we propose a more explicit and sophisticated modeling of time and accordingly establish a pair-to-video change detection (P2V-CD) framework. First, a pseudo transition video that carries rich temporal information is constructed from the input image pair, interpreting CD as a problem of video understanding.
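The idea of turning an image pair into a pseudo transition video can be illustrated by interpolating intermediate frames between the two acquisitions. This is a hypothetical sketch of the general idea only; the actual construction in the P2V-CD framework may differ, and `pseudo_transition_video` is an assumed name.

```python
import numpy as np

def pseudo_transition_video(img_t1, img_t2, num_frames=8):
    """Build a pseudo transition 'video' from a bi-temporal image pair
    by linear interpolation, so change detection can be treated as a
    video-understanding problem (illustrative sketch)."""
    a = np.asarray(img_t1, dtype=float)
    b = np.asarray(img_t2, dtype=float)
    ts = np.linspace(0.0, 1.0, num_frames)
    # frame t moves gradually from the first acquisition to the second
    return np.stack([(1.0 - t) * a + t * b for t in ts])  # (T, H, W, C)
```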

How to avoid biased predictions is an important and active research question in scene graph generation (SGG). Current state-of-the-art methods employ debiasing techniques such as resampling and causality analysis. However, the role of intrinsic cues in the features causing biased training has remained under-explored. In this paper, for the first time, we make the surprising observation that object identity information, in the form of object label embeddings (e.g. GLOVE), is principally responsible for biased predictions. 

Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on the framework design and loss function, this paper shows that sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important factor causing poor performance, and enhancing statistical stability can rectify this problem. 

Human action recognition (HAR) is one of the most important tasks in video analysis. Since video clips distributed on networks are usually untrimmed, a given untrimmed video must be accurately segmented into a set of action segments for HAR. As an unsupervised temporal segmentation technique, subspace clustering learns codes from each video to construct an affinity graph, and then cuts the affinity graph to cluster the video into a set of action segments.
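The learn-codes / build-graph / cut-graph pipeline described above can be sketched end to end. This is a minimal illustration assuming ridge-regularized self-expressive codes and a simple spectral bisection into two segments, not the formulation of any specific paper.

```python
import numpy as np

def subspace_cluster(X, reg=0.1):
    """Subspace-clustering sketch for temporal segmentation:
    1) learn self-expressive codes C minimizing ||X - XC||^2 + reg*||C||^2
       (ridge closed form, zero diagonal so no frame represents itself),
    2) build a symmetric affinity graph W = |C| + |C|^T,
    3) cut the graph by spectral bisection (sign of the Fiedler vector)
       into two action segments. Columns of X are per-frame features."""
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + reg * np.eye(n), G)  # (G + reg I)^{-1} G
    np.fill_diagonal(C, 0.0)
    W = np.abs(C) + np.abs(C).T                  # affinity graph
    L = np.diag(W.sum(axis=1)) - W               # graph Laplacian
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                         # 2nd-smallest eigenvector
    return (fiedler > 0).astype(int)             # two-segment cut
```

Cutting into more than two segments would replace the final bisection with k-way spectral clustering on the same affinity graph.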

To robustly detect arbitrary-shaped scene texts, bottom-up methods are widely explored for their flexibility. Due to the highly homogeneous texture and cluttered distribution of scene texts, it is nontrivial for segmentation-based methods to discover the separatrixes between adjacent instances. To effectively separate nearby texts, many methods adopt the seed expansion strategy that segments shrunken text regions as seed areas, and then iteratively expands the seed areas into intact text regions.
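The seed expansion strategy described above can be sketched as iterative label propagation: shrunken text regions act as labeled seeds, and each seed grows outward by one pixel per iteration, but only into pixels of the full text mask, so adjacent instances keep distinct labels. This is a simplified illustration of the strategy, not a specific paper's algorithm.

```python
import numpy as np

def seed_expansion(text_mask, seed_labels, max_iters=50):
    """Expand labeled seed areas (shrunken text regions) into intact
    text regions: unlabeled text pixels adopt the label of an
    already-labeled 4-neighbor, one ring per iteration."""
    labels = seed_labels.copy()
    h, w = labels.shape
    for _ in range(max_iters):
        changed = False
        new = labels.copy()
        for y in range(h):
            for x in range(w):
                if labels[y, x] != 0 or not text_mask[y, x]:
                    continue  # already labeled, or outside the text mask
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] != 0:
                        new[y, x] = labels[ny, nx]
                        changed = True
                        break
        labels = new
        if not changed:
            break  # all reachable text pixels assigned
    return labels
```

Because both seeds expand simultaneously, the boundary between two adjacent instances settles roughly midway between their seed areas.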

Weighted multi-view clustering (MVC) aims to combine the complementary information of multi-view data (such as image data with different types of features) in a weighted manner to obtain a consistent clustering result. However, when the cluster-wise weights across views are vastly different, most existing weighted MVC methods may fail to fully utilize the complementary information, because they are based on view-wise weight learning and can not learn the fine-grained cluster-wise weights.
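The distinction between view-wise and cluster-wise weighting can be made concrete in the assignment step of a weighted multi-view clustering scheme: instead of one weight per view, each (view, cluster) pair gets its own weight when combining per-view distances. This is a hypothetical illustration of the fine-grained weighting the abstract argues for; `assign_clusters` and its signature are assumptions.

```python
import numpy as np

def assign_clusters(views, centers, weights):
    """Cluster-wise weighted multi-view assignment sketch.
    views[v]   : (n, d_v) samples in view v
    centers[v] : (K, d_v) cluster centers in view v
    weights    : (V, K) one weight per (view, cluster) pair
    A sample's distance to cluster k sums per-view squared distances
    scaled by weights[v, k], so each cluster can trust views differently."""
    n = views[0].shape[0]
    K = centers[0].shape[0]
    D = np.zeros((n, K))
    for v, (Xv, Cv) in enumerate(zip(views, centers)):
        dv = ((Xv[:, None, :] - Cv[None, :, :]) ** 2).sum(-1)  # (n, K)
        D += weights[v] * dv   # broadcast: one weight per cluster column
    return D.argmin(axis=1)
```

A view-wise method is the special case where every row of `weights` is constant across clusters; the abstract's point is that when the best view differs per cluster, that special case cannot exploit the complementary information.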
