With the help of convolutional neural networks (CNNs), video-based human action recognition has made significant progress. Spatial and channel-wise CNN features provide rich information for powerful image description. However, CNNs cannot model the long-term temporal dependencies of an entire video, nor can they focus well on the informative motion regions of actions. To address these two problems, we propose a novel video-based action recognition framework. We first represent videos with dynamic image sequences (DISs), which describe videos effectively by modeling local spatial-temporal dynamics and dependencies. We then propose a CNN-based channel and spatial-temporal interest points (STIPs) attention model (CSAM) that focuses on the discriminative channels of the network and the informative spatial motion regions of human actions. Specifically, channel attention (CA) is implemented by automatically learning channel-wise convolutional features and assigning different weights to different channels. STIPs attention (SA) is encoded by projecting the STIPs detected on frames of the dynamic image sequences into the corresponding convolutional feature-map space. The proposed CSAM is embedded after the CNN convolutional layers to refine the feature maps, followed by global average pooling to produce effective frame-level video representations. Finally, the frame-level representations are fed into an LSTM to capture temporal dependencies and perform classification. Experiments on three challenging RGB-D datasets show that our method outperforms state-of-the-art approaches using only depth data.
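The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of the pipeline it describes (channel attention, STIP-based spatial attention, global average pooling, and an LSTM classifier). All module names, the reduction ratio, the hidden size, and the way the STIP mask is combined with the feature maps are assumptions for illustration; STIP detection and the DIS/backbone feature extraction are taken as precomputed inputs.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention (CA): learn per-channel weights from globally pooled features.
    The squeeze-excitation-style gating and reduction ratio are assumptions."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze spatial dims -> (N, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # reweight channels


class CSAM(nn.Module):
    """Channel + STIPs attention: refine conv feature maps before pooling.
    `stip_mask` is assumed to be a precomputed (N, 1, H, W) map obtained by
    projecting detected STIPs onto the feature-map grid."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)

    def forward(self, x, stip_mask):
        x = self.ca(x)
        return x * (1.0 + stip_mask)             # emphasize informative motion regions


class ActionRecognizer(nn.Module):
    """Backbone conv features -> CSAM -> global average pooling -> LSTM -> classifier."""
    def __init__(self, channels=512, hidden=256, num_classes=60):
        super().__init__()
        self.csam = CSAM(channels)
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, feats, stip_masks):        # feats: (N, T, C, H, W), masks: (N, T, 1, H, W)
        n, t, c, h, w = feats.shape
        x = self.csam(feats.view(n * t, c, h, w), stip_masks.view(n * t, 1, h, w))
        x = x.mean(dim=(2, 3)).view(n, t, c)     # global average pooling per frame
        out, _ = self.lstm(x)                    # temporal dependencies across DIS frames
        return self.head(out[:, -1])             # classify from the last time step
```

As a usage sketch, `feats` would hold per-frame convolutional feature maps extracted from the dynamic image sequence, and `stip_masks` the corresponding projected STIP maps; the forward pass then returns class logits for the video.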