Convolutional Networks With Channel and STIPs Attention Model for Action Recognition in Videos
With the help of convolutional neural networks (CNNs), video-based human action recognition has made significant progress. CNN features that are spatial and channelwise can provide rich information for powerful image description. However, CNNs lack the ability to process the long-term temporal dependency of an entire video and further cannot well focus on the informative motion regions of actions.