Visual Attention Prediction for Stereoscopic Video by Multi-Module Fully Convolutional Network

By: Yuming Fang; Chi Zhang; Hanqin Huang; Jianjun Lei

Visual attention is an important mechanism of the human visual system (HVS), and numerous saliency detection algorithms have recently been designed for 2D images and video. However, research on fixation detection for stereoscopic video remains limited and challenging because of the complicated depth and motion information involved. In this paper, we design a novel multi-module fully convolutional network (MM-FCN) for fixation detection of stereoscopic video. Specifically, we design a fully convolutional network for spatial saliency prediction (S-FCN), in which the initial spatial saliency map of stereoscopic video is learned from an image database for object detection. Furthermore, a fully convolutional network for temporal saliency prediction (T-FCN) is constructed by combining the saliency results from S-FCN with motion information from video frames. Finally, a fully convolutional network for depth fixation prediction (D-FCN) is designed to compute the final fixation map of stereoscopic video by learning depth features together with the spatiotemporal features from T-FCN. Experimental results show that the proposed MM-FCN predicts fixations for stereoscopic video more effectively and efficiently than other related fixation prediction methods.
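
The three-module cascade described in the abstract can be illustrated with a minimal NumPy sketch of the data flow. This is not the authors' implementation: the layer counts, channel widths, random untrained weights, the frame-difference motion cue, and every function and parameter name (`conv3x3`, `fcn`, `init_weights`, `mm_fcn`, `params`) are assumptions made only for illustration. The sketch also assumes a single RGB view and a one-channel depth map as inputs, and it feeds the T-FCN output map (rather than intermediate features) into the depth module.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def fcn(x, weights):
    """Toy fully convolutional net: conv + ReLU layers, then a sigmoid map."""
    for w in weights[:-1]:
        x = np.maximum(conv3x3(x, w), 0.0)
    return 1.0 / (1.0 + np.exp(-conv3x3(x, weights[-1])))


def init_weights(rng, channels):
    """Random (untrained) 3x3 kernels for consecutive channel sizes."""
    return [rng.normal(0.0, 0.1, size=(c_out, c_in, 3, 3))
            for c_in, c_out in zip(channels[:-1], channels[1:])]


def mm_fcn(frame, prev_frame, depth, params):
    """Cascade S-FCN -> T-FCN -> D-FCN and return a (1, H, W) fixation map."""
    # S-FCN: spatial saliency from the current RGB frame.
    spatial = fcn(frame, params["s"])
    # Assumed motion cue: mean absolute frame difference (placeholder only).
    motion = np.abs(frame - prev_frame).mean(axis=0, keepdims=True)
    # T-FCN: combine spatial saliency with motion information.
    temporal = fcn(np.concatenate([spatial, motion]), params["t"])
    # D-FCN: combine the spatiotemporal map with the depth map.
    return fcn(np.concatenate([temporal, depth]), params["d"])


rng = np.random.default_rng(0)
params = {
    "s": init_weights(rng, [3, 8, 1]),  # RGB in, saliency out
    "t": init_weights(rng, [2, 8, 1]),  # saliency + motion in
    "d": init_weights(rng, [2, 8, 1]),  # spatiotemporal map + depth in
}
frame, prev_frame = rng.random((3, 12, 16)), rng.random((3, 12, 16))
depth = rng.random((1, 12, 16))
fixation = mm_fcn(frame, prev_frame, depth, params)
print(fixation.shape)
```

Because every layer is a convolution, the same weights apply to frames of any resolution, which is the practical benefit of a fully convolutional design. In a trained system each module would be fitted to data; here the weights are random, so only the shapes and the order of the stages are meaningful.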
