Fast Retinomorphic Event-Driven Representations for Video Gameplay and Action Recognition


By: Huaijin Chen; Wanjia Liu; Rishab Goel; Rhonald C. Lua; Siddharth Mittal; Yuzhong Huang; Ashok Veeraraghavan; Ankit B. Patel

Good temporal representations are crucial for video understanding, and the state-of-the-art video recognition framework is based on two-stream networks. In such a framework, alongside the regular ConvNet that processes RGB frame inputs, a second network handles the temporal representation, usually optical flow (OF). However, OF and other task-oriented flow are computationally costly and are therefore typically pre-computed. Critically, this prevents the two-stream approach from being applied to reinforcement learning (RL) applications such as video game playing, where the next state depends on the current state and action choices. Inspired by the early vision systems of mammals and insects, we propose a fast event-driven representation (EDR) that models several major properties of early retinal circuits: (1) logarithmic input response, (2) multi-timescale temporal smoothing to filter noise, and (3) bipolar (ON/OFF) pathways for primitive event detection. By trading directional information for speed (>9000 fps), EDR enables real-time inference and learning in video applications that require interaction between an agent and the world, such as game playing, virtual robotics, and domain adaptation. In this vein, we use EDR to demonstrate performance improvements over state-of-the-art reinforcement learning algorithms for Atari games, something that has not been possible with pre-computed OF. Moreover, in UCF-101 video action recognition experiments, we show that EDR achieves near state-of-the-art accuracy with a 1,500x speedup in input representation processing compared to optical flow.
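
The three retinal properties listed in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual filter equations, so everything specific here is an assumption made for illustration: the function name `edr_sketch`, the use of exponential moving averages for the multi-timescale smoothing, the smoothing rates in `alphas`, the event `threshold`, and the `eps` offset inside the logarithm. It is not the authors' implementation.

```python
import numpy as np

def edr_sketch(frames, alphas=(0.5, 0.1), threshold=0.05, eps=1e-3):
    """Hypothetical event-driven representation for a grayscale clip.

    frames: array of shape (T, H, W) with intensities in [0, 1].
    Returns an array of shape (T, len(alphas), 2, H, W): for each frame and
    each timescale, one ON map and one OFF map (values 0.0 or 1.0).
    """
    frames = np.asarray(frames, dtype=np.float32)
    # (1) Logarithmic input response (eps avoids log(0); assumed value).
    log_frames = np.log(frames + eps)
    T, H, W = frames.shape
    out = np.zeros((T, len(alphas), 2, H, W), dtype=np.float32)
    # (2) One running average per timescale, initialised at the first frame.
    ema = np.stack([log_frames[0]] * len(alphas))
    for t in range(T):
        for k, a in enumerate(alphas):
            # Change of the current frame relative to its smoothed history.
            diff = log_frames[t] - ema[k]
            # (3) Bipolar pathways: ON for brightening, OFF for darkening.
            out[t, k, 0] = diff > threshold
            out[t, k, 1] = diff < -threshold
            # Update the running average after the events are emitted.
            ema[k] = (1 - a) * ema[k] + a * log_frames[t]
    return out
```

Calling `edr_sketch(clip)` on a `(T, H, W)` array returns binary ON/OFF maps per timescale that could be stacked as input channels for a second-stream network. Each step uses only elementwise operations on the current frame and a running state, which is consistent with the abstract's point that the representation can be computed online rather than pre-computed, and like EDR it carries no directional motion information.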
