Google Announces AVA: A Finely Labeled Video Dataset for Human Action Understanding

News and Resources for Members of the IEEE Signal Processing Society

Google recently announced Atomic Visual Actions (AVA), a new labeled dataset of human actions in videos. AVA densely annotates 80 atomic visual actions in 57.6k movie clips, with each action localized in space and time. This yields 210k action labels, and a single person frequently carries multiple labels. Compared with existing datasets, AVA has the following main characteristics:

  1. The definition of atomic visual actions, which avoids collecting data for each and every complex action.
  2. Precise spatio-temporal annotations, with possibly multiple annotations for each human.
  3. The use of diverse, realistic video material (movies).

Examples of 3-second video segments (from Video Source) with their bounding box annotations in the middle frame of each segment. (For clarity, only one bounding box is shown for each example.) Courtesy of Google.
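
As a rough illustration of what "multiple labels per human" means in practice, the sketch below models one annotation as a person, a bounding box in the middle frame of a segment, and a single action, then groups actions by person. The class name, field names, clip identifier, and sample values are all hypothetical and are not taken from the AVA release; the actual file format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActionLabel:
    """One (person, action) annotation. All field names are hypothetical."""
    clip_id: str                             # segment the label belongs to
    person_id: int                           # person within that segment
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in the middle frame
    action: str                              # one atomic action name


def labels_per_person(labels):
    """Collect the set of action names for each (clip_id, person_id) pair."""
    grouped = defaultdict(set)
    for label in labels:
        grouped[(label.clip_id, label.person_id)].add(label.action)
    return dict(grouped)


# Illustrative data only: person 0 carries two labels, person 1 carries one.
labels = [
    ActionLabel("clip_a", 0, (0.1, 0.2, 0.5, 0.9), "stand"),
    ActionLabel("clip_a", 0, (0.1, 0.2, 0.5, 0.9), "talk to"),
    ActionLabel("clip_a", 1, (0.6, 0.2, 0.9, 0.9), "listen to"),
]
grouped = labels_per_person(labels)
```

Storing one row per (person, action) pair, rather than one row per person, keeps each record flat while still allowing any number of simultaneous actions for the same bounding box.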

For more details, please see Google's announcement and the AVA dataset page.

