

Google Announces AVA: A Finely Labeled Video Dataset for Human Action Understanding

Google recently announced Atomic Visual Actions (AVA), a new labeled dataset of human actions in video. AVA densely annotates 80 atomic visual actions in 57.6k movie clips, localizing each action in space and time, for a total of 210k action labels; a single person frequently carries multiple labels at once. Compared with existing datasets, AVA has three main characteristics:

  1. The definition of atomic visual actions, which avoids having to collect data for each and every complex action.
  2. Precise spatio-temporal annotations, with possibly multiple annotations for each human (see the sketch after this list).
  3. The use of diverse, realistic video material (movies).
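
To make the annotation structure concrete, here is a minimal sketch of loading AVA-style labels and grouping them per person box. It assumes a CSV layout of video id, timestamp in seconds, normalized box corners, and action id; the file name `ava_train.csv` and the exact column order are assumptions, so check the release documentation before relying on them.

```python
# Minimal sketch: group AVA-style action labels per annotated person box.
# Assumed CSV columns: video_id, timestamp, x1, y1, x2, y2, action_id
# (box corners normalized to [0, 1]); verify against the actual release.
import csv
from collections import defaultdict

def load_ava_annotations(csv_path):
    """Map (video_id, timestamp, box) -> set of action ids, so each
    person keeps all of their concurrent atomic-action labels."""
    labels_per_person = defaultdict(set)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            video_id, timestamp = row[0], float(row[1])
            box = tuple(map(float, row[2:6]))  # (x1, y1, x2, y2)
            labels_per_person[(video_id, timestamp, box)].add(int(row[6]))
    return labels_per_person

if __name__ == "__main__":
    annotations = load_ava_annotations("ava_train.csv")  # hypothetical path
    multi = sum(1 for acts in annotations.values() if len(acts) > 1)
    print(f"{len(annotations)} person boxes, {multi} with multiple labels")
```

Grouping by the box rather than by row is what surfaces the "multiple labels per human" property the announcement highlights: concurrent actions for the same person share a bounding box and timestamp.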

[Figure] Examples of 3-second video segments (from Video Source) with bounding-box annotations in the middle frame of each segment; for clarity, only one bounding box is shown per example. Courtesy of Google.

Please visit https://research.googleblog.com/2017/10/announcing-ava-finely-labeled-video.html for the announcement and https://research.google.com/ava/ for the dataset.