Google recently announced a new labeled video dataset of human actions, named Atomic Visual Actions (AVA). It densely annotates 80 atomic visual actions in 57.6k movie clips, localizing each action in space and time, for a total of 210k action labels, with multiple labels per person occurring frequently. Compared with existing datasets, AVA has the following main characteristics: