Collective activity recognition, which tells what activity a group of people is performing, is a cutting-edge research topic in computer vision. Different from action performed by individuals, collective activity needs to consider the complex interactions among different people. However, most previous works require exhaustive annotations such as accurate label information of individual actions, pairwise interactions, and poses, which could not be easily available in practice.