DISCO: auDIoviSual Crowd cOunting dataset

The auDIoviSual Crowd cOunting dataset (DISCO) consists of 1,935 images and corresponding audio clips from a variety of typical scenes, with a total of 170,270 instances annotated with head locations. The average, minimum, and maximum numbers of people per image are 87.99, 1, and 709, respectively. The motivation for building this dataset is that the louder we perceive the ambient sound to be, the more people there tend to be. In summary, DISCO has three advantages compared with other datasets: 1) both audio and visual signals are provided; 2) different illumination conditions are covered; and 3) a large variety of scenes is considered. For more details about this dataset, please refer to our paper. Download this dataset here.

THE TEAM