Where Did ImageNet Come From?

November 2019

Dr. Fei-Fei Li is the inventor of ImageNet and the ImageNet Challenge, a critical large-scale dataset and benchmarking effort that has contributed to the latest developments in deep learning and AI. In addition to her technical contributions, she is a leading national voice advocating for diversity in STEM and AI.


ImageNet has become one of the most influential visual datasets in the fields of deep learning and AI. More than 14 million photographs were gathered through a benchmarking effort that propelled the rise of computer vision and its wide range of applications, such as surveillance, phone filters, medical imaging, biometrics and autonomous cars. ImageNet is organised into 21,000 categories that are still used today to train computational models.

In September 2019, ImageNet creator Fei-Fei Li gave a talk at The Photographers' Gallery, recounting the events and key people that led to the dataset's creation. In the following text, Katrina Sluis, Adjunct Research Curator at the Gallery, expands on the context of the event.

The premise of the event was a 10th birthday party – not for any small human, but a dataset of around 14 million images.

A dataset in computer vision is a collection of digital photographs that developers use to train, test and evaluate the performance of their algorithms. Once assembled and packaged, a common set of photographs is shared among computer scientists. Using the same dataset allows different developers to compare their work.

The gesture of a party for ImageNet asks us what it would mean to see a dataset not as an object that belongs only to computer science, but as something with a life, a history, and agency to act in the world. And what does it mean to consider the computer scientist as a significant producer and curator of photographs? This framing of a party is, therefore, a deliberate provocation to photography institutions such as this one, which have historically valorised photography both as art and social practice but treated the computer interface as an immaterial, transparent tool rather than as constitutive of culture. On the other hand, it is a provocation to the computer sciences, which have historically valorised the photograph as a transparent carrier of information. The recent public debates about ImageNet illustrate a need for more conversations across disciplinary boundaries about the consequences of classification and the politics of representation that datasets are enmeshed in and perpetuate.

One of the key issues is that ImageNet itself is difficult to comprehend and even 'see'. With this in mind, the Gallery “exhibited ImageNet” on its Media Wall over a period of two months, at a rate of 90 milliseconds per image. ImageNet is a huge semantic machine, a feat of engineering which has brought together computer scientists, linguists, search engine algorithms, Flickr photographers and Amazon Mechanical Turkers to collectively produce a visual map of the world as identifiable objects. In this way ImageNet sits alongside other historical attempts at taxonomic photography projects which have been extensively theorised in photography: from Aby Warburg’s Mnemosyne and Malraux’s Le Musée Imaginaire to Edward Steichen’s The Family of Man, and from commercial databases such as Getty Images and Corbis to the colonial, medical and military photographic archives held by the state.