Datasets are large collections of digital information that are used to train AI. They might contain anything from weather data, such as air pressure and temperature, to photos, music, or indeed anything else that helps an AI system with the task it has been assigned.
Datasets are like textbooks for computers.

AI design teams have to carefully consider the data they choose to train their AI with, and may build in parameters that help the system make sense of the information it’s given.

Due to their scale and complexity, these collections can be very challenging to build and refine, whether they consist of a few hundred audio samples or extensive maps covering the whole of the known solar system.

For this reason, AI design teams often share datasets for the benefit of the wider scientific community, making it easier to collaborate and build on each other’s research.

Definition of Dataset from the A to Z of AI by Google and the Oxford Internet Institute.
