DeepMind created a YouTube database of humans doing stuff that can help AIs to understand us

Google DeepMind has created a database of hundreds of thousands of YouTube clips that can help artificial intelligence (AI) agents to identify human actions such as drinking beer, riding a mechanical bull, and bench pressing.

The London-based AI lab, which was acquired by Google in 2014 for a reported £400 million, developed the “Kinetics” dataset for the ActivityNet Challenge and published a paper detailing the work in May.

The database comprises some 300,000 already-published “realistic” and “challenging” YouTube clips that are no more than 10 seconds in length. Each clip contains a human doing one of 400 actions and has been tagged accordingly. There are at least 400 clips for each of the 400 actions.

“The actions are human focused and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands,” DeepMind’s authors wrote.

DeepMind fed the dataset into a number of off-the-shelf AIs and successfully taught them how to recognise certain tasks. Interestingly, DeepMind had more luck teaching AIs how to recognise actions such as bowling, tennis, and trapezing than it did teaching them how to spot yawning, headbutting, and faceplanting.

Figure: The easiest and the hardest actions for an off-the-shelf AI to identify. (Source: Google DeepMind)

Earlier reports suggested that DeepMind had used clips of Homer Simpson performing various actions, but this was not the case.

A Google DeepMind spokesperson told Business Insider that data classification is essential to machine learning research.

“AI systems are now very good at recognising objects in images, but still have trouble making sense of videos,” said the spokesperson. “One of the main reasons for this is that the research community has so far lacked a large, high-quality video dataset, such as the one we now provide.

“We hope that the Kinetics dataset will help the machine learning community to advance models for video understanding, making a whole range of new research opportunities possible.”

A separate paper, published shortly after the first one, shows how DeepMind then used its own algorithms on the Kinetics dataset with even better results.

“We have shown that the performance of deep learning architectures can be substantially improved by first training on Kinetics, and then training and evaluating on standard action classification benchmarks,” the DeepMind spokesperson added.