Automated Data Collection From YouTube

Jason Mo
Nov 15, 2020

Walking through how to automatically collect and label fencing data from YouTube.

Update October 2021 — The work discussed in this blog has led to the introduction of Allez Go, the world’s first working AI fencing referee. Learn more in this blog here or at allzgo.com.

The first step in any machine learning project is data collection. Good data is the backbone of AI and perhaps the most important factor in a machine learning model’s success. Fortunately, we live in a time with plenty of data swirling around the internet. It’s just a matter of finding this data, then processing and labeling it correctly so we can use it for model training.

First off, I want to give a huge thanks to sholtodouglas, who is responsible for a lot of the code used in this blog. You can check out his amazing repository here.

Our first step will be downloading all of our YouTube videos, primarily from Fencing Vision. Thankfully, the videos are split up into playlists, so we can process the data in organized subsections rather than all at once. Using this website, we can get the links to all the videos in a playlist and save them as a text file. From there, we can download the videos one by one using PyTube.
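As a rough sketch of that download step, here’s how it might look with pytube, assuming the playlist’s URLs have been saved one per line in a text file (the filename and output folder below are illustrative):

```python
from pytube import YouTube

# Read the playlist's video URLs, saved one per line (filename is illustrative).
with open("phase3_links.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls, start=1):
    # Download the highest-resolution progressive (video + audio) stream.
    stream = YouTube(url).streams.get_highest_resolution()
    stream.download(output_path="raw_videos/phase3", filename=f"{i}.mp4")
    print(f"Downloaded video {i} of {len(urls)}")
```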

Next, we want to cut each video into individual clips where a hit actually occurs. To make things simpler, each clip will have a set length of 2 seconds, which should be long enough to capture the relevant actions. Fortunately, the European Fencing Federation has a standard score overlay on all of their videos. Since this overlay is always in the same position, we can detect touches by running a color detector on specific pixels.
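A minimal sketch of that detector with OpenCV, assuming hypothetical pixel coordinates and brightness thresholds for the two lights (the real values have to be found by inspecting the overlay):

```python
import cv2

# Hypothetical (row, col) positions of the two scoring lights in the
# overlay; the actual coordinates depend on the broadcast layout.
LEFT_LIGHT_PX = (540, 880)
RIGHT_LIGHT_PX = (540, 1040)
CLIP_SECONDS = 2

def light_on(frame, px, threshold=180):
    """A light is 'on' when its pixel is much brighter than the idle overlay."""
    b, g, r = frame[px]
    return max(int(b), int(g), int(r)) > threshold

cap = cv2.VideoCapture("raw_videos/phase3/1.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if light_on(frame, LEFT_LIGHT_PX) or light_on(frame, RIGHT_LIGHT_PX):
        t = frame_idx / fps
        print(f"Light at {t:.2f}s, cut a {CLIP_SECONDS}s clip here")
        # Jump ahead so the same touch isn't detected on every lit frame.
        frame_idx += int(fps * 5)
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        continue
    frame_idx += 1
cap.release()
```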

Now that we have a 2-second clip for every time a light goes off, we’ll filter out the clips where only one light goes off, as well as those where both fencers hit off-target.
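In code, the filtering rule might look something like this, where the four flags come from running the color detector on each fencer’s on-target and off-target lights (the names are illustrative):

```python
def keep_clip(left_on, left_off, right_on, right_off):
    """Keep only clips where both fencers' lights fire and at least
    one of the hits is on target (the ambiguous right-of-way cases)."""
    left_lit = left_on or left_off
    right_lit = right_on or right_off
    both_off_target = left_off and right_off and not (left_on or right_on)
    return left_lit and right_lit and not both_off_target
```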

Labeling the data takes a bit of logical programming. If both fencers hit on target, we look at which fencer the human referee awards the touch to later on and label the clip accordingly; a pre-trained digit recognizer keeps track of the score on the overlay. If only one fencer hits on target and the other hits off-target, the score alone tells us who had priority. Say the right fencer is the one who landed on target: if the score increases, the right fencer had right of way, and if it doesn’t change, the left fencer did.
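Putting that logic together, a sketch of the labeling rule might look like this. The scores are (left, right) tuples read from the overlay by the digit recognizer before and shortly after the touch; the function and flag names are illustrative:

```python
def label_clip(left_on, right_on, score_before, score_after):
    """Return 'left' or 'right' for whichever fencer had right of way."""
    left_scored = score_after[0] > score_before[0]
    right_scored = score_after[1] > score_before[1]
    if left_on and right_on:
        # Both on target: the referee's award tells us who had priority.
        return "left" if left_scored else "right"
    if right_on:
        # Right on target, left off target: a score increase means the
        # right fencer had priority; no change means the left fencer did.
        return "right" if right_scored else "left"
    # Left on target, right off target: mirror of the case above.
    return "left" if left_scored else "right"
```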

After about a week of on-and-off data collection, we finally end up with our complete, labeled dataset: nearly 10 GB and more than 8,000 clips.

This is “3-L46-58” from our dataset!

Since this is as good a place as any to talk about naming conventions and file management, I’ll go over how I organized my data during data collection. The data collection was split into 3 phases to make managing files easier. Since we’re pulling videos from YouTube, the data is naturally organized by playlist (phases). Each phase has two subfolders, left and right, which act as the class labels. From there, each individual clip is named in a set convention. For example, “3-L46-58” represents the 58th touch of the 46th video of the 3rd phase (playlist). While all of this may seem like a headache, having a set method of organizing data is invaluable during debugging and testing.
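For completeness, here’s a small parser for that convention; I’m assuming the “L” (or “R”) in the name mirrors the left/right class folder:

```python
import re

def parse_clip_name(name):
    """Parse a clip name like '3-L46-58' into its parts:
    phase 3, left class, video 46, touch 58 (pattern inferred above)."""
    match = re.match(r"^(\d+)-([LR])(\d+)-(\d+)$", name)
    phase, side, video, touch = match.groups()
    return {
        "phase": int(phase),
        "label": "left" if side == "L" else "right",
        "video": int(video),
        "touch": int(touch),
    }

print(parse_clip_name("3-L46-58"))
# {'phase': 3, 'label': 'left', 'video': 46, 'touch': 58}
```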

Next up in this blog series, we’ll preprocess the data so that we can feed it into a model. We’ll take an interesting approach to feature selection and look at several unique practices specific to this project that help reduce overfitting in our model.


Jason Mo

Founder of Allez Go, the world’s first AI fencing referee.