Using Robots to Make Videos

Let me paint you a picture: you wake up in the morning and pull up YouTube to watch something funny. You run across a video that simply retells some interesting Reddit posts, get a few chuckles, throw it a like (and a subscribe, if you liked it enough), and move on with your day.

These types of videos are very common, and some of them even do pretty well. To grow my skills and push my limits, I decided that this year I would make my own channel like this, with the end goal of automating it entirely. Thus, my YouTube journey began.

I started out simply enough: I needed to figure out how to make a video with code. After trying a few different frameworks, I decided simpler was better and went with as bare-bones an approach as possible. I would copy the text of Reddit posts into a text file and clean it up by hand, and then a program I wrote would generate an image from each line of text in the file. I added special annotations for the title image and a background color that changed with each new post, and we were off to the races.
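The per-line frame generation described above can be sketched as a small parsing step in Rust. This is a minimal sketch, not the original program: the `#title ` marker, the three-color palette, and the `Frame` type are hypothetical stand-ins for the real annotations, and drawing the text onto an image is left out.

```rust
/// One frame to render: its text, whether it is a title card, and its background color.
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    text: String,
    is_title: bool,
    background: (u8, u8, u8),
}

// Hypothetical palette: each new post takes the next color, wrapping around.
const PALETTE: [(u8, u8, u8); 3] = [(30, 30, 60), (60, 30, 30), (30, 60, 30)];

// Hypothetical annotation marking a post's title line.
const TITLE_MARKER: &str = "#title ";

/// Turns the cleaned-up script into one frame per non-empty line.
fn parse_script(script: &str) -> Vec<Frame> {
    let mut frames = Vec::new();
    // Index of the current post; None until the first title is seen.
    let mut post: Option<usize> = None;
    for line in script.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (text, is_title) = match line.strip_prefix(TITLE_MARKER) {
            Some(title) => (title, true),
            None => (line, false),
        };
        if is_title {
            // A title starts a new post, so advance to the next background color.
            post = Some(post.map_or(0, |i| i + 1));
        }
        let background = PALETTE[post.unwrap_or(0) % PALETTE.len()];
        frames.push(Frame { text: text.to_string(), is_title, background });
    }
    frames
}

fn main() {
    let script = "#title First post\nLine one\n\n#title Second post\nLine two";
    for frame in parse_script(script) {
        println!("{:?}", frame);
    }
}
```

Keeping the parsing separate from rendering means the same frame list could feed any image library.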

After generating the visuals, I needed audio to go with them. I didn't want to read the posts myself, since that would defeat the whole purpose, so I used basic text-to-speech to generate the audio clips. I then stitched the frames together, set the runtime based on the length of the audio, and had a fresh video file.
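Timing each image to its narration clip can be sketched with ffmpeg's concat demuxer, which reads a list of `file` and `duration` lines. This assumes ffmpeg does the stitching, which the post doesn't say; the file names and durations below are made up, and measuring each clip's length is left to whatever produced the audio.

```rust
/// Builds an ffmpeg concat-demuxer list in which each image stays on screen
/// for as long as its narration clip lasts (durations in seconds).
fn concat_list(frames: &[(&str, f64)]) -> String {
    let mut out = String::new();
    for (path, seconds) in frames {
        // Paths containing single quotes would need escaping; omitted here.
        out.push_str(&format!("file '{}'\nduration {:.3}\n", path, seconds));
    }
    // The concat demuxer may ignore the final duration unless the last
    // file is listed once more, so repeat it without a duration.
    if let Some((last, _)) = frames.last() {
        out.push_str(&format!("file '{}'\n", last));
    }
    out
}

/// Total video length: the sum of all narration clip lengths.
fn total_runtime(frames: &[(&str, f64)]) -> f64 {
    frames.iter().map(|(_, seconds)| seconds).sum()
}

fn main() {
    // Hypothetical frame images paired with their clip lengths.
    let frames = [("frame_000.png", 2.5), ("frame_001.png", 4.0)];
    // The list goes to stdout so it can be redirected into a file.
    print!("{}", concat_list(&frames));
    eprintln!("runtime: {:.1} s", total_runtime(&frames));
}
```

The resulting list could then be combined with the narration along the lines of `ffmpeg -f concat -i frames.txt -i narration.wav -pix_fmt yuv420p out.mp4`, where `frames.txt` and `narration.wav` are hypothetical file names.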

Figure 1: Thumbnail art for the Reddit videos

This first version of the software was honestly kind of rough and left a lot of manual work to me. My next goal was to download the Reddit posts into the text file automatically, so I built a downloader. It takes in a text file with a list of post URLs, goes through each URL, and downloads its contents for our use. It also makes some grammar fixes and shortens lines that are far too long. Once the downloader was done, I noticed the videos looked a bit grainy, so I spent a dev session raising the resolution all the way to 4K.
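Two of the downloader's chores can be sketched in Rust: turning each shared post URL into a request URL, and splitting over-long lines. This is a minimal sketch under assumptions: it relies on Reddit returning a post as JSON when `.json` is appended to the post URL, the 30-character limit in the example is arbitrary, and the `json_url`/`shorten_line` names are illustrative. The HTTP request and grammar fixes are left out.

```rust
/// Reddit serves a post as JSON when `.json` is appended to its URL.
/// Query strings (e.g. share-tracking parameters) are dropped first.
fn json_url(post_url: &str) -> String {
    let without_query = post_url.trim().split('?').next().unwrap_or("");
    format!("{}.json", without_query.trim_end_matches('/'))
}

/// Splits a line that is too long for one frame into pieces of at most
/// `max_chars` characters, breaking only between words. A single word
/// longer than the limit is kept whole rather than cut.
fn shorten_line(line: &str, max_chars: usize) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    for word in line.split_whitespace() {
        let needed = if current.is_empty() {
            word.chars().count()
        } else {
            current.chars().count() + 1 + word.chars().count()
        };
        // Close the current piece if this word would push it past the limit.
        if needed > max_chars && !current.is_empty() {
            pieces.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}

fn main() {
    // Hypothetical URL list, one shared post per line; blank lines are skipped.
    let url_list = "https://www.reddit.com/r/AskReddit/comments/abc123/example/?utm_source=share\n\n";
    for url in url_list.lines().filter(|l| !l.trim().is_empty()) {
        println!("{}", json_url(url));
    }
    for piece in shorten_line("This sentence is much too long to fit comfortably on one frame.", 30) {
        println!("{}", piece);
    }
}
```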

With the downloader and resolution updates done, I decided it was time to clean up the audio. So far the audio was very obviously computer-generated, and I wanted the voice to sound at least mostly realistic. Making this change proved to be one of the biggest challenges of the whole project, but not quite for the reason you'd think. I have five different AI voice synthesis methods built out, and I can swap the class name I'm using to generate videos with any one of them. However, all but one of them sound terrible. The reason is that Intel does not design its graphics cards with AI development in mind: after building the first four solutions, I found out that none of them would work on my computer specifically. So I switched to a system that focuses on CPU acceleration instead and got a much more natural-sounding voice out of it.
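Swapping voice backends by changing one class name maps naturally onto a trait in Rust. In this sketch, the `VoiceSynthesizer` trait, both backend structs, and the `--basic` flag are hypothetical placeholders that return silent sample buffers instead of running a real text-to-speech model; the point is only that the narration code depends on the interface, not on a specific engine.

```rust
/// Interface every voice backend implements, so the video builder
/// does not depend on any particular engine.
trait VoiceSynthesizer {
    fn name(&self) -> &'static str;
    /// Returns 16-bit audio samples for one line of narration.
    fn synthesize(&self, text: &str) -> Vec<i16>;
}

/// Hypothetical stand-in for a CPU-accelerated neural voice.
struct CpuNeuralVoice;

/// Hypothetical stand-in for the original robotic text-to-speech.
struct BasicVoice;

impl VoiceSynthesizer for CpuNeuralVoice {
    fn name(&self) -> &'static str {
        "cpu-neural"
    }
    fn synthesize(&self, text: &str) -> Vec<i16> {
        // A real backend would run a model here; this placeholder returns
        // silence whose length grows with the text.
        vec![0; text.len() * 100]
    }
}

impl VoiceSynthesizer for BasicVoice {
    fn name(&self) -> &'static str {
        "basic"
    }
    fn synthesize(&self, text: &str) -> Vec<i16> {
        // Placeholder: a shorter silent buffer.
        vec![0; text.len() * 50]
    }
}

/// Narrates every line with whichever backend it is given.
fn narrate(voice: &dyn VoiceSynthesizer, lines: &[&str]) -> Vec<Vec<i16>> {
    lines.iter().map(|line| voice.synthesize(line)).collect()
}

fn main() {
    // Switching engines only changes which backend is constructed here.
    let use_basic = std::env::args().any(|a| a == "--basic");
    let voice: Box<dyn VoiceSynthesizer> = if use_basic {
        Box::new(BasicVoice)
    } else {
        Box::new(CpuNeuralVoice)
    };
    let clips = narrate(&*voice, &["Hello there", "Welcome back to the channel"]);
    println!("{} produced {} clips", voice.name(), clips.len());
}
```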

From here, there was only one more major feature I had planned. I didn't want to have to sit at my computer to make the videos; I wanted to share the URLs from my phone to Discord and tell a bot to download them, build the videos, and upload them to YouTube for me. This actually went pretty well. I built a server in Rust that responds to specific commands typed into Discord, namely ~addurl, ~download, ~build, ~upload, and ~reset, each giving appropriate feedback and information about the task it completed. However, I may have been a bit overzealous in my approach.
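The command handling in the Discord server boils down to turning a message into a known command. A minimal sketch follows. It assumes `~addurl` takes the URL as its only argument and leaves out the Discord connection itself (which in Rust might go through a crate such as serenity); `Command`, `parse_command`, and the reply texts are illustrative, not the server's real code.

```rust
/// The bot's commands; `AddUrl` carries the shared post URL.
#[derive(Debug, PartialEq)]
enum Command {
    AddUrl(String),
    Download,
    Build,
    Upload,
    Reset,
}

/// Parses a Discord message into a command, or `None` for ordinary chat,
/// unknown commands, and `~addurl` without a URL.
fn parse_command(message: &str) -> Option<Command> {
    let mut parts = message.trim().splitn(2, char::is_whitespace);
    // Commands start with "~"; anything else is ignored.
    let name = parts.next()?.strip_prefix('~')?;
    let arg = parts.next().map(str::trim).unwrap_or("");
    match name {
        "addurl" if !arg.is_empty() => Some(Command::AddUrl(arg.to_string())),
        "download" => Some(Command::Download),
        "build" => Some(Command::Build),
        "upload" => Some(Command::Upload),
        "reset" => Some(Command::Reset),
        _ => None,
    }
}

/// Hypothetical acknowledgement text sent back to the channel.
fn reply(command: &Command) -> String {
    match command {
        Command::AddUrl(url) => format!("Queued {}", url),
        Command::Download => "Downloading queued posts...".to_string(),
        Command::Build => "Building videos...".to_string(),
        Command::Upload => "Uploading to YouTube...".to_string(),
        Command::Reset => "Reset done.".to_string(),
    }
}

fn main() {
    for message in ["~addurl https://www.reddit.com/r/test/comments/abc/title/", "~build", "hi"] {
        match parse_command(message) {
            Some(cmd) => println!("{}", reply(&cmd)),
            None => println!("(ignored: {})", message),
        }
    }
}
```

Parsing into an enum first means the compiler checks that every command gets a reply.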

The upload command will probably never be something I can realistically use, which is a shame. Currently, it pulls my last video from YouTube, increments the compilation number by 1, copies over the rest of the metadata, such as the description and tags, and then schedules the new video for publishing. The problem is that when I used this API, Google realized I had made a bot to upload videos and immediately marked my content as spam. This is probably because, while testing the upload functionality, I used the API to upload the same video about 10 times in a row, which, fair, my bad. However, even after that initial day, any time I uploaded a video through the API it got flagged, so I think pressing upload is up to me from now on. It really makes you wonder why they would offer an API in the first place if using it just gets you flagged, but that's above my pay grade.
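The metadata step of the upload command can be sketched as a pure function that copies the previous video's details and bumps its number. The title format with a trailing `#N`, the `VideoMetadata` fields, and the one-day scheduling interval are assumptions, and the actual YouTube Data API calls are omitted. Per the YouTube Data API reference, a scheduled `status.publishAt` time is only accepted when the video's privacy status is private, so a real implementation would need to set that as well.

```rust
/// The pieces of video metadata this sketch carries over (illustrative fields).
#[derive(Debug, Clone, PartialEq)]
struct VideoMetadata {
    title: String,
    description: String,
    tags: Vec<String>,
    /// Scheduled publish time as Unix seconds.
    publish_at: u64,
}

/// Bumps a trailing "#N" in the title, e.g. "Compilation #12" -> "Compilation #13".
/// Returns None if the title has no number after its last '#'.
fn bump_compilation_number(title: &str) -> Option<String> {
    let (prefix, number) = title.rsplit_once('#')?;
    let n: u32 = number.trim().parse().ok()?;
    Some(format!("{}#{}", prefix, n.checked_add(1)?))
}

/// Builds the next video's metadata from the last one: same description
/// and tags, bumped title, scheduled `interval_secs` after the previous video.
fn next_metadata(last: &VideoMetadata, interval_secs: u64) -> Option<VideoMetadata> {
    Some(VideoMetadata {
        title: bump_compilation_number(&last.title)?,
        description: last.description.clone(),
        tags: last.tags.clone(),
        publish_at: last.publish_at.checked_add(interval_secs)?,
    })
}

fn main() {
    // Hypothetical metadata for the most recent upload.
    let last = VideoMetadata {
        title: "Reading Reddit Compilation #12".to_string(),
        description: "Stories from Reddit, read aloud.".to_string(),
        tags: vec!["reddit".to_string(), "stories".to_string()],
        publish_at: 1_700_000_000,
    };
    // Assumed schedule: one video per day.
    match next_metadata(&last, 24 * 60 * 60) {
        Some(next) => println!("{:?}", next),
        None => println!("could not derive the next title from {:?}", last.title),
    }
}
```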

Overall, this was a fun and challenging project, and I can tell that I've grown a lot as a programmer by completing it. If you're interested in checking out the channel, it's called Reading Reddit. If I end up having to make a new one due to the upload fiasco, I'll make sure to change the link here as well. Future updates will probably include creating a bot that will automatically crawl Reddit for popular posts, but who knows. Until then, I hope you enjoy the videos!
