Yesterday, I was sitting around and noticed a video posted on reddit which I knew had already been posted earlier on the same sub. So I decided to finally write a reddit bot.
I started using PRAW, which is super easy to use. You can find the docs on the PRAW site.
The first step is always figuring out the goal and the process.
What should the bot do? I wanted it to find reposts of posts in a specific subreddit and post a comment listing all reposts.
How should the bot do it? I just started trying things out manually. I used the search function to look for the same URL, but then I noticed that the same video can show up under different URLs. For YouTube videos it worked best to extract the video ID and search for that.
My first step was to create an account and get the newest posts.
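    import praw

    # Identify the bot to reddit and log in with the bot account
    r = praw.Reddit(user_agent="USER AGENT")
    r.login('username', 'password')

    # Fetch the newest posts of the subreddit to watch
    new_sub = r.get_subreddit('SUBREDDIT').get_new()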
It’s super straightforward. For each post I first check a SQLite database to see whether I’ve already processed it. If it hasn’t been checked yet, I look at the domain. If the domain is youtube.com, I extract the video ID. So far I’ve only seen two formats, which need different handling.
The first format is /watch?v=VIDEOID&...
In this case the video ID is easily extracted using urlparse. The second format is a bit different. It mainly shows up when people want to track attribution and looks like this: /attribution_link?a=ATTRIBUTIONID&u=%2Fwatch%3Fv%3DVIDEOID%26feature%3Dshare.
Again I extract the query using urlparse and then parse the '/watch?v' part of the query (the value of the u parameter) a second time. This gives you the video ID in this case.
If the domain isn’t youtube.com I just use the submitted URL.
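To make the two formats concrete, here is a minimal sketch of the extraction. It uses Python 3's urllib.parse (the successor of the urlparse module), and the function name youtube_video_id is made up for this example, so treat it as one possible way to do it rather than the bot's actual code.

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Return the video ID for the two URL formats described above, else None."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)

    if parsed.path == "/watch":
        # Format 1: /watch?v=VIDEOID&...
        return query.get("v", [None])[0]

    if parsed.path == "/attribution_link":
        # Format 2: the watch URL is percent-encoded inside the "u" parameter.
        # parse_qs has already decoded it, so it can be parsed a second time.
        inner = query.get("u", [None])[0]
        if inner is None:
            return None
        inner_query = parse_qs(urlparse(inner).query)
        return inner_query.get("v", [None])[0]

    # Any other path is not one of the two formats seen so far.
    return None
```

Both youtube_video_id("https://www.youtube.com/watch?v=VIDEOID&feature=share") and the attribution_link form shown earlier should come back with the same ID, which is what makes the ID a better search key than the raw URL.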
Now I use reddit’s search function with the url parameter, which searches only in the submitted URL. For YouTube videos the query is "url: VIDEOID", otherwise "url: URL". Then I go through each result and compare its ID to the post I’m actually checking to avoid false positives.
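The query building and the ID comparison could look roughly like this. The query strings are copied from the text above. The function names are hypothetical, search results are modelled as plain dicts with an "id" key instead of PRAW objects, and I am assuming the false positive in question is the checked post matching itself in the search.

```python
def build_search_query(domain, url, video_id=None):
    """Build the url search string: the video ID for YouTube, else the full URL."""
    if domain == "youtube.com" and video_id:
        return "url: " + video_id
    return "url: " + url


def drop_self(results, checked_id):
    """Remove the post being checked from its own search results."""
    return [post for post in results if post["id"] != checked_id]
```

Whatever is left after drop_self is the list of earlier submissions that goes into the comment. If the list is empty, there is nothing to report.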
The next step is using ago, which translates time differences into readable text (e.g. 20 minutes ago or 4 months ago), to indicate how old a repost actually is. The last step is to add a comment to the post which lists each previous submission, including the time, a permalink, the title, and the up and down votes. Finally, I add the post ID to my sqlite3 db so that it won’t be checked again.
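The bookkeeping with sqlite3 can stay very small. This is only a sketch: the table name checked, the column post_id, and the helper names are assumptions for illustration, and it uses an in-memory database by default where the real bot would use a file.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the bookkeeping table exists."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS checked (post_id TEXT PRIMARY KEY)")
    return conn


def already_checked(conn, post_id):
    """Return True if this post ID was stored on an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM checked WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row is not None


def mark_checked(conn, post_id):
    """Remember the post ID; storing the same ID twice is harmless."""
    conn.execute("INSERT OR IGNORE INTO checked (post_id) VALUES (?)", (post_id,))
    conn.commit()
```

In the main loop, already_checked would run before any searching, and mark_checked only after the comment has been posted, so a crash in between means the post gets looked at again on the next run.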
I’m currently testing the bot. If it works well enough, I will probably open source it.