PRAW
Contents
PRAW#
Introduction to PRAW#
PRAW
is an API wrapper for Reddit. It lets you access Reddit content without worrying about violating Reddit’s data access rules. This notebook is designed to get you going with the basics. See here for full documentation.
Installing PRAW#
Within the psych750 environment, install praw using pip
:
pip install praw
Working with PRAW#
We first need to create an instance of the Reddit class using an existing Reddit account. We created an account for testing purposes:
import praw
r = praw.Reddit(
client_id = 'BHpXy52FE-8za63YtAqvOQ',
client_secret = 'GWEUpuB6M4q9YB2q9SOvY6OuGJ20JQ',
password='psych750!Tutorial',
user_agent = 'testscript by u/tutorial_for_praw',
username='tutorial_for_praw'
)
print(r.user.me()) # Checking if it's working, it should just show the user name
tutorial_for_praw
Authenticating PRAW to access Reddit#
To use PRAW, you need to authenticate with your Reddit account. For your own code, you’ll want to use your own account (or create a new one). First create a new Reddit account or use one you already have. Then go here and click on the are you a developer? create an app...
button to create an application. Name it psych750
and choose script
. Add in a short description and use http://localhost:8080
as the redirect uri. The usename is your regular username and the password is that user’s password. The client_id string is the code in the upper-left by user_script
Accessing top posts#
Hottest 10 posts on Reddit (as of 11/30/23).
for cur_submission in r.front.hot(limit = 10):
submission = r.submission(cur_submission)
print(submission.title, submission.url)
Henry Kissinger, secretary of state to Richard Nixon, dies at 100 https://www.theguardian.com/us-news/2023/nov/29/henry-kissinger-dies-secretary-of-state-richard-nixon?CMP=Share_AndroidApp_Other
3 killed, at least 11 injured in shooting attack near entrance to Jerusalem https://www.ynetnews.com/article/hylzfssh6
“For gamers. By gamers.” https://i.redd.it/n3hczyqwpf3c1.jpg
Look out! https://v.redd.it/b0t4xvho3e3c1
What is something in recent times that has gone too far but no one will admit? https://www.reddit.com/r/AskReddit/comments/18797lo/what_is_something_in_recent_times_that_has_gone/
These were found in my wife’s grandfathers stuff after he passed on. https://i.redd.it/m17ozsaroe3c1.jpg
Gus 100% believes Grammy’s visits are for him. He’s not wrong https://www.reddit.com/gallery/1872hv4
My wife. Early 80s. https://i.redd.it/bddshnhhke3c1.jpeg
A six-planet solar system in perfect synchrony has been found in the Milky Way https://apnews.com/article/six-planets-solar-system-nasa-esa-3d67e5a1ba7cbea101d756fc6e47f33d
Mysterious woman tells school board that Scholastic book sparked porn addiction https://popular.info/p/mysterious-woman-tells-school-board
How did I know to access submission.title
and submission.url
? Read the API docs!
Accessing comments#
Let’s now see how we can access comments on a post. This post talks about how your brain temporarily loses the ability to create new memories when you experience a blackout from alcohol. It has more than 54000 comments and we can use PRAW to extract all the comments and see what people’s opinions are. See here for info on extracting comments.
Let’s take a look at the top 5 comments.
Access comments via the submission’s URL#
Let’s acces top 5 comments of a particular submission. We can access the submission using its URL:
url = 'https://www.reddit.com/r/todayilearned/comments/ysm4ys/til_blacking_out_from_alcohol_doesnt_cause_you_to/'
submission = r.submission(url = url)
for top_level_comment in submission.comments[:5]:
print(top_level_comment.body)
Anterograde amnesia: the inability to form new memories. Alcohol blackout is *temporary* anterograde amnesia; it goes away when the alcohol intoxication goes away. Permanent anterograde amnesia is a thing; my boss suffered a heart attack years ago, and it left him with permanent anterograde amnesia. When you cannot learn new information at all, it's pretty hard to work in I.T., where you have to learn new stuff all the time. Management retired him pretty quick.
Pro tip: I'd you ever want to know if someone is "Blacked Out" ask them the same question a few times within a few minutes, they usually won't remember.
I like to call this state Read Only Mode
[Anterograde amnesia](https://en.wikipedia.org/wiki/Anterograde_amnesia).
Some drugs which have the same effect are sometimes used for potentially mentally traumatic medical or dental procedures.
Access comments via the submission’s ID#
Or we can access comments using a submission’s unique id. Let’s look at the currently top submissions again, but this time output its id as well as the poster’s username.
for cur_submission in r.front.hot(limit = 10):
# print(cur_submission)
submission = r.submission(cur_submission)
print(submission.title,submission.id,submission.author,'\n')
Henry Kissinger, secretary of state to Richard Nixon, dies at 100 18770kx MrRedXiii
3 killed, at least 11 injured in shooting attack near entrance to Jerusalem 187dboi The2lackSUN
Look out! 1876c99 DCArchibald
“For gamers. By gamers.” 187ck93 YourOldComp
What is something in recent times that has gone too far but no one will admit? 18797lo shado-walkerrrr
These were found in my wife’s grandfathers stuff after he passed on. 1878u2l duhbiap
Gus 100% believes Grammy’s visits are for him. He’s not wrong 1872hv4 Fishmike52
My wife. Early 80s. 1878cqx CHhVCq
A six-planet solar system in perfect synchrony has been found in the Milky Way 1878k6t scottmache025
Mysterious woman tells school board that Scholastic book sparked porn addiction 18716k6 -Appleaday-
submission = r.submission(id='18770kx')
for top_level_comment in submission.comments[:5]:
print(top_level_comment.body)
> Political satire became obsolete when Henry Kissinger was awarded the Nobel peace prize.
-- Tom Lehrer on why he stopped performing political satire.
Quick someone update ishenrykissingerdead.com
Edit: as others have suggested: http://www.ishenrykissingerdeadyet.com/
This is Robert Evans’ 4th of July
I'm surprised they found his last horcrux.
I'm so glad Jimmy Carter got to outlive this vile piece of shit.
Using NLTK to extract info from comments#
Let’s use nltk
to extract some information from these comments. It will take a little while because we’re interacting in real time with the Reddit site for pulling these data.
Note
See here for info on the replace_more method.
import nltk
all_comments = ''
submission.comments.replace_more()
for top_level_comment in submission.comments:
all_comments += top_level_comment.body
all_comments_tokens = nltk.word_tokenize(all_comments)
all_comments_pos = nltk.pos_tag(all_comments_tokens)
all_comments_nouns = [word[0].lower() for word in all_comments_pos if word[1][:2] == 'NN']
print(nltk.FreqDist(all_comments_nouns).most_common(20))
[('kissinger', 95), ('hell', 86), ('henry', 67), ('war', 56), ('’', 51), ('*', 46), ('people', 45), ('death', 41), ('cambodia', 40), ('years', 36), ('world', 36), ('man', 31), ('day', 31), (']', 30), ('https', 29), ('t', 29), ('time', 28), ('piece', 26), ('riddance', 26), ('shit', 24)]
Accessing upvotes#
Let’s find out how many upvotes it received:
print('This submission has: ' + str(submission.score) + ' upvotes!')
This submission has: 29087 upvotes!
What about downvotes? PRAW does not have a downvote attribute you can look up. It only has a downvote method that lets you actually downvote the submission. So don’t do that unless you want to downvote it! To find the number of downvotes, we need to calculate it based on the score (number of upvotes) and the upvote ratio:
print(f'This submission has: {str(round(submission.score * (1-submission.upvote_ratio)))} downvotes!')
This submission has: 1745 downvotes!
Demo of accessing most frequent nouns used in a subreddit’s posts#
Let’s try something else. Let’s take a look at what folks on r/MadisonWI are discussing these days.
all_hot_submissions = ''
for cur_submission in r.subreddit("madisonwi").hot():
submission = r.submission(cur_submission)
all_hot_submissions += submission.title
all_hot_submissions_tokens = nltk.word_tokenize(all_hot_submissions)
all_hot_submissions_pos = nltk.pos_tag(all_hot_submissions_tokens)
all_hot_submissions_nouns = [word[0].lower() for word in all_hot_submissions_pos if word[1][:2] == 'NN']
print(nltk.FreqDist(all_hot_submissions_nouns).most_common(15))
[('wisconsin', 6), ('’', 6), ('december', 4), ('uw', 4), ('madison', 4), ('area', 4), ('year', 3), ('s', 3), ('coffee', 3), ('jobs', 2), ('students', 2), ('system', 2), ('christmas', 2), ('job', 2), ('lake', 2)]
Coffee, jobs, and Christmas…
Accessing user info of a “hot” post#
for cur_submission in r.subreddit("madisonWI").hot(limit = 2):
#accessing the 2nd one because the first is a Bot post
hottest_submission_in_madisonWI = r.submission(cur_submission)
print(f'The hottest submission in r/madisonWI is: {hottest_submission_in_wisconsin.title} by {hottest_submission_in_wisconsin.author}')
The hottest submission in r/madisonWI is: so are there any Kissinger parties tonight by Kind-Neighborhood512
Can we find out more about this user? What’s this person’s karma? Where else has this user posted within the last month?
import time
user_instance = hottest_submission_in_madisonWI.author
print('this user has ' + str(user_instance.comment_karma) + ' karma!')
all_subreddits_user_posted = []
for cur_submission in user_instance.submissions.new():
submission = r.submission(cur_submission)
all_subreddits_user_posted.append(submission.subreddit)
if time.time() - submission.created_utc > 60*60*24*30: #we'll walk through what's going on here in class
break
unique_subreddits = set(all_subreddits_user_posted)
for cur_subreddit in unique_subreddits:
print('This user has posted in ' + str(cur_subreddit.display_name) + ' in the past month.')
this user has 1418 karma!
This user has posted in madisonwi in the past month.
This user has posted in HomeDepot in the past month.
This user has posted in shorewoodhills in the past month.
This user has posted in StoughtonRoad in the past month.
Again with additional commentary#
Let’s unpack the above code a bit.
Let’s grab the top posts from MadisonWI (the top post here tends to be from the moderator, so we’ll go with the second highest)
Notice something a little peculiar here. If we print top_submission_in_madison.author
, it prints as a string. But if we look at it, we see it’s not a string at all:
type(hottest_submission_in_madisonWI.author)
praw.models.reddit.redditor.Redditor
We can see that it’s an object in the Redditor class. We can find out which precise methods it has by using help(top_submission_in_madison.author)
but the eaier way to go is just to look at the documentation. Searching google for praw.models.reddit.redditor.Redditor
got me here
Now let’s find this user’s previous 15 posts:
for num_submission,cur_submission in enumerate(user_instance.submissions.new()): #now iterate through the user's submissions
submission = r.submission(cur_submission) # create a submission instance
print(datetime.fromtimestamp(submission.created_utc), submission.subreddit, '||', submission.score, '||', submission.title,'\n') #access the submission's attributes:
if num_submission > 15: #stop after 15 submissions
break
2023-11-29 21:05:16 madisonwi || 122 || so are there any Kissinger parties tonight
2023-11-27 23:08:59 madisonwi || 18 || what if we replace the hairball intersection with a big fucking roundabout with a Culver's in the middle?
2023-11-27 12:12:14 StoughtonRoad || 2 || Feedback on areas to live in?
2023-11-27 12:11:57 shorewoodhills || 1 || Feedback on areas to live in?
2023-11-26 21:20:33 madisonwi || 0 || fireworks in Vanchamasshe neighborhood - wtf is going on?
2023-11-23 22:13:31 madisonwi || 0 || is there anywhere open that I can get a souffle
2023-11-21 10:53:18 madisonwi || 178 || why settle for the closest parking spot when you can have the two closest parking spots?
2023-11-20 20:23:36 madisonwi || 0 || HyVee
2023-11-17 21:37:52 madisonwi || 0 || does anyone else think Maple Bluff is lackluster?
2023-11-14 22:55:25 HomeDepot || 30 || what should I do if I see someone huffing paint at Home Depot?
2023-11-13 20:33:04 madisonwi || 0 || Denny's let me down so bad
2023-11-10 18:54:22 madisonwi || 98 || Madison's Greatest Parking Spots: Episode 1
2023-11-08 19:39:27 madisonwi || 51 || if you could ban one thing in Madison, what would you choose?
2023-11-06 21:59:50 madisonwi || 0 || OPINION: Letters aren't working out for bus routes at all. Time for Madison Metro to go back to numbers.
2023-11-03 22:04:16 madisonwi || 0 || bufo bufo bufo?
2023-10-30 20:55:42 madisonwi || 0 || Pipe organ players, where do you go to get your instrument serviced?
2023-10-29 09:20:11 madisonwi || 0 || Hello I am hiring for a shitty job with tons of mandatory overtime, please shower me with upvotes
A bad joke#
submission = r.submission('yshook') #the argument is the submission's id
print(submission.title)
print(submission.selftext)
What's the difference between grey and gray?
One is a color, and the other is a colour.