Exercise 7 - Exploring the General Social Survey
This exercise will give you practice wrangling and analyzing the General Social Survey (GSS). The goals are to cement the dplyr skills we’ve been working on and to give you practice going from a description of a problem to figuring out how to get the data to give you an answer, a critical skill for doing social / data science.
Please follow these instructions to download all the GSS data and load it into R. I recommend using drat to install it. Notice that importing the data this way fixes the earlier issue we saw with numeric variables like age and income being imported as factors, so there is no need to do the as.character %>% as.numeric conversion. Be mindful of the factor levels because they don’t always correspond to what they ought to be. For example, the variable called happy has its largest level corresponding to the least happy response. Rename your variables or use fct_rev as necessary so that the variable name matches its values. You can see that many variables have labels attached, so that you know what each numeric response corresponds to.
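To see what reversing levels does, here is a minimal sketch on a made-up vector (not the real GSS column). The three response labels follow the happy coding described in this exercise; the vector itself is a toy stand-in.

```r
library(forcats)

# Toy stand-in for the GSS `happy` item (values invented for illustration).
happy <- factor(
  c("very happy", "not too happy", "pretty happy", "very happy"),
  levels = c("very happy", "pretty happy", "not too happy")
)

# fct_rev() reverses the level order, so the largest level is now the happiest.
happy_rev <- fct_rev(happy)

levels(happy_rev)      # level order after reversing
as.integer(happy_rev)  # underlying integer codes; higher = happier
```

After reversing, a larger integer code means a happier response, which matches what the variable name suggests.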
If you want to access the label instead of the numeric response (e.g., “male” and “female” for sex
instead of 1 and 2), load in the labelled package and use to_factor
, e.g.
gss_all$sex_l <- to_factor(gss_all$sex)
If you want to create factors that preserve the original order (you probably do):
gss_all$sex_l <- to_factor(gss_all$sex, levels="p")
sex_l
is now a factor.
If you want to create an ordinal factor (you probably don’t), use ordered
:
gss_all$sex_l <- to_factor(gss_all$sex, ordered=TRUE)
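If you want to try these options before touching the full dataset, the sketch below runs the three to_factor calls on a small labelled vector. The vector is built by hand with labelled() purely for illustration; in the real data the labels come with the import.

```r
library(labelled)

# Hand-made labelled vector standing in for gss_all$sex (toy values).
sex <- labelled(c(1, 2, 2, 1), labels = c(male = 1, female = 2))

sex_l <- to_factor(sex)                  # levels are the labels
sex_p <- to_factor(sex, levels = "p")    # levels should be the labels prefixed with their values
sex_o <- to_factor(sex, ordered = TRUE)  # ordered factor

levels(sex_l)
levels(sex_p)
is.ordered(sex_o)
```

Printing levels() for each result is a quick way to check what you actually got before using the new column in a plot or model.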
Please load in all the years into a dataframe called gss_all
using the instructions here. You can complete the entire exercise using this dataframe, calling on individual years as needed.
Part 1: Human evolution vs. elephant evolution#
Recall from the GSS activity that while political orientation (polviews
) was associated with endorsement of human evolution (evolved
), it was not strongly associated with endorsement of the evolution of elephants (evolved2
). Visualize this interaction in the clearest way possible. Because evolved2
was only asked in 2016 and 2018, use the (combined) 2016 and 2018 data for this graph (i.e., only base your analyses on the years 2016 and 2018).
Part 2: Basic science knowledge, politics, education, and endorsement of human evolution?#
Begin by creating a compound variable science_knowledge
that contains the average correctness of the following variables:
earthsun, electron, lasers, condrift, radioact, hotcore
(you’ll need to consult the codebook if you’re not sure what the right answer is 😬)
Next, explore relationships between this variable (higher means more basic science knowledge), political affiliation (polviews is a good one, but you can explore others), education (degree, or educ for a more continuous measure), and endorsement of human evolution (evolved). Feel free to look at variables coding for people’s employment type as well. As part of your solution for this part, report one or more main effects, an interaction (the more interesting and unexpected, the better), and a visualization that helps you gain insight into a pattern that’s hard to appreciate by just looking at the numbers.
Tip
The functions rowwise and mean(c_across(list_of_columns)) (or rowMeans(across(list_of_columns))) will come in handy for creating the compound variable. See here for usage examples.
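Here is a minimal sketch of both approaches on toy data. The column names q1 to q3 are hypothetical, and the sketch assumes each item has already been recoded to 1 = correct, 0 = incorrect (the raw GSS items are not coded this way, so you will need to do that step first).

```r
library(dplyr)

# Toy data: 1 = correct, 0 = incorrect, NA = not asked / no answer.
quiz <- tibble(
  id = 1:3,
  q1 = c(1, 0, 1),
  q2 = c(1, 1, NA),
  q3 = c(0, 0, 1)
)

items <- c("q1", "q2", "q3")

# Option 1: rowwise() + c_across() computes the mean one row at a time.
by_row <- quiz %>%
  rowwise() %>%
  mutate(score = mean(c_across(all_of(items)), na.rm = TRUE)) %>%
  ungroup()

# Option 2: rowMeans() over the selected columns (vectorised, usually faster).
vectorised <- quiz %>%
  mutate(score = rowMeans(across(all_of(items)), na.rm = TRUE))
```

Both versions give the same scores. Note that na.rm = TRUE averages over the items a respondent actually answered; whether that is the right choice for science_knowledge is a judgment call you should make explicit.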
Part 3: Work and Happiness#
As we discussed in class, happy
contains people’s response to the question of how happy they are with life in general (the coding, confusingly, is 1=very happy, 2=pretty happy, 3=not too happy). Make sure to keep this coding straight when you interpret the data!
a. Explore the relationship between happy
and wrkstat
. What are you seeing? Show the key data (as a table or figure) that support your interpretation.
b. Break down the relationship between happy and wrkstat for men and women (sex). Is it similar? Which level of wrkstat shows the largest sex difference?
c. Let’s now bring in another variable: spwrksta, which codes the work status of the respondent’s spouse. Look at the combinations of sex, wrkstat, and spwrksta and get the mean happiness. Which categories tend to be more happy? Which tend to be less happy? Does this change for people with college degrees vs. not?
Note
The sex of the spouse is not indicated (until very recently gay marriage was, of course, illegal, and the question-asker would not want to incriminate the respondent). For a very large majority of the respondents the spouse will be of the opposite sex.
For this last analysis, filter out categories with NAs and fewer than 100 responses (many categories get to be very small and it’s not worth trying to interpret them).
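One way to do this filtering is sketched below on toy data. The columns sex, work, and happy are invented stand-ins with made-up values; the pattern (drop NAs, count each cell, keep cells with at least 100 responses) is the part that carries over.

```r
library(dplyr)

# Toy data: 300 rows, with one small cell and one cell with missing sex.
toy <- tibble(
  sex   = c(rep("female", 120), rep("male", 110), rep("male", 30), rep(NA, 40)),
  work  = c(rep("full time", 230), rep("retired", 70)),
  happy = c(rep(c(1, 2, 3), 40), rep(c(1, 2), 55), rep(2, 30), rep(1, 40))
)

cell_means <- toy %>%
  filter(!is.na(sex), !is.na(work), !is.na(happy)) %>%  # drop rows with NAs
  group_by(sex, work) %>%
  summarise(n = n(), mean_happy = mean(happy), .groups = "drop") %>%
  filter(n >= 100)                                      # drop small cells

cell_means
```

Keeping n in the summary table makes it easy to see how much data each mean is based on when you interpret the result.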
Part 4: Tax priorities#
Some people believe the government is spending too much money on certain issues. Other people believe we are not spending enough on those same issues. Explore the relationships between the following two sets of variables:
Problems:
natspac, natenvir, natheal, natcity, natcrime, natdrug, nateduc, natrace, natarms, nataid, natfare
Respondents’ characteristics (you may choose other variables than the ones below):
polviews, degree (or alternatively, educ), age, sex
Pick two variables from the problems category and one variable from the respondents’ characteristics category such that the views on the two problems are flipped for the same group of respondents (e.g., liberals think we are spending too much money on the military but too little money on education, while conservatives think the opposite).
a. Visualize the data in a way that clearly shows the pattern.
b. Investigate whether their views have changed over the years (year).
Tip
You can use geom_smooth() to fit a curve across years to see how beliefs change across time. Be wary of small sample sizes though. You might want to restrict your analysis to cases containing a minimum of, say, 50 respondents.
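A minimal sketch of this idea on toy data follows. The column too_little is a hypothetical 0/1 indicator (1 = respondent says we spend too little), and the choice of method = "lm" is just an assumption to keep the example small; a different smoother may suit the real data better.

```r
library(dplyr)
library(ggplot2)

# Toy data: four survey years with different numbers of respondents.
toy <- tibble(
  year = rep(c(2000, 2002, 2004, 2006), times = c(60, 80, 20, 70)),
  too_little = rep(c(1, 0), length.out = 230)
)

# Proportion per year, keeping only years with at least 50 respondents.
by_year <- toy %>%
  group_by(year) %>%
  summarise(n = n(), prop = mean(too_little), .groups = "drop") %>%
  filter(n >= 50)

# Points show the yearly proportions; geom_smooth() adds the fitted trend.
p <- ggplot(by_year, aes(x = year, y = prop)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

To compare groups of respondents, add the grouping variable to group_by() and map it to colour inside aes(); the minimum-n filter then applies to each group-by-year cell.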
Tip
Make sure you interpret directionality (whether someone endorses or doesn’t endorse spending more money on an issue) in the right direction!
Part 5: Astrology: beliefs, hobbies, and personalities#
About a third of the GSS respondents think that astrology is at least somewhat scientific (astrosci
). Let’s explore this. This question has an extra credit component (2 pts)
a. Are people who believe that astrology is scientific (astrosci) also more likely to read their horoscope (astrolgy)? If so, how much more likely?
b. What can you say about the demographics of people who read their horoscope (astrolgy) vs. those who don’t? Is there some dimension that is especially well associated with it (gender? age? occupation? see occ10, also indus10)? Bonus point if you discover a correlation of >.3 (or equivalent effect size).
c. Is there a relationship between people’s zodiac sign (zodiac
) and belief in astrology? Likelihood of reading their horoscope? (bit meta there…)
Note
Since many of these questions were only asked in some years, if you try to use data from multiple questions at once, you may end up with 0 respondents.
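A quick way to check this before combining questions is to count non-missing answers per year. The sketch below uses toy data; item_a and item_b are hypothetical stand-ins for any two GSS questions.

```r
library(dplyr)

# Toy data: item_a was not asked in 2008, item_b was not asked in 2006.
toy <- tibble(
  year   = rep(c(2006, 2008, 2010), each = 4),
  item_a = c(1, 2, NA, 1,   NA, NA, NA, NA,  2, 2, 1, NA),
  item_b = c(NA, NA, NA, NA, 1, 1, 2, NA,    1, NA, 2, 2)
)

# Count usable answers per year for each item and for both together.
coverage <- toy %>%
  group_by(year) %>%
  summarise(
    n_a    = sum(!is.na(item_a)),
    n_b    = sum(!is.na(item_b)),
    n_both = sum(!is.na(item_a) & !is.na(item_b)),
    .groups = "drop"
  )

coverage
```

Years where n_both is 0 contribute nothing to an analysis that needs both questions, even if each question has plenty of answers on its own.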
Note
Seeing how horoscopes do not in fact predict anything, it’s perfectly fine to not find anything that would vindicate their predictive power!
Bonus! - 1.5 pts. Imagine you are a true believer (and an unscrupulous data scientist) trying to find data showing that there really is a relationship between one’s zodiac sign and life trajectory. What’s the strongest argument you can craft based on your analyses? Some variables you might find useful: socbar (going to bars), helpful, empathy1, empathy2 … sprtprsn (spirituality) – aren’t Scorpios supposed to be more mysterious or something? 🤷‍♂️ You might want to check out variables related to occupation (e.g., occ10), life events (divorce), and hobbies (e.g., camping; note that the 1993 wave had a bunch of questions related to hobbies). Bonus +1 for a compelling graph.