Exercise 7 - Exploring the General Social Survey

Accept the exercise

This exercise will give you practice wrangling and analyzing the General Social Survey (GSS). The goals are to cement the dplyr skills we’ve been working on and to give you practice going from a description of a problem to figuring out how to get the data to give you an answer, a critical skill for doing social / data science.

Please follow these instructions to download all the GSS data and load it into R. I recommend using drat to install. Notice that importing the data this way fixes the earlier issue we saw with numeric variables like age and income being imported as factors, so there is no need to do the as.character %>% as.numeric conversion. Be mindful of the factor levels because they don’t always correspond to what you would expect. For example, the variable happy has its largest level corresponding to the least happy response. Rename your variables or use fct_rev as necessary so that the variable name matches its values. You can see that many variables have labels built in so that you know what each numeric response corresponds to.
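As a minimal sketch of what fct_rev does, the toy factor below stands in for happy after it has been converted to a factor. The values are made up for illustration and are not real GSS responses; the name happy_rev is also just an assumption.

```r
library(forcats)

# Toy stand-in for the GSS `happy` variable (hypothetical values).
happy <- factor(
  c("very happy", "not too happy", "pretty happy", "very happy"),
  levels = c("very happy", "pretty happy", "not too happy")
)

# As coded, the largest level is the least happy response.
as.integer(happy)

# fct_rev() reverses the level order, so larger codes now mean happier.
happy_rev <- fct_rev(happy)
levels(happy_rev)
as.integer(happy_rev)
```

After the reversal, a larger integer code corresponds to a happier response, so the variable name and its values point in the same direction.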

If you want to access the label instead of the numeric response (e.g., “male” and “female” for sex instead of 1 and 2), load in the labelled package and use to_factor, e.g.

gss_all$sex_l <- to_factor(gss_all$sex)

If you want to create factors that preserve the original order (you probably do), use levels="p" (short for "prefixed"), which prepends the numeric code to each label:

gss_all$sex_l <- to_factor(gss_all$sex, levels="p")

sex_l is now a factor.

If you want to create an ordinal factor (you probably don’t), use ordered:

gss_all$sex_l <- to_factor(gss_all$sex, ordered=TRUE)

Please load in all the years into a dataframe called gss_all using the instructions here. You can complete the entire exercise using this dataframe, calling on individual years as needed.
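Calling on individual years usually comes down to a filter on year. The sketch below uses a small made-up tibble (toy_gss, hypothetical values) rather than the real gss_all, so only the pattern carries over.

```r
library(dplyr)

# Toy stand-in for gss_all (hypothetical values, not real GSS data).
toy_gss <- tibble(
  year = c(2014, 2016, 2016, 2018, 2018, 2018),
  age  = c(34, 51, 27, 45, 62, 38)
)

# Keep a single survey year.
gss_2016 <- toy_gss %>% filter(year == 2016)

# Keep several survey years at once.
gss_16_18 <- toy_gss %>% filter(year %in% c(2016, 2018))

# Check how many rows each retained year contributes.
count(gss_16_18, year)
```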

Part 1: Human evolution vs. elephant evolution

Recall from the GSS activity that while political orientation (polviews) was associated with endorsement of human evolution (evolved), it was not strongly associated with endorsement of the evolution of elephants (evolved2). Visualize this interaction in the clearest way possible. Because evolved2 was only asked in 2016 and 2018, use the (combined) 2016 and 2018 data for this graph (i.e., only base your analyses on the years 2016 and 2018).


Part 2: Basic science knowledge, politics, education, and endorsement of human evolution?

Begin by creating a compound variable science_knowledge that contains the average correctness of the following variables:

earthsun, electron, lasers, condrift, radioact, hotcore

(you’ll need to consult the codebook if you’re not sure what the right answer is 😬)

Next, explore relationships between this variable (higher means more basic science knowledge), political affiliation (polviews is a good one, but you can explore others), education (degree, or educ for a more continuous measure), and endorsement of human evolution (evolved). Feel free to look at variables coding for people’s employment type as well. As part of your solution for this part, report one or more main effects, an interaction (the more interesting and unexpected, the better), and a visualization that helps you gain insight into a pattern that’s hard to appreciate by just looking at the numbers.

Tip

The functions rowwise and mean(c_across(list_of_columns)) (or rowMeans(across(list_of_columns))) will come in handy for creating the compound variable. See here for usage examples.
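The two approaches in the tip can be sketched as follows. The columns q1_correct, q2_correct, and q3_correct are hypothetical 0/1 correctness indicators on toy data. With the real items you would first need to recode each answer as correct or incorrect using the codebook.

```r
library(dplyr)

# Toy data: 1 = correct, 0 = incorrect, NA = not asked / no answer.
toy <- tibble(
  id = 1:3,
  q1_correct = c(1, 0, 1),
  q2_correct = c(1, 1, NA),
  q3_correct = c(0, 0, 1)
)
item_cols <- c("q1_correct", "q2_correct", "q3_correct")

# Option 1: rowwise() + c_across() computes the mean one row at a time.
by_row <- toy %>%
  rowwise() %>%
  mutate(knowledge = mean(c_across(all_of(item_cols)), na.rm = TRUE)) %>%
  ungroup()

# Option 2: rowMeans() over across() does the same in one vectorised call.
vectorised <- toy %>%
  mutate(knowledge = rowMeans(across(all_of(item_cols)), na.rm = TRUE))
```

Both versions give the same result. The rowMeans version is typically much faster on a dataset the size of the full GSS. Whether to use na.rm = TRUE is a judgment call: it averages over only the items a respondent answered.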

Part 3: Work and Happiness

As we discussed in class, happy contains people’s response to the question of how happy they are with life in general (the coding, confusingly, is 1=very happy, 2=pretty happy, 3=not too happy). Make sure to keep this coding straight when you interpret the data!

a. Explore the relationship between happy and wrkstat. What are you seeing? Show the key data (as a table or figure) that support your interpretation.

b. Break down the relationship between happy and wrkstat for men and women (sex). Is it similar? Which level of wrkstat shows the largest sex difference?

c. Let’s now bring in another variable: spwrksta, which codes the work status of the respondent’s spouse. Look at the combinations of sex, wrkstat and spwrksta and get the mean happiness. Which categories tend to be more happy? Which tend to be less happy? Does this change for people with college degrees vs. not?

Note

The sex of the spouse is not indicated (until very recently gay marriage was, of course, illegal, and the question-asker would not want to incriminate the respondent). For a very large majority of the respondents the spouse will be of the opposite sex.

For this last analysis, filter out categories with NAs and fewer than 100 responses (many categories get to be very small and it’s not worth trying to interpret them).
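One way to drop NAs and small cells is to count respondents per combination while summarising and then filter on that count. The sketch below uses toy data with hypothetical columns group_a, group_b, and a numeric score. It only illustrates the pattern and is not tied to the actual GSS variables.

```r
library(dplyr)

# Toy data: four combinations with 150, 120, 30, and 20 rows (one has an NA key).
toy <- tibble(
  group_a = c(rep("x", 150), rep("y", 120), rep("y", 30), rep(NA, 20)),
  group_b = c(rep("p", 150), rep("p", 120), rep("q", 30), rep("p", 20)),
  score   = rep(c(1, 2, 3, 2), length.out = 320)
)

min_n <- 100  # smallest cell size we are willing to interpret

cell_means <- toy %>%
  # Drop rows with a missing grouping variable or a missing outcome.
  filter(!is.na(group_a), !is.na(group_b), !is.na(score)) %>%
  group_by(group_a, group_b) %>%
  # Keep the cell size next to the mean so it can be filtered on.
  summarise(n = n(), mean_score = mean(score), .groups = "drop") %>%
  filter(n >= min_n)

cell_means
```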

Part 4: Tax priorities

Some people believe the government is spending too much money on certain issues. Other people believe we are not spending enough on those same issues. Explore the relationships between the following two sets of variables:

Problems:

natspac, natenvir, natheal, natcity, natcrime, natdrug, nateduc, natrace, natarms, nataid, natfare

Respondents’ characteristics (you may choose other variables than the ones below):

polviews, degree (or alternatively, educ), age, sex

Pick two variables from the problems category and one variable from the respondents’ characteristics category such that the views on the two problems are flipped for the same group of respondents (e.g., liberals think we are spending too much money on the military but too little money on education, while conservatives think the opposite).

a. Visualize the data in a way that clearly shows the pattern.

b. Investigate whether their views have changed over the years (year).

Tip

You can use geom_smooth() to fit a curve across years to see how beliefs change across time. Be wary of small sample sizes though. You might want to restrict your analysis to cases containing a minimum of, say, 50 respondents.
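A rough sketch of that idea, assuming you have already summarised the data to one row per year and group: toy_by_year, group, n, and prop_too_little are all hypothetical names, and the numbers are made up.

```r
library(dplyr)
library(ggplot2)

# Toy per-year summary: n respondents and the share giving some response.
toy_by_year <- expand.grid(
  year = seq(1980, 2018, by = 2),
  group = c("A", "B"),
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(
    n = if_else(year < 1984, 30L, 200L),  # the earliest waves are small
    prop_too_little = if_else(group == "A", 0.3, 0.6) + (year - 1980) * 0.005
  )

# Drop year-by-group cells that rest on fewer than 50 respondents.
kept <- toy_by_year %>% filter(n >= 50)

# One smoothed curve per group across years; print(p) draws the plot.
p <- ggplot(kept, aes(x = year, y = prop_too_little, colour = group)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```

Filtering on n before plotting keeps a few noisy small-sample years from pulling the fitted curve around.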

Tip

Make sure you interpret directionality (whether someone endorses or doesn’t endorse spending more money on an issue) in the right direction!

Part 5: Astrology: beliefs, hobbies, and personalities

About a third of the GSS respondents think that astrology is at least somewhat scientific (astrosci). Let’s explore this. This question has an extra credit component (2 pts).

a. Are people who believe that astrology is scientific (astrosci) also more likely to read their horoscope (astrolgy)? If so, how much more likely?

b. What can you say about the demographics of people who read their horoscope (astrolgy) vs. don’t read it? Is there some dimension that is especially well associated with it (gender? age? occupation? See occ10, also indus10)? Bonus point if you discover a correlation of >.3 (or equivalent effect size).

c. Is there a relationship between people’s zodiac sign (zodiac) and belief in astrology? Likelihood of reading their horoscope? (bit meta there…)

Note

Since many of these questions were only asked in some years, if you try to use data from multiple questions at once, you may end up with 0 respondents.
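Before combining questions, it can help to check how many respondents answered each one, and both together, in each year. A minimal sketch on toy data, where question_a and question_b are hypothetical stand-ins for any two GSS variables:

```r
library(dplyr)

# Toy data: question_a was not asked in 2012, question_b not in 2010.
toy <- tibble(
  year = rep(c(2010, 2012, 2014), each = 3),
  question_a = c(1, 2, 1, NA, NA, NA, 2, 1, NA),
  question_b = c(NA, NA, NA, 1, 1, 2, 2, NA, 1)
)

coverage <- toy %>%
  group_by(year) %>%
  summarise(
    n_a    = sum(!is.na(question_a)),                       # answered A
    n_b    = sum(!is.na(question_b)),                       # answered B
    n_both = sum(!is.na(question_a) & !is.na(question_b)),  # answered both
    .groups = "drop"
  )

coverage
```

Years where n_both is 0 contribute nothing to an analysis that needs both questions, even if each question looks well covered on its own.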

Note

Seeing how horoscopes do not in fact predict anything, it’s perfectly fine to not find anything that would vindicate their predictive power!

Bonus! - 1.5 pts. Imagine you are a true believer (and an unscrupulous data scientist) trying to find data showing that there really is a relationship between one’s zodiac sign and life trajectory. What’s the strongest argument you can craft based on your analyses? Some variables you might find useful: socbar (going to bars), helpful, empathy1, empathy2, sprtprsn (spirituality). Aren’t Scorpios supposed to be more mysterious or something? 🤷‍♂️ You might want to check out variables related to occupation (e.g., occ10), life events (divorce), and hobbies (e.g., camping; note that the 1993 wave had a bunch of questions related to hobbies). Bonus +1 for a compelling graph.