hypothesis_testing

Setup

To start, let’s load our essential packages for data manipulation and statistical inference. We’ll be using poliscidata::gss for our examples.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(modelsummary)
library(infer)
library(poliscidata)

Registered S3 method overwritten by 'gdata':
  method         from  
  reorder.factor gplots

Attaching package: 'poliscidata'

The following object is masked from 'package:infer':

    gss

The lecture first reviewed testing hypotheses about means and proportions before moving to the \(\chi^2\) test. We will follow that order here.

Hypothesis Testing for Means and Proportions

Much of political science research involves continuous variables (like income or ideological scores) or binary outcomes (like voting/not voting). The t-tests and proportion tests are our primary tools for these scenarios. We’ll use the GSS dataset for these examples.

One-Sample t-test: Testing a Single Mean

The one-sample t-test is used to test a specific prediction (\(H_a\)) about the value of a population mean (\(\mu\)) against the null hypothesis (\(H_0\)), which states the mean is equal to a null value (\(\mu_0\)).

Hypothesis Example: Is the average number of children for Americans less than \(2\)?

\(H_a: \mu < 2\)

\(H_0: \mu = 2\)

We use the childs variable (renamed children below) and the t.test() function, setting the hypothesized mean with mu and the direction of the alternative hypothesis with alternative = "less".

# 1. Prepare data (clean NAs)
gss_children <- poliscidata::gss |>
  select(id, children = childs) |>
  drop_na()

# 2. Calculate the mean
mean(gss_children$children)

[1] 1.891933

# 3. Perform the one-sample t-test
one_sample_t_test <- t.test(gss_children$children,
                            mu = 2,               # null value, mu_0
                            alternative = "less") # H_a: mu < 2

one_sample_t_test


    One Sample t-test

data:  gss_children$children
t = -2.8651, df = 1970, p-value = 0.002106
alternative hypothesis: true mean is less than 2
95 percent confidence interval:
     -Inf 1.954003
sample estimates:
mean of x 
 1.891933

If the p-value is small (e.g., \(< 0.05\)), we reject the null hypothesis. Here \(p = 0.002\), so we reject \(H_0\) and conclude that the data provide evidence that the true mean number of children is less than 2.
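To see what t.test() is doing, here is a minimal sketch that computes the one-sample statistic by hand, \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) with \(n - 1\) degrees of freedom, and takes the lower-tail probability from pt() because alternative = "less". The vector children_sim is hypothetical simulated data standing in for gss_children$children, so its numbers will not match the GSS output above; the check is only that the hand calculation agrees with t.test().

```r
# Hypothetical stand-in for gss_children$children (simulated, not GSS data)
set.seed(42)
children_sim <- rpois(500, lambda = 1.9)

mu_0 <- 2                         # null value from H0: mu = 2
n <- length(children_sim)
x_bar <- mean(children_sim)
se <- sd(children_sim) / sqrt(n)  # standard error of the mean

t_stat <- (x_bar - mu_0) / se     # one-sample t statistic
deg_f <- n - 1
p_less <- pt(t_stat, df = deg_f)  # lower-tail p-value for H_a: mu < 2

# Compare with the built-in test
builtin <- t.test(children_sim, mu = mu_0, alternative = "less")
c(manual_t = t_stat, builtin_t = unname(builtin$statistic))
c(manual_p = p_less, builtin_p = builtin$p.value)
```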

Two-Sample t-test: Comparing Two Means

The two-sample t-test determines whether there is a statistically significant difference between the means of two independent groups. Its null hypothesis is that the two population means are equal, that is, that there is no difference between them.

Hypothesis Example: Do Democrats have a smaller average number of children than Republicans?

\(H_a: \mu_{\text{Democrat}} < \mu_{\text{Republican}}\)

\(H_0: \mu_{\text{Democrat}} = \mu_{\text{Republican}}\)

We’ll use partyid_3, keeping only Democrats and Republicans, and childs (the number of children).

# 1. Filter the data for only Democrats and Republicans
gss_child_party <- poliscidata::gss |>
  filter(partyid_3 %in% c("Dem", "Rep")) |>
  select(childs, partyid_3) |>
  drop_na()

# 2. Visualize the distribution of number of children by party
gss_child_party |>
  ggplot(aes(x = partyid_3, y = childs, fill = partyid_3)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Number of Children by Major Party ID",
       x = "Party Identification",
       y = "Children",
       fill = "Party") +
  theme_minimal()

We run the two-sample test using the formula interface DV ~ IV.

# 3. Perform the two-sample t-test (difference in means: Democrat minus Republican)
two_sample_t_test <- t.test(childs ~ partyid_3,
                            data = gss_child_party,
                            # Testing whether the Democratic mean is less than the Republican mean
                            alternative = "less",
                            var.equal = TRUE)

two_sample_t_test


    Two Sample t-test

data:  childs by partyid_3
t = -1.1835, df = 1138, p-value = 0.1184
alternative hypothesis: true difference in means between group Dem and group Rep is less than 0
95 percent confidence interval:
       -Inf 0.04604209
sample estimates:
mean in group Dem mean in group Rep 
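         1.816619          1.934389

Because the p-value (0.118) is greater than 0.05, we fail to reject the null hypothesis. Democrats in the sample average slightly fewer children than Republicans (1.82 vs. 1.93), but the difference is not statistically significant.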
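With var.equal = TRUE, t.test() uses a pooled-variance statistic, \(t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_p^2(1/n_1 + 1/n_2)}\) with \(s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)\) and \(n_1 + n_2 - 2\) degrees of freedom. The sketch below reproduces it by hand on hypothetical simulated vectors (dem_sim and rep_sim are illustrative names, not GSS variables) and compares the result with t.test() called on two vectors instead of a formula. In both interfaces the difference is first group minus second group, so alternative = "less" is a lower-tail test.

```r
# Hypothetical simulated data for two groups (not GSS data)
set.seed(1)
dem_sim <- rpois(300, lambda = 1.8)
rep_sim <- rpois(250, lambda = 1.9)

n1 <- length(dem_sim)
n2 <- length(rep_sim)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(dem_sim) + (n2 - 1) * var(rep_sim)) / (n1 + n2 - 2)
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat <- (mean(dem_sim) - mean(rep_sim)) / se_diff
deg_f <- n1 + n2 - 2
p_less <- pt(t_stat, df = deg_f)  # H_a: mu_1 < mu_2

builtin <- t.test(dem_sim, rep_sim, alternative = "less", var.equal = TRUE)
c(manual_t = t_stat, builtin_t = unname(builtin$statistic))
c(manual_p = p_less, builtin_p = builtin$p.value)
```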

One-Sample Proportion Test: Testing a Single Proportion

This test is used for a specific prediction about a proportion (\(\pi\)) in a population, often when dealing with binary outcomes (e.g., support/oppose, vote/don’t vote).

Hypothesis Example: We hypothesize that the proportion of people who believe the federal government is spending too little on the environment is greater than 50% (\(\pi_0 = 0.5\)).

\(H_a: \pi > 0.5\)

\(H_0: \pi = 0.5\)

We use the natenvir variable, convert it into a binary outcome, and use prop.test().

# 1. Prepare the data: create a binary variable for "Too little"
gss_envir <- poliscidata::gss |>
  select(natenvir) |>
  drop_na() |>
  mutate(
    is_too_little = ifelse(natenvir == "Too little", 1, 0)
  )

# 2. Calculate the total successes (x) and sample size (n)
x_successes <- sum(gss_envir$is_too_little)
n_obs <- nrow(gss_envir)

# 3. Use prop.test() for the one-sample proportion test
prop_test_one_sample <- prop.test(x = x_successes,
                                  n = n_obs,
                                  p = 0.5, # Null hypothesized proportion
                                  alternative = "greater")

prop_test_one_sample


    1-sample proportions test with continuity correction

data:  x_successes out of n_obs, null probability 0.5
X-squared = 21.953, df = 1, p-value = 1.397e-06
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
 0.5489139 1.0000000
sample estimates:
        p 
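0.5756952

The p-value is far below 0.05, so we reject the null hypothesis: about 57.6% of respondents say spending on the environment is too little, and the data provide strong evidence that the population proportion exceeds 50%.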
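For a single proportion, prop.test() reports a chi-squared statistic rather than a z statistic. With the default continuity correction and a hypothesized proportion \(\pi_0\), it equals \((|x - n\pi_0| - 0.5)^2 / (n\pi_0(1 - \pi_0))\), the square of a continuity-corrected z statistic; for a one-sided alternative the p-value comes from the normal tail of the signed \(\sqrt{X^2}\). A minimal sketch, assuming hypothetical counts (x_hyp = 60 out of n_hyp = 100, not GSS data) chosen only to check the hand calculation against prop.test():

```r
# Hypothetical counts (not GSS data): 60 "successes" out of 100
x_hyp <- 60
n_hyp <- 100
pi_0 <- 0.5  # null hypothesized proportion

# Continuity-corrected statistic (the correction never exceeds |x - n*pi_0|)
yates <- min(0.5, abs(x_hyp - n_hyp * pi_0))
x_sq <- (abs(x_hyp - n_hyp * pi_0) - yates)^2 / (n_hyp * pi_0 * (1 - pi_0))

# Signed z for the one-sided test H_a: pi > pi_0
z <- sign(x_hyp / n_hyp - pi_0) * sqrt(x_sq)
p_greater <- pnorm(z, lower.tail = FALSE)

builtin <- prop.test(x = x_hyp, n = n_hyp, p = pi_0, alternative = "greater")
c(manual_x_sq = x_sq, builtin_x_sq = unname(builtin$statistic))
c(manual_p = p_greater, builtin_p = builtin$p.value)
```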

Two-Sample Proportion Test: Comparing Two Proportions

This test compares the proportions of “success” between two independent groups.

Hypothesis Example: Is the proportion of Democrats who believe the government is spending too little on the environment higher than the proportion of Republicans?

\(H_a: \pi_{\text{Democrat}} > \pi_{\text{Republican}}\)

\(H_0: \pi_{\text{Democrat}} = \pi_{\text{Republican}}\)

# 1. Filter and summarize successes/totals for the two groups
gss_envir_party_summary <- poliscidata::gss |>
  filter(partyid_3 %in% c("Dem", "Rep")) |>
  select(natenvir, partyid_3) |>
  drop_na() |>
  mutate(
    is_too_little = ifelse(natenvir == "Too little", 1, 0)
  ) |>
  group_by(partyid_3) |>
  summarise(
    successes = sum(is_too_little),
    n = n()
  ) |>
  ungroup()

# 2. Extract the vectors of successes and group sizes that prop.test() expects
x_counts <- gss_envir_party_summary$successes
n_samples <- gss_envir_party_summary$n

# 3. Use prop.test() for the two-sample proportion test
prop_test_two_sample <- prop.test(x = x_counts, # Vector of successes
                                  n = n_samples, # Vector of total sample sizes
                                  alternative = "greater")

prop_test_two_sample


    2-sample test for equality of proportions with continuity correction

data:  x_counts out of n_samples
X-squared = 47.319, df = 1, p-value = 3.016e-12
alternative hypothesis: greater
95 percent confidence interval:
 0.2283152 1.0000000
sample estimates:
   prop 1    prop 2 
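0.6824513 0.3816425

prop 1 and prop 2 follow the row order of gss_envir_party_summary, which group_by() sorts by factor level: prop 1 is Democrats (68.2%) and prop 2 is Republicans (38.2%). The alternative = "greater" option therefore tests \(\pi_{\text{Democrat}} > \pi_{\text{Republican}}\), and the very small p-value leads us to reject the null hypothesis.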
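For two groups, the chi-squared statistic from prop.test() is the square of the familiar pooled two-proportion z statistic, \(z = (\hat{p}_1 - \hat{p}_2)/\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}\), where \(\hat{p}\) is the pooled proportion under \(H_0\). The identity is exact without the continuity correction, so this sketch sets correct = FALSE. The counts are hypothetical, not the GSS tallies; because x_hyp lists group 1 first, the one-sided alternative refers to \(\pi_1 > \pi_2\).

```r
# Hypothetical successes and group sizes (not GSS data); group 1 listed first
x_hyp <- c(120, 80)
n_hyp <- c(200, 200)

p_hat <- x_hyp / n_hyp             # group proportions
p_pool <- sum(x_hyp) / sum(n_hyp)  # pooled proportion under H0
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n_hyp[1] + 1 / n_hyp[2]))

z <- (p_hat[1] - p_hat[2]) / se_pool
p_greater <- pnorm(z, lower.tail = FALSE)  # H_a: pi_1 > pi_2

builtin <- prop.test(x = x_hyp, n = n_hyp, alternative = "greater", correct = FALSE)
c(manual_z_sq = z^2, builtin_x_sq = unname(builtin$statistic))
c(manual_p = p_greater, builtin_p = builtin$p.value)
```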

Chi-Square Test for Independence (Two-Way Tables)

The second part of the lecture focused on the \(\chi^2\) test, which is used when we want to test for a relationship (independence) between two categorical variables.

Federal spending on parks and recreation

We will explore hypothesis testing across categorical variables by answering the question: is an individual’s party identification associated with their support for current levels of federal spending on parks and recreation? We will use data from the GSS, obtained using poliscidata::gss.

\(H_a\): Party identification and support for federal park spending are associated (dependent).

\(H_0\): Party identification and support for federal park spending are independent.

gss <- poliscidata::gss |> 
  # Select only the relevant columns
  select(id, partyid_3, natpark) |> 
  # Remove non-complete responses
  drop_na()

Calculating our observed counts

First, we need to look at our observed data. We will make a cross-tabulation of the data using modelsummary::datasummary_crosstab().

datasummary_crosstab(natpark ~ partyid_3, data = gss,
                     statistic = 1 ~ 1 + N + Percent("col"))

natpark                Dem    Ind    Rep    All
Too little   N         239    251    104    594
             % col    35.1   33.6   24.2   32.0
About right  N         413    450    290   1153
             % col    60.7   60.2   67.6   62.1
Too much     N          28     46     35    109
             % col     4.1    6.2    8.2    5.9
All          N         680    747    429   1856
             % col   100.0  100.0  100.0  100.0

The GSS surveyed 1,856 individuals in 2012 who answered both questions, asking them about their party identification and their level of support for current federal spending on parks and recreation.

Calculating our expected counts

If the null hypothesis were true (that is, if party identification and support for federal park spending were independent), each party’s responses would follow the same overall distribution, so the expected count in each cell would depend only on its row and column totals.

To find the expected count for any given cell, we use the formula:

\[\text{Expected Count} = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\]
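The formula can be applied to every cell at once: take the outer product of the row totals and column totals and divide by the grand total. In this minimal sketch the observed counts are typed in from the cross-tab above as a matrix (observed is an illustrative name); chisq.test() also stores the same values in its expected component.

```r
# Observed counts copied from the cross-tab above (rows: natpark, cols: partyid_3)
observed <- matrix(
  c(239, 251, 104,
    413, 450, 290,
     28,  46,  35),
  nrow = 3, byrow = TRUE,
  dimnames = list(natpark = c("Too little", "About right", "Too much"),
                  partyid_3 = c("Dem", "Ind", "Rep"))
)

row_totals <- rowSums(observed)
col_totals <- colSums(observed)
grand_total <- sum(observed)

# Expected count for each cell = (row total x column total) / grand total
expected <- outer(row_totals, col_totals) / grand_total
round(expected, 2)
```

For example, the Too little / Dem cell has an expected count of \(594 \times 680 / 1856 \approx 217.63\), compared with 239 observed.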

The Chi-Squared Statistic and P-Value

The \(\chi^2\) statistic is a measure of the total difference between the observed counts and the expected counts under the null hypothesis.

\[\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}\]
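The sum runs over every cell of the table. This sketch recomputes the statistic from the cross-tab counts above, takes the degrees of freedom as \((\text{rows} - 1)(\text{columns} - 1) = 4\), and reads the p-value from the upper tail of the \(\chi^2\) distribution with pchisq(). The matrix name observed is illustrative; passing it to chisq.test() describes the same 3 × 3 table as calling chisq.test() on the two raw variables.

```r
# Observed counts copied from the cross-tab above
observed <- matrix(
  c(239, 251, 104,
    413, 450, 290,
     28,  46,  35),
  nrow = 3, byrow = TRUE,
  dimnames = list(natpark = c("Too little", "About right", "Too much"),
                  partyid_3 = c("Dem", "Ind", "Rep"))
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells
chi_sq_stat <- sum((observed - expected)^2 / expected)
deg_f <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq_stat, df = deg_f, lower.tail = FALSE)

c(statistic = chi_sq_stat, df = deg_f, p_value = p_value)

# Cross-check with chisq.test() on the same table
chisq.test(observed)$statistic
```

The hand calculation gives about 20.96 on 4 degrees of freedom, with a p-value of about 0.000322.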

We can calculate the \(\chi^2\) statistic for our observed data using chisq.test():

chi_sq <- chisq.test(gss$natpark, gss$partyid_3)
chi_sq


    Pearson's Chi-squared test

data:  gss$natpark and gss$partyid_3
X-squared = 20.964, df = 4, p-value = 0.000322

Since the p-value is extremely small, we reject the null hypothesis and conclude that there is a statistically significant association between party identification and support for federal park spending.

Concluding Summary

This lab has walked us through several of the most common hypothesis tests you’ll use in political science:

t-tests for comparing means of continuous variables.
Proportion tests for comparing proportions of binary variables.
Chi-squared tests for exploring the association between two categorical variables.

You are now equipped with the R skills to put the lecture’s theoretical concepts into practice. Remember, the core logic is the same across all these tests: assume \(H_0\) is true, calculate a test statistic, and ask how likely a statistic at least as extreme as the observed one would be under the null distribution. That probability is the p-value.
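That logic does not require a formula-based null distribution. As one illustration (a permutation test, a technique not used elsewhere in this lab), the sketch below builds the null distribution by shuffling group labels, which is permissible if group membership is unrelated to the outcome, and then counts how often a shuffled difference in means is at least as extreme as the observed one. The data, the group labels "A" and "B", and the helper diff_means() are all hypothetical.

```r
# Hypothetical simulated outcome and two-group labels (not GSS data)
set.seed(2024)
outcome <- c(rpois(60, lambda = 1.6), rpois(60, lambda = 2.2))
group <- rep(c("A", "B"), each = 60)

# Test statistic: difference in group means (A minus B)
diff_means <- function(y, g) mean(y[g == "A"]) - mean(y[g == "B"])
observed_diff <- diff_means(outcome, group)

# Null distribution: under H0 the labels are exchangeable, so shuffle them
n_reps <- 5000
null_diffs <- replicate(n_reps, diff_means(outcome, sample(group)))

# One-sided p-value for H_a: mean_A < mean_B
# (+1 in numerator and denominator counts the observed labeling itself)
p_value <- (sum(null_diffs <= observed_diff) + 1) / (n_reps + 1)

c(observed_diff = observed_diff, p_value = p_value)
```

The share of shuffled differences at or below the observed one plays the same role as the p-value from a one-sided t.test().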