21  Homework: Randomized Experiments

Summary

In this homework, we’ll work with the potential outcomes framework for causal inference. We’ll start with a small example to build intuition, then use the Michigan social pressure experiment to see how randomization works in practice.

We’ll use the tidyverse.

library(tidyverse)

Potential Outcomes

In our lecture on randomized experiments, we introduced the potential outcomes framework. Each person \(j\) in a population has two potential outcomes:

  • \(y_j(1)\) is the outcome they would have if treated
  • \(y_j(0)\) is the outcome they would have if untreated

The individual treatment effect is \(\tau_j = y_j(1) - y_j(0)\), and the average treatment effect is \[ \bar\tau = \frac{1}{m}\sum_{j=1}^m \tau_j = \frac{1}{m}\sum_{j=1}^m \{y_j(1) - y_j(0)\}. \]
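To make these definitions concrete, here is a minimal sketch with a made-up three-person population (hypothetical numbers, not the exercise data below):

```r
# Made-up potential outcomes for three people
y1 = c(5, 2, 3)   # outcomes if treated
y0 = c(1, 2, 0)   # outcomes if untreated
tau = y1 - y0     # individual effects: 4, 0, 3
mean(tau)         # average treatment effect: (4 + 0 + 3)/3
```

In real data we never observe both `y1` and `y0` for the same person, which is why estimating \(\bar\tau\) takes some care.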

A Tiny Example

Consider this population of 6 people.

tiny.pop = data.frame(
  j  = 1:6,                   # person index
  y1 = c(6, 0, 4, 7, 8, 2),  # potential outcome if treated
  y0 = c(2, 0, 1, 7, 4, 0)   # potential outcome if untreated
)
tiny.pop$tau = tiny.pop$y1 - tiny.pop$y0
tiny.pop

Exercise 23.1  

Calculate the average treatment effect \(\bar\tau\) for this population. Show your work.

🔒

Locked (Week 10)

Exercise 23.2  

Suppose we assign treatments \((w_1, \ldots, w_6) = (1, 0, 1, 0, 1, 0)\). That is, persons 1, 3, and 5 are treated and persons 2, 4, and 6 are untreated.

  1. What realized outcome \(y_j(w_j)\) do we observe for each person?
  2. Calculate the difference in group means: \(\hat\tau = \frac{1}{3}\sum_{j:w_j=1} y_j(w_j) - \frac{1}{3}\sum_{j:w_j=0} y_j(w_j)\).
  3. Is \(\hat\tau\) equal to \(\bar\tau\)? Why or why not?

Exercise 23.3  

Now suppose we assign treatments \((w_1, \ldots, w_6) = (0, 1, 0, 1, 0, 1)\). That is, persons 2, 4, and 6 are treated and persons 1, 3, and 5 are untreated.

Calculate \(\hat\tau\) for this assignment. How does it compare to the true ATE?


Randomization

The key insight is that if we randomize treatment assignment, our estimator \(\hat\tau\) is unbiased for the true ATE \(\bar\tau\). Different randomizations give different estimates, but on average they’re right.

Exercise 24.1  

There are \(\binom{6}{3} = 20\) ways to choose 3 people out of 6 to treat. For each of these 20 treatment assignments:

  1. Calculate \(\hat\tau\).
  2. Calculate the mean of all 20 values of \(\hat\tau\).
  3. Compare this mean to \(\bar\tau\). What do you notice?

Hint. You can use combn(6, 3) in R to generate all ways to choose 3 items from 6.

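To see the mechanics of enumerating assignments without giving away the answer above, here is a sketch on a smaller, made-up four-person population:

```r
# Made-up 4-person population (not the exercise data)
y1 = c(3, 1, 4, 2)
y0 = c(1, 1, 0, 2)

assignments = combn(4, 2)  # each column lists who gets treated
tau.hats = apply(assignments, 2, function(treated) {
  mean(y1[treated]) - mean(y0[-treated])
})
mean(tau.hats)             # 1.5, exactly the true ATE mean(y1 - y0)
```

Each unit appears in half of the \(\binom{4}{2} = 6\) assignments, which is why the average of the 6 estimates recovers the true ATE.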

The Michigan Social Pressure Experiment

Now let’s work with real data. In 2006, Gerber, Green, and Larimer ran an experiment in Michigan to see what messages would increase voter turnout. They randomly assigned households to receive different letters.

social.pressure = read.csv('https://qtm285-1.github.io/assets/data/social-pressure-data.csv')

We’ll focus on two groups: the control group (no letter) and the “Neighbors” group (a letter showing their neighbors’ voting records).

michigan = social.pressure |>
  # treatment labels in the raw file begin with a space, e.g. ' Control'
  filter(treatment %in% c(' Control', ' Neighbors')) |>
  mutate(w = as.numeric(treatment == ' Neighbors'),  # 1 if Neighbors letter
         y = as.numeric(voted == 'Yes'),             # 1 if the person voted
         age = 2006 - yob) |>
  select(w, y, age)

head(michigan)

Exercise 25.1  

Calculate the difference in voting rates between the Neighbors group and the Control group. This is our estimate of the average treatment effect of receiving the “Neighbors” letter.


Exercise 25.2  

How many people are in each group? Let \(n_1\) be the number treated (Neighbors) and \(n_0\) be the number in Control.


Simulating Randomization Variability

Because treatment was randomized, we can think about what would have happened under different randomizations. Let’s simulate this.

Exercise 25.3  

The code below re-randomizes treatment 1000 times, keeping the same number of treated and control units as in the actual experiment. For each re-randomization, it calculates \(\hat\tau\).

set.seed(1)
n = nrow(michigan)
n1 = sum(michigan$w == 1)

tau.hats = replicate(1000, {
  # Randomly assign n1 people to treatment
  w.star = rep(0, n)
  w.star[sample(1:n, n1)] = 1
  
  # Calculate treatment effect estimate
  mean(michigan$y[w.star == 1]) - mean(michigan$y[w.star == 0])
})

  1. Plot a histogram of these 1000 estimates.
  2. What is the mean of this distribution?
  3. What is the standard deviation?
  4. Add a vertical line at the actual estimate from the real experiment. Where does it fall in the distribution?

Standard Errors

In lecture, we showed that the variance of \(\hat\tau\) under randomization is approximately \[ \mathrm{V}(\hat\tau) \approx \frac{\sigma^2(1)}{n_1} + \frac{\sigma^2(0)}{n_0} \] where \(\sigma^2(w)\) is the variance of potential outcomes under treatment \(w\).

Since we can’t observe both potential outcomes for anyone, we estimate this using the observed variances in each group: \[ \widehat{\mathrm{V}}(\hat\tau) = \frac{\hat\sigma^2(1)}{n_1} + \frac{\hat\sigma^2(0)}{n_0} \]
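As a sketch of this plug-in formula on simulated data (made-up vectors `w` and `y`, not the Michigan data):

```r
# Plug-in standard error on simulated binary outcomes
set.seed(2)
w = rep(c(1, 0), each = 50)                    # 50 treated, 50 control
y = rbinom(100, 1, ifelse(w == 1, 0.6, 0.4))   # made-up turnout probabilities

n1 = sum(w == 1)
n0 = sum(w == 0)
v.hat  = var(y[w == 1]) / n1 + var(y[w == 0]) / n0  # estimated variance
se.hat = sqrt(v.hat)                                # estimated standard error
se.hat
```

The same two lines, applied to the Michigan columns, give the standard error asked for in the exercise below.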

Exercise 25.4  

  1. Calculate \(\hat\sigma^2(1)\) and \(\hat\sigma^2(0)\), the sample variances in the treated and control groups.
  2. Calculate the estimated standard error \(\widehat{SE}(\hat\tau) = \sqrt{\widehat{\mathrm{V}}(\hat\tau)}\).
  3. Compare this to the standard deviation of your simulated \(\hat\tau\) values from the previous exercise.

Exercise 25.5  

Construct a 95% confidence interval for the average treatment effect using the normal approximation: \(\hat\tau \pm 1.96 \times \widehat{SE}(\hat\tau)\).

Interpret this interval in words.


Checking Coverage with Fake Data

We’ve constructed a confidence interval, but how do we know it actually has 95% coverage? We can check using fake data where we know the true ATE.

Creating a Fake Population

We’ll create a fake population of potential outcomes based on the Michigan data. The idea is to make up \(y_j(0)\) and \(y_j(1)\) for each person, then repeatedly randomize and check whether our intervals cover the true ATE.

set.seed(1)

# Use the Michigan data to create a population of potential outcomes
# We'll pretend the observed outcomes are y(0) and add a treatment effect
m = nrow(michigan)
fake.pop = data.frame(
  y0 = michigan$y,
  y1 = michigan$y + rbinom(m, 1, 0.08)  # treatment adds ~8pp on average
)
fake.pop$y1 = pmin(fake.pop$y1, 1)  # cap at 1 (can't vote more than once)

# The true ATE in this fake population
true.ate = mean(fake.pop$y1 - fake.pop$y0)
true.ate
[1] 0.05508098

Exercise 26.1  

The code below simulates 1000 randomized experiments from this fake population. For each experiment, it:

  1. Randomly assigns half the population to treatment
  2. Calculates \(\hat\tau\)
  3. Calculates the standard error and constructs a 95% CI
  4. Checks whether the CI covers the true ATE

set.seed(1)
m = nrow(fake.pop)
n1 = m %/% 2  # half treated

covers = replicate(1000, {
  # Randomize treatment
  w = rep(0, m)
  w[sample(1:m, n1)] = 1
  
  # Realized outcomes
  y = ifelse(w == 1, fake.pop$y1, fake.pop$y0)
  
  # Point estimate
  tau.hat = mean(y[w == 1]) - mean(y[w == 0])
  
  # Standard error
  se = sqrt(var(y[w == 1]) / sum(w == 1) + var(y[w == 0]) / sum(w == 0))
  
  # Does 95% CI cover?
  tau.hat - 1.96 * se <= true.ate & true.ate <= tau.hat + 1.96 * se
})

  1. What fraction of the intervals cover the true ATE?
  2. Is this close to 95%? If not, why might there be a discrepancy?

Exercise 26.2  

Now let’s see what happens with a smaller experiment. Modify the simulation to use only \(n = 200\) people (sampled from the fake population) instead of the full population.

  1. What coverage do you get?
  2. How does the width of the confidence intervals compare to the full-population case?

Hint. You’ll need to sample 200 people from fake.pop before randomizing treatment.


Connection to Sampling

Exercise 27.1  

In the small-sample exercise above, there were two sources of randomness:

  1. Sampling: Which 200 people we selected from the population
  2. Randomization: Which of those 200 got assigned to treatment

Our standard error formula only accounts for randomization variability. Yet we still got ~95% coverage. In a sentence or two, explain why this worked out.

Hint. Think about what happens to the sample ATE (the ATE among the 200 people we sampled) vs the population ATE.
