8  Calibrating Interval Estimates using the Bootstrap

Review

Point and Interval Estimation

  • Last week, we talked about estimating population frequencies.
  • Our point estimates were frequencies in a random sample drawn from the population.
  • To characterize our uncertainty, we used interval estimates.
    • In particular, calibrated interval estimates: confidence intervals.
    • To get these intervals, we add ‘arms’ of equal length to our point estimate.
    • We choose the arm length so that, in 95% of surveys done like ours, the estimation target is within them.

Calibration of Interval Estimates

  • We can do that using the sampling distribution of our point estimate.
  • We’d draw arms out from the sampling distribution’s mean until they span 95% of point estimates.
  • We can see that this is the right length by a shift of perspective.
    • A point estimate can touch the mean with its arms iff (if and only if) the mean can touch it with equally long arms.
    • So 95% of intervals cover the target if and only if 95% of point estimates are within the mean’s arms.
  • In practice, we can’t do exactly that. We don’t know the sampling distribution.
  • But we can use an estimate of the sampling distribution in its place.
  • If it’s a good one, we’ll get approximately 95% coverage.
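This recipe is easy to sketch in code. Everything below is illustrative: we act as if we knew the sampling distribution, taking it to be that of a \(\text{Binomial}(n,\theta)\) count divided by \(n\), with \(n=625\) and \(\theta=0.70\) as in the turnout example.

```r
# Hypothetical setup: we pretend we know the sampling distribution,
# Binomial(n, theta)/n with n = 625 and theta = 0.70.
n     = 625
theta = 0.70
t     = (0:n)/n                 # the possible sample frequencies
p     = dbinom(0:n, n, theta)   # their probabilities

# Draw the arms out from the mean until they span 95% of point estimates.
arm = 0
while (sum(p[abs(t - theta) <= arm]) < 0.95) {
  arm = arm + 1/n
}
arm   # the arm length that gives (at least) 95% coverage
```

In practice we don’t know \(\theta\), which is exactly why the rest of this lecture is about estimating the sampling distribution before running this calculation.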

A Parametric Estimate of the Sampling Distribution

Our point estimate and its sampling distribution

Our estimate of its sampling distribution
  • Our estimate was based on knowledge of the parametric form of the sampling distribution.
  • When we sample with replacement, it’s binomial.
    • It’s the distribution of the frequency of heads in 625 flips of a coin with probability \(\theta\)
    • where \(\theta\) is the ‘frequency of heads’ in the population.
    • e.g. \(\theta\approx 0.70\) is the proportion of registered voters who will vote in our turnout example.

\[ P_\theta(\hat\theta = t) = \binom{n}{nt} \theta^{nt} (1-\theta)^{n(1-t)} \qqtext{ is the probability of the sample frequency being $t$} \]

  • By parametric form, I mean a formula in terms of some parameters.
    • That tells us what parameters we need to estimate to estimate the sampling distribution.
    • In this case, the only (unknown) parameter is \(\theta\), the frequency of heads in the population.
    • We plugged in the sample frequency, \(\hat\theta\), to get our estimate of the sampling distribution.

\[ \hat P_\theta(\hat\theta = t) = \binom{n}{nt} \hat\theta^{nt} (1-\hat\theta)^{n(1-t)} \qqtext{ is our estimate of the same thing} \]
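In code, the plug-in step is a one-line change: evaluate the same binomial formula at \(\hat\theta\) instead of \(\theta\). A sketch with the turnout numbers, \(n=625\) and \(\hat\theta=0.68\):

```r
# Plug-in estimate of the sampling distribution: Binomial(n, theta.hat)/n.
n         = 625
theta.hat = 0.68                        # the sample frequency from our poll
t         = (0:n)/n
p.hat     = dbinom(0:n, n, theta.hat)   # estimated P(sample frequency = t)

# e.g., our estimate of the chance a poll like ours lands within 0.02 of its mean
sum(p.hat[abs(t - theta.hat) <= 0.02])
```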

Sampling without Replacement

Our point estimate and its sampling distribution

Our estimate of its sampling distribution
  • We can do the same thing when we sample without replacement.
  • We just have to use a different parametric form: the hypergeometric distribution.

\[ P_\theta(\hat\theta = t) = \binom{n}{nt} \frac{\{m(1-\theta)\}!}{\{m(1-\theta)-n(1-t)\}!} \times \frac{(m\theta)!}{(m\theta-nt)!} \times \frac{(m-n)!}{m!} \]

  • Again, the only unknown parameter is \(\theta\). And we can plug in our point estimate to estimate this distribution.

\[ \hat P_\theta(\hat\theta = t) = \binom{n}{nt} \frac{\{m(1-\hat\theta)\}!}{\{m(1-\hat\theta)-n(1-t)\}!} \times \frac{(m\hat\theta)!}{(m\hat\theta-nt)!} \times \frac{(m-n)!}{m!} \]
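R evaluates this formula for us via `dhyper`, which parameterizes the hypergeometric by counts: `dhyper(k, m1, m0, n)` is the probability of seeing `k` ones when drawing `n` people without replacement from a population with `m1` ones and `m0` zeros. A sketch with the turnout numbers (\(m=7.23\)M and \(\hat\theta=0.68\); the `round` calls are there because \(m\hat\theta\) need not be an integer):

```r
# Plug-in hypergeometric estimate of the sampling distribution.
n         = 625
m         = 7230000      # population size: registered voters in the example
theta.hat = 0.68         # sample frequency from our poll
k         = 0:n          # possible counts of 1s, so t = k/n

p.hat = dhyper(k, round(m * theta.hat), round(m * (1 - theta.hat)), n)
sum(p.hat)               # the probabilities sum to one
```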

An Exercise

A Survey

  • We’re going to run a survey on Candy Preferences.
    • I’ll draw a sample of size \(n=6\), with replacement, from the people in the room.
    • They’ll pick candy and write their selection into a sample table on the board.
    • In particular, we’ll write out whether they chose Chocolate (\(Y=1\)) or sour candy (\(Y=0\)).
  • Then we’re going to estimate the sampling distribution in a new way.
    • It’ll be a lot like calculating the actual sampling distribution.
    • We’ll draw a sample of size \(n\) with replacement, use it to calculate our estimator, and repeat.
  • What’s different is the population we’re drawing our sample from.
  • We don’t have responses from the whole population, so we’ll draw it from the closest thing we’ve got: the sample.
    • We call this bootstrapping.
    • The estimate of the sampling distribution we get is called the bootstrap sampling distribution.
  • I’ve drawn the sample we got from our survey on the board.
  • Here’s how you draw a sample from the bootstrap sampling distribution of our point estimator.
    1. Roll your die 6 times to draw a sample of size \(n=6\) from the sample.
    2. Calculate the sample frequency. That’s one draw from the bootstrap sampling distribution. Write it down.
  • If we all do this 5 times, we’ll have a bunch of draws.
  • We’ll tally them up on the board and visualize the result as a histogram.
  • A histogram of draws from the bootstrap sampling distribution.
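Here is the whole exercise as a simulation, with a hypothetical candy sample standing in for the one on the board and `sample` standing in for the die rolls.

```r
# A hypothetical sample of n = 6 candy choices: 1 = chocolate, 0 = sour.
Y = c(1, 0, 1, 1, 0, 1)
n = length(Y)

set.seed(1)
draws = replicate(30, {                         # everyone's 5 draws, pooled
  Y.star = Y[sample(1:n, n, replace = TRUE)]    # 6 die rolls -> a resample
  mean(Y.star)                                  # one bootstrap draw
})
table(draws)                                    # the tally on the board
```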

Simulating a Few More Draws

Discussion

  • If all has gone to plan, our histogram looks a lot like our binomial estimate of the sampling distribution.
    • If we substitute in a computer tally of 10,000 draws from the bootstrap sampling distribution, we nail it.
    • It looks like the bootstrap sampling distribution is the binomial estimate.
  • Q. It is. How do you know?
  • We know the distribution of the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from a population \(y_1\ldots y_m\) of binary responses with frequency \(\theta\).
  • It’s \(\text{Binomial}(n, \theta)\). To estimate it, we plug in \(\hat\theta\), the frequency of 1s in our sample.
  • So our Binomial estimate is the distribution of the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from ‘a population’ \(Y_1 \ldots Y_n\) with frequency \(\hat\theta\).
  • What’s a draw from the bootstrap sampling distribution?
  • Each draw from it is the frequency of 1s in a sample of size \(n\) drawn with replacement …
    • … from ‘a population’ \(Y_1 \ldots Y_n\) in which the frequency of 1s is \(\hat\theta\), the frequency of 1s in our sample.
    • Because that ‘population’ is our sample.
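We can check this identification numerically: tally a large number of bootstrap draws and compare the tally to the \(\text{Binomial}(n,\hat\theta)\) probabilities. The sample here is hypothetical.

```r
# Compare the bootstrap sampling distribution to Binomial(n, theta.hat)/n.
Y         = c(1, 0, 1, 1, 0, 1)   # a hypothetical sample
n         = length(Y)
theta.hat = mean(Y)

set.seed(1)
draws = replicate(100000, mean(Y[sample(1:n, n, replace = TRUE)]))

# empirical frequency of each possible value t = k/n vs. the binomial probability
empirical = tabulate(round(draws * n) + 1, nbins = n + 1) / length(draws)
binomial  = dbinom(0:n, n, theta.hat)
round(cbind(t = (0:n)/n, empirical, binomial), 3)   # the last two columns nearly agree
```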

The Bootstrap

The Bootstrap Interpretation in Our Turnout Poll

  • The sampling distribution estimate we’ve used was \(\text{Binomial}(n,\hat\theta)\) for \(n=625\) and \(\hat\theta \approx 0.68\).
    • It’s the distribution of the proportion heads in 625 flips of a coin with probability \(\hat\theta \approx 0.68\) of heads.
    • where \(\hat\theta \approx 0.68\) is the proportion of voters we’ve polled who will vote.
  • That is, it’s the sampling distribution of a ‘poll’ of the people in our sample, i.e.
    • roll a 625-sided die 625 times
    • call up the corresponding person in our sample
    • and count up the yeses we hear
  • Note that this is random because we’re drawing with replacement.
    • Each time we run this poll, we call each person in our sample 0,1,2,… times
    • And the number of times we call them is random.
  • If we plot our voters on a map, you can see the idea in visual terms.
    • On the left, we have the population.
    • In the middle, we have our sample. It’s drawn, with replacement, from the population.
    • On the right, we have something else. A new sample drawn, with replacement, from the sample.
  • Each ‘call’ that a person receives increases the size of their dot: \(\text{circle area} \propto \text{number of calls}\).
  • In the sample, even though it’s drawn with replacement, all dots are the same size.
    • Because we draw from such a large population, nobody gets called twice.
  • In the bootstrap sample, dots vary in size.
    • Because we draw \(n\) people from a sample of size \(n\), it’s almost impossible not to call somebody twice.
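The dot sizes are easy to simulate: tabulate how many ‘calls’ each of the \(n\) people in the sample gets in one bootstrap poll. With \(n=625\), each person’s call count is approximately \(\text{Poisson}(1)\), so roughly a third of the sample goes uncalled while others are called twice or more.

```r
# One bootstrap poll: how many times is each person in the sample called?
n = 625
set.seed(1)
calls = tabulate(sample(1:n, n, replace = TRUE), nbins = n)

table(calls)       # most people called 0, 1, or 2 times; a few more often
mean(calls == 0)   # close to exp(-1), about 0.37
```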

Bootstrapping

  • Before the election, we don’t observe the population. But we do observe the sample.
    • And we can sample from the sample, acting as if it were the population.
    • We’ll take repeated random samples of size 625, with replacement, from our sample of size 625.
  • We call these bootstrap samples and estimates based on them bootstrap estimates.
    • The distribution of these estimates is called the bootstrap sampling distribution.
    • If the sample is like the population, this should be like our estimator’s actual sampling distribution.

The Sample \[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625} \\ Y_i & 1 & 1 & \dots & 1 & 0.68 \\ \end{array} \]

The Bootstrap Sample

\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625}^* \\ Y_i^* & 1 & 1 & \dots & 1 & 0.63 \\ \end{array} \]

The Population

\[ \begin{array}{r|rrrr|r} j & 1 & 2 & \dots & 7.23M & \bar{y}_{7.23M} \\ y_{j} & 1 & 1 & \dots & 1 & 0.70 \\ \end{array} \]

The ‘Bootstrap Population’ — The Sample \[ \begin{array}{r|rrrr|r} j & 1 & 2 & \dots & 625 & \bar{y}^*_{625} \\ y_j^* & 1 & 1 & \dots & 1 & 0.68 \\ \end{array} \]

Notation. We use stars to distinguish the bootstrap sample from our original sample—we write \(Y_i^*\) and \(\bar Y^*\).

The Bootstrap is Nonparametric

n = length(Y)                                  # Y is the vector of our sample's 625 responses
bootstrap.samples = array(dim=10000)
for(rr in 1:10000) {
    Y.star = Y[sample(1:n, n, replace=TRUE)]   # a bootstrap sample: n draws from our sample
    bootstrap.samples[rr] = mean(Y.star)       # one draw from the bootstrap sampling distribution
}

  • We do not need to know the parametric form of our sampling distribution to use the bootstrap.
  • All we do is re-run our poll acting as if our sample were the population.
    • We can do this no matter what we’re estimating.
    • So let’s try it out on some stuff where we don’t know the parametric form.

Comparing Black and Non-Black Turnout

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{question} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X}_{625} & \overline{Y}_{625} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \text{outcome} & \underset{\textcolor{gray}{x_{869369}}}{0} & \underset{\textcolor{gray}{y_{869369}}}{1} & \underset{\textcolor{gray}{x_{4428455}}}{1} & \underset{\textcolor{gray}{y_{4428455}}}{1} & \dots & \underset{\textcolor{gray}{x_{1268868}}}{0} & \underset{\textcolor{gray}{y_{1268868}}}{1} & 0.28 & 0.68 & 0.68 & 0.69 & 0.01 \\ \end{array} } \]

  • It’s useful to predict turnout among Black voters because they tend to vote differently than non-Black voters.
  • Let’s suppose that we’re interested in the difference in turnout between Black and non-Black voters.
  • This is a bit reductive, but it’s the beginning of the semester, so we’re keeping things simple.
  • We know from the voter file that 30% of registered voters are Black.
  • Let’s suppose we asked the people we polled if they were Black, recording their answer in a covariate \(X_i\).
  • And we found that…
    • 172 of the people we called (28%) were Black with a turnout rate of 0.69
    • 453 of the people we called (72%) were non-Black with a turnout rate of 0.68
  • So we’d estimate the difference to be \(0.69-0.68 \approx 0.01\).
  • After the election, we found that …
    • Turnout among Black registered voters was 0.74
    • Turnout among non-Black registered voters was 0.68.
  • The actual difference was \(0.74-0.68 \approx 0.06\).
  • Depending on what you’re doing, that point estimate of 0.01 may or may not have been accurate enough.
  • It would’ve been nice to have a confidence interval to tell us what kind of precision we could expect.
  • For that, we’ll need to estimate the sampling distribution of this difference.

A Table of Imaginary Polls

\[\begin{array}{r|rr|rr|r|rr|rrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.69 & \color[RGB]{7,59,76}0.01 \\ \color[RGB]{239,71,111}2 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.29 & \color[RGB]{239,71,111}0.71 & \color[RGB]{239,71,111}0.72 & \color[RGB]{239,71,111}0.70 & \color[RGB]{239,71,111}-0.01 \\ \color[RGB]{17,138,178}3 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.26 & \color[RGB]{17,138,178}0.70 & \color[RGB]{17,138,178}0.68 & \color[RGB]{17,138,178}0.76 & \color[RGB]{17,138,178}0.08 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \color[RGB]{6,214,160}10000 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0.30 & \color[RGB]{6,214,160}0.71 & \color[RGB]{6,214,160}0.68 & \color[RGB]{6,214,160}0.79 & \color[RGB]{6,214,160}0.12 \\ \end{array}\]

The Sampling Distribution

difference.samples = array(dim=10000)
for(rr in 1:10000) {
    I = sample(1:m, n, replace=TRUE)    # a new poll: n of the m people in the population
    X = x[I]                            # x and y hold the whole population's covariates
    Y = y[I]                            # and responses, which we only know after the election
    difference.samples[rr] = mean(Y[X==1]) - mean(Y[X==0])
}

  • Here’s what the sampling distribution of this difference in turnout looks like.
  • As before, if we can estimate it we can use that estimate to get a 95% confidence interval.
  • But, unlike before, we don’t really know the parametric form of its sampling distribution.
  • Or—at least—it’d be a pain to work it out. So we’ll use the bootstrap to estimate it.

Making a Table of Bootstrap Polls

Our Sample

Three Bootstrap Samples—The First ‘Call’

Our Sample

Three Bootstrap Samples—The First and Second ‘Call’

Our Sample

Three Bootstrap Samples—The First, Second, and Last ‘Call’

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & 0.69 & \color[RGB]{7,59,76}0.01 \end{array} } \]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrr} \text{`call'} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{`poll'} & X_1^* & Y_1^* & X_2^* & Y_2^* & \dots & X^*_{625} & Y^*_{625} & \overline{X}^* & \overline{Y}^* &\frac{\sum_{i:X_i^*=0} Y_i^*}{\sum_{i:X_i^*=0} 1} & \frac{\sum_{i:X_i^*=1} Y_i^*}{\sum_{i:X_i^*=1} 1} & \text{difference} \\ \hline \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}X_{398} & \color[RGB]{239,71,111}Y_{398} & & & \color[RGB]{239,71,111}\dots & & & & & & & & \\ & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & & & \color[RGB]{239,71,111}\dots & & & & & & & & \\ \color[RGB]{17,138,178}2 & \color[RGB]{17,138,178}X_{293} & \color[RGB]{17,138,178}Y_{293} & & & \color[RGB]{17,138,178}\dots & & & & & & & & \\ & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & & & \color[RGB]{17,138,178}\dots & & & & & & & & \\ \color[RGB]{6,214,160}3 & \color[RGB]{6,214,160}X_{281} & \color[RGB]{6,214,160}Y_{281} & & & \color[RGB]{6,214,160}\dots & & & & & & & & \\ & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & & & \color[RGB]{6,214,160}\dots & & & & & & & & \\ \end{array} } \]

A Completed Table of Bootstrap Polls

Our Sample

A Few Bootstrap Samples

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrr} \text{call} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{poll} & X_1 & Y_1 & X_2 & Y_2 & \dots & X_{625} & Y_{625} & \overline{X} & \overline{Y} &\frac{\sum_{i:X_i=0} Y_i}{\sum_{i:X_i=0} 1} & \frac{\sum_{i:X_i=1} Y_i}{\sum_{i:X_i=1} 1} & \text{difference} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}0 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.28 & \color[RGB]{7,59,76}0.68 & \color[RGB]{7,59,76}0.68 & 0.69 & \color[RGB]{7,59,76}0.01 \end{array} } \]

\[ \small{ \begin{array}{r|rr|rr|r|rr|rrrr} \text{`call'} & 1 & & 2 & & \dots & 625 & & & & & & \\ \text{`poll'} & X_1^* & Y_1^* & X_2^* & Y_2^* & \dots & X^*_{625} & Y^*_{625} & \overline{X}^* & \overline{Y}^* &\frac{\sum_{i:X_i^*=0} Y_i^*}{\sum_{i:X_i^*=0} 1} & \frac{\sum_{i:X_i^*=1} Y_i^*}{\sum_{i:X_i^*=1} 1} & \text{difference} \\ \hline \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}X_{398} & \color[RGB]{239,71,111}Y_{398} & \color[RGB]{239,71,111}X_{129} & \color[RGB]{239,71,111}Y_{129} & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}X_{232} & \color[RGB]{239,71,111}Y_{232} & & & & & & \\ & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.29 & \color[RGB]{239,71,111}0.68 & \color[RGB]{239,71,111}0.69 & \color[RGB]{239,71,111}0.68 & \color[RGB]{239,71,111}-0.01 \\ \color[RGB]{17,138,178}2 & \color[RGB]{17,138,178}X_{293} & \color[RGB]{17,138,178}Y_{293} & \color[RGB]{17,138,178}X_{526} & \color[RGB]{17,138,178}Y_{526} & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}X_{578} & \color[RGB]{17,138,178}Y_{578} & & & & & & \\ & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}0 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.28 & \color[RGB]{17,138,178}0.65 & \color[RGB]{17,138,178}0.64 & \color[RGB]{17,138,178}0.67 & \color[RGB]{17,138,178}0.03 \\ \color[RGB]{6,214,160}1M & \color[RGB]{6,214,160}X_{281} & \color[RGB]{6,214,160}Y_{281} & \color[RGB]{6,214,160}X_{520} & \color[RGB]{6,214,160}Y_{520} & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}X_{363} & \color[RGB]{6,214,160}Y_{363} & & & & & & \\ & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}0.28 & \color[RGB]{6,214,160}0.68 & \color[RGB]{6,214,160}0.66 & \color[RGB]{6,214,160}0.71 & \color[RGB]{6,214,160}0.05 \\ \end{array} } \]

The Difference’s Bootstrap Sampling Distribution

difference.bootstrap.samples = array(dim=10000)
for(rr in 1:10000) {
    I = sample(1:n, n, replace=TRUE)    # n 'calls' to the n people in our sample
    Xstar = X[I]
    Ystar = Y[I]
    difference.bootstrap.samples[rr] = mean(Ystar[Xstar==1]) - mean(Ystar[Xstar==0])
}

  • It looks like it works in this case.
  • But we’re no longer able to argue that it should work the way we did before.
    • For that, we took advantage of our knowledge of our estimator’s parametric form.
    • And we don’t have that now.
  • We’ll get there. But we’ll need a few new tools we’ll develop in the coming weeks.
    • Normal approximation—a parametric form for an approximation to our estimator’s sampling distribution.
    • Techniques for variance calculation. This’ll help us understand the parameters that go into it.
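Even without those tools, the mechanics of calibration carry over unchanged: grow arms around the point estimate until they span 95% of the bootstrap draws. A sketch, with simulated normal draws (mean 0.01 and sd 0.04, arbitrary stand-in values) in place of the real `difference.bootstrap.samples`:

```r
# Stand-in draws for difference.bootstrap.samples; in practice we'd use those.
set.seed(1)
point.estimate    = 0.01
bootstrap.samples = rnorm(10000, mean = point.estimate, sd = 0.04)

# Grow the arms until they cover 95% of the bootstrap draws.
arm = 0
while (mean(abs(bootstrap.samples - point.estimate) <= arm) < 0.95) {
  arm = arm + 0.001
}
c(lower = point.estimate - arm, upper = point.estimate + arm)   # the interval
```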
