12  Bias and Coverage

Why Bias Matters

In the homework, you simulated coverage for two estimators: the sample mean \(\bar{Y}\) and a prior-based estimator \(\tilde{Y}_{100}\). The sample mean gave you intervals that covered roughly 95% of the time. The prior-based estimator covered less often—around 90%. And you saw from Figure 16.1 that estimator c, with bias equal to two standard deviations, would cover only about 50% of the time.

Why does bias hurt coverage? And can we predict exactly how much?

The answer comes down to geometry. When we build a 95% confidence interval, we’re centering it at our estimate and extending it \(\pm 1.96\) standard errors. If the estimator is unbiased, the estimate is centered (on average) at the target, so the interval is centered at the right place. If the estimator is biased, the estimate is centered somewhere else, and the interval misses the target more often than it should.

The Geometry of Coverage

Let’s think about this carefully. Suppose our estimator \(\hat\theta\) is approximately normal with mean \(\mathop{\mathrm{E}}[\hat\theta]\) and standard deviation \(\text{se}\). When we construct an interval \(\hat\theta \pm 1.96 \cdot \text{se}\), we’re covering a region that extends 1.96 standard deviations on either side of our estimate.

If the estimator is unbiased, \(\mathop{\mathrm{E}}[\hat\theta] = \theta\), and the interval is centered at the target. The coverage is 95% because that’s how the normal distribution works: 95% of the probability mass lies within 1.96 standard deviations of the mean.

But if the estimator is biased, \(\mathop{\mathrm{E}}[\hat\theta] = \theta + \text{bias}\), and the interval is centered at the wrong place. The target \(\theta\) is no longer at the center of the interval—it’s off to one side. And that means the interval will miss it more often.

How much more often? That depends on how big the bias is relative to the standard error.

The Formula

Here’s the key insight. If \(\hat\theta\) is approximately normal with mean \(\theta + \text{bias}\) and standard deviation \(\text{se}\), then the standardized estimator \[ Z = \frac{\hat\theta - (\theta + \text{bias})}{\text{se}} \] is approximately standard normal. The interval \(\hat\theta \pm 1.96 \cdot \text{se}\) covers \(\theta\) when \[ \abs{\hat\theta - \theta} \le 1.96 \cdot \text{se}. \] Since \(\hat\theta - \theta = \text{se} \cdot Z + \text{bias}\), we can rewrite this in terms of \(Z\) as \[ \abs{Z + \frac{\text{bias}}{\text{se}}} \le 1.96, \] which happens when \[ -1.96 - \frac{\text{bias}}{\text{se}} \le Z \le 1.96 - \frac{\text{bias}}{\text{se}}. \] The coverage is the probability that \(Z\) lands in this interval. Since \(Z\) is standard normal, this is \[ \text{coverage} = \Phi\qty(1.96 - \frac{\text{bias}}{\text{se}}) - \Phi\qty(-1.96 - \frac{\text{bias}}{\text{se}}) \] where \(\Phi\) is the standard normal CDF.

In R:

# coverage of a nominal 95% interval when the estimator's bias
# is the given multiple of its standard error
coverage = function(bias.over.se) {
  pnorm(1.96 - bias.over.se) - pnorm(-1.96 - bias.over.se)
}
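
A quick sanity check, matching the examples we’ll work through below:

coverage(0)    # 0.950: no bias, so we get the nominal level
coverage(0.5)  # 0.921
coverage(1)    # 0.830
coverage(2)    # 0.484
coverage(3)    # 0.149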

Examples

Let’s see what this looks like. In each plot below, the purple curve is the standard normal density (the distribution of \(Z\)). The green line marks where the target \(\theta\) is in standardized units—that’s at \(-\text{bias}/\text{se}\). The yellow region shows the values of \(Z\) for which the interval covers \(\theta\).
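The plots themselves aren’t reproduced here, but a minimal base-R sketch of one panel, following the description above, might look like this (the function name draw.panel is ours):

draw.panel = function(bias.over.se) {
  z = seq(-4, 4, length.out = 400)
  plot(z, dnorm(z), type = "l", col = "purple", xlab = "Z",
       ylab = "density", main = paste("Bias/SE =", bias.over.se))
  # the interval covers theta when -1.96 - bias/se <= Z <= 1.96 - bias/se
  lo = -1.96 - bias.over.se
  hi = 1.96 - bias.over.se
  zz = seq(lo, hi, length.out = 200)
  polygon(c(lo, zz, hi), c(0, dnorm(zz), 0), col = "lightyellow", border = NA)
  lines(z, dnorm(z), col = "purple")        # redraw the density over the shading
  abline(v = -bias.over.se, col = "green")  # the target theta, in Z units
}
draw.panel(1/2)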

Bias/SE = 1/2

When the bias is half a standard error, the interval shifts but still covers the target most of the time. Coverage is about 92%.

Bias/SE = 1

When the bias equals the standard error, we’re starting to miss more often. Coverage is about 83%.

Bias/SE = 2

When the bias is twice the standard error, the target is at the edge of where we’d expect the estimator to land. Coverage is 48%—about a coin flip.

Bias/SE = 3

When the bias is three standard errors, we almost never cover the target. Coverage is only about 15%.

The Full Curve

Here’s the complete relationship between bias/se and coverage.

Coverage as a function of bias/se. The dashed line marks 95% coverage.

The curve is symmetric because coverage depends on the magnitude of bias, not its sign. Some useful reference points:

  bias/se   coverage
  0         95%
  1/2       92%
  1         83%
  2         48%
  3         15%
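
The symmetry is easy to confirm with the coverage function:

coverage(1)   # 0.830
coverage(-1)  # 0.830: same coverage, opposite sign of bias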

Back to the Prior-Based Estimator

Let’s apply this to the estimator from the homework. Recall that \(\tilde{Y}_{n^{\text{prior}}}\) has \[ \text{bias} = \frac{n^{\text{prior}}(\theta^{\text{prior}} - \theta)}{n^{\text{prior}} + n} \qqand \text{se} = \frac{\sqrt{n\theta(1-\theta)}}{n^{\text{prior}} + n}. \] The bias/se ratio is \[ \frac{\text{bias}}{\text{se}} = \frac{n^{\text{prior}}(\theta^{\text{prior}} - \theta)}{\sqrt{n\theta(1-\theta)}}. \]
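
In R, as a small helper for what follows (the name bias.over.se is ours):

# bias/se ratio for the prior-based estimator
bias.over.se = function(n.prior, n, theta, theta.prior) {
  n.prior * (theta.prior - theta) / sqrt(n * theta * (1 - theta))
}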

Suppose we’re estimating a proportion \(\theta\) with sample size \(n=100\). We have a prior guess \(\theta^{\text{prior}} = 0.5\), and the true value turns out to be \(\theta = 0.6\).

  1. If we want coverage of at least 90%, what’s the maximum bias/se ratio we can tolerate? (Use the table above or the coverage function.)
  2. What’s the maximum number of prior observations \(n^{\text{prior}}\) we can use?
  3. What if we wanted coverage of at least 80%?
Answers:

  1. From the table, coverage of at least 90% requires bias/se of roughly 0.7 or less. (You can check: coverage(0.7) \(\approx\) 0.89 and coverage(0.65) \(\approx\) 0.90.)

  2. We need \[ \frac{n^{\text{prior}} \cdot |0.5 - 0.6|}{\sqrt{100 \cdot 0.6 \cdot 0.4}} \le 0.7 \] which gives \[ n^{\text{prior}} \le \frac{0.7 \cdot \sqrt{24}}{0.1} = 0.7 \cdot \sqrt{2400} \approx 34. \] So we can use at most about 34 prior observations before coverage drops below 90%.

  3. For 80% coverage, we can tolerate bias/se \(\approx 1.1\). This gives \(n^{\text{prior}} \le 1.1 \cdot \sqrt{2400} \approx 54\).
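
We can confirm these numerically with coverage and the bias.over.se helper sketched above:

# bias.over.se is negative here (theta.prior < theta), but coverage is
# symmetric in the sign of the bias, so that's fine
coverage(bias.over.se(34, 100, 0.6, 0.5))  # 0.893, just under the 90% target
coverage(bias.over.se(54, 100, 0.6, 0.5))  # 0.803, right around the 80% target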


The exercise shows that the “safe” number of prior observations depends on how wrong our prior might be. If \(\theta^{\text{prior}}\) is close to \(\theta\), we can use more. If it’s far off, we need to be more conservative.

A Cautionary Tale

Here’s something counterintuitive: larger samples can make bias problems worse.

Suppose you have a biased estimator with some fixed amount of bias. In a small study, the standard error is large, so the bias/se ratio is small, and coverage is close to 95%. But as you collect more data, the standard error shrinks (typically like \(1/\sqrt{n}\)), while the bias stays the same. The bias/se ratio grows, and coverage gets worse.

Example. Suppose your estimator has bias = 0.002 and se = \(0.04/\sqrt{n}\).

  • At \(n = 100\): se = 0.004, bias/se = 0.5, coverage \(\approx\) 92%
  • At \(n = 400\): se = 0.002, bias/se = 1, coverage \(\approx\) 83%
  • At \(n = 1600\): se = 0.001, bias/se = 2, coverage \(\approx\) 48%
  • At \(n = 6400\): se = 0.0005, bias/se = 4, coverage \(\approx\) 2%
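
These numbers come straight from the coverage function defined earlier:

bias = 0.002
n = c(100, 400, 1600, 6400)
se = 0.04 / sqrt(n)
data.frame(n, se, ratio = bias / se,
           coverage = round(coverage(bias / se), 2))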

This is why unbiasedness matters. A small amount of bias might be fine in a pilot study but catastrophic in a large one. The bias doesn’t go away as you collect more data—it just becomes more apparent.

What’s Next

Now that we know why bias matters, we want to make sure our estimators don’t have too much. In the next chapter, we’ll build up the probability tools we need to prove that estimators are unbiased—or to calculate how much bias they have. The sample mean is unbiased—we’ve already seen that. But we’ll want to verify this carefully, and we’ll need these tools for more complicated estimators later in the semester.