# 30 Inference for Adjusted Comparisons
## Introduction

### Where We Are
Last time we saw that “the income gap” isn’t one number. We computed several adjusted comparisons.
| Estimator | Value | Averages Over |
|---|---|---|
| \(\hat\Delta_{\text{raw}}\) | -12k | Nothing (raw means) |
| \(\hat\Delta_1\) | -15k | Women’s education distribution |
| \(\hat\Delta_0\) | -15k | Men’s education distribution |
| \(\hat\Delta_{\text{all}}\) | -15k | Everyone’s education distribution |
Today we’ll ask: how do we do inference for these estimators? How precise are they? And why are some harder to estimate than others?
### The Setup
All of these estimators are linear combinations of within-group means \(\hat\mu(w,x)\). For example, the raw comparison is \[ \hat\Delta_{\text{raw}} = \bar Y_1 - \bar Y_0 = \sum_x P_{1,x} \hat\mu(1,x) - \sum_x P_{0,x} \hat\mu(0,x) \] where \(P_{w,x} = N_{w,x} / N_w\) is the proportion of group \(w\) at education level \(x\).
The adjusted comparisons have the same structure. \[ \hat\Delta_1 = \sum_x P_{1,x} \qty{\hat\mu(1,x) - \hat\mu(0,x)} \]
We can write all of them as \[ \hat\theta = \sum_{w,x} \hat\alpha(w,x) \hat\mu(w,x) \] for some coefficients \(\hat\alpha(w,x)\) that may be random (depending on sample proportions) or fixed.
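To make this concrete, here is a small simulated example (all numbers are made up for illustration) computing \(\hat\Delta_{\text{raw}}\) and \(\hat\Delta_1\) as weighted sums of the within-group means \(\hat\mu(w,x)\):

```python
import numpy as np

# Illustrative simulated sample (all numbers made up): w is group, x is an
# education level, y is income, and the within-x gap is 5 at every x.
rng = np.random.default_rng(0)
n = 10_000
levels = [0, 1, 2]
w = rng.integers(0, 2, size=n)
x = rng.choice(levels, size=n, p=[0.5, 0.3, 0.2])
y = 30 + 10 * x - 5 * w + rng.normal(0, 10, size=n)

# Within-group means mu_hat(w,x) and group-conditional proportions P_{w,x}
mu_hat = {(g, lv): y[(w == g) & (x == lv)].mean()
          for g in (0, 1) for lv in levels}
P = {(g, lv): ((w == g) & (x == lv)).sum() / (w == g).sum()
     for g in (0, 1) for lv in levels}

# Raw comparison, written as a weighted sum of within-group means
delta_raw = sum(P[1, lv] * mu_hat[1, lv] - P[0, lv] * mu_hat[0, lv]
                for lv in levels)
assert np.isclose(delta_raw, y[w == 1].mean() - y[w == 0].mean())

# Adjusted comparison: within-x gaps weighted by group 1's x-distribution
delta_1 = sum(P[1, lv] * (mu_hat[1, lv] - mu_hat[0, lv]) for lv in levels)
print(f"raw: {delta_raw:.2f}, adjusted: {delta_1:.2f}")
```

The assertion checks that the weighted-sum form of \(\hat\Delta_{\text{raw}}\) agrees exactly with the difference of overall group means.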
## Unbiasedness

### The Question
When is \(\hat\theta = \sum_{w,x} \hat\alpha(w,x) \hat\mu(w,x)\) an unbiased estimator of \(\theta = \sum_{w,x} \alpha(w,x) \mu(w,x)\)?
The answer comes from two facts we already know.
- Within-group sample means are unbiased for within-group population means: \(\mathop{\mathrm{E}}[\hat\mu(w,x) \mid N_{w,x}] = \mu(w,x)\).
- Sample proportions are unbiased for population proportions: \(\mathop{\mathrm{E}}[P_{w,x}] = p_{w,x}\).
### The Calculation
Using the law of iterated expectations, conditioning on all of the group labels and covariates \((W_1, X_1), \ldots, (W_n, X_n)\)—which determine the coefficients \(\hat\alpha(w,x)\)—we get \[ \begin{aligned} \mathop{\mathrm{E}}\qty[\sum_{w,x} \hat\alpha(w,x) \hat\mu(w,x)] &= \sum_{w,x} \mathop{\mathrm{E}}\qty{\hat\alpha(w,x) \mathop{\mathrm{E}}[\hat\mu(w,x) \mid W_1, X_1, \ldots, W_n, X_n]} \\ &= \sum_{w,x} \mathop{\mathrm{E}}\qty{\hat\alpha(w,x)} \mu(w,x) \end{aligned} \]
So our estimator is unbiased if \(\mathop{\mathrm{E}}[\hat\alpha(w,x)] = \alpha(w,x)\) for all \(w,x\).
For \(\hat\Delta_1\), the coefficients are \(\hat\alpha(1,x) = P_{1,x}\) and \(\hat\alpha(0,x) = -P_{1,x}\). Sample proportions are unbiased, so \(\hat\Delta_1\) is unbiased.
The same argument works for \(\hat\Delta_0\), \(\hat\Delta_{\text{all}}\), and \(\hat\Delta_{\text{raw}}\).
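Unbiasedness is easy to check by simulation. A sketch under a hypothetical model in which the within-education gap is 5 at every \(x\), so the population value of \(\Delta_1\) is \(-5\):

```python
import numpy as np

# Monte Carlo check of unbiasedness for Delta_1_hat (hypothetical model:
# the within-x gap is 5 at every x, so the population value is -5).
rng = np.random.default_rng(1)
levels = [0, 1, 2]

def delta_1_hat(n=500):
    w = rng.integers(0, 2, size=n)
    x = rng.choice(levels, size=n, p=[0.5, 0.3, 0.2])
    y = 30 + 10 * x - 5 * w + rng.normal(0, 10, size=n)
    est = 0.0
    for lv in levels:
        P1 = ((w == 1) & (x == lv)).sum() / (w == 1).sum()
        gap = y[(w == 1) & (x == lv)].mean() - y[(w == 0) & (x == lv)].mean()
        est += P1 * gap  # alpha(1,x) = P_{1,x}, alpha(0,x) = -P_{1,x}
    return est

draws = [delta_1_hat() for _ in range(2000)]
print(f"mean of estimates: {np.mean(draws):.3f}")  # close to -5
```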
## Variance

### Why It Matters
Unbiasedness tells us our estimator is centered at the right place. But how spread out is it? That determines the width of our confidence intervals.
The bootstrap shows that \(\hat\Delta_1\) has a wider sampling distribution than \(\hat\Delta_{\text{raw}}\). Why?
### A Variance Formula
For estimators of the form \(\hat\theta = \sum_{w,x} \hat\alpha(w,x) \hat\mu(w,x)\), the variance has two parts.
\[ \mathop{\mathrm{V}}[\hat\theta] = \underbrace{\sum_{w,x} \sigma^2(w,x) \mathop{\mathrm{E}}\qty[\frac{\hat\alpha(w,x)^2}{N_{w,x}}]}_{\text{variance from estimating } \mu(w,x)} + \underbrace{\mathop{\mathrm{V}}\qty[\sum_{w,x} \hat\alpha(w,x) \mu(w,x)]}_{\text{variance from random coefficients}} \]
The first term captures the uncertainty in estimating each within-group mean. The second captures the uncertainty from using random weights (sample proportions).
When the coefficients are fixed (not random), the second term is zero. When they’re random, it’s usually small compared to the first term.
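This decomposition can be checked numerically. The sketch below assumes a made-up model \(y = 30 + 10x - 5w - 3wx + \text{noise}\), chosen so that the within-\(x\) gap varies with \(x\) and the second term isn't exactly zero; \(\sigma^2(w,x) = 100\) and \(\mu(w,x)\) are known by construction:

```python
import numpy as np

# Monte Carlo check of the two-term variance decomposition for Delta_1_hat
# under an assumed model y = 30 + 10x - 5w - 3wx + noise, so that
# mu(1,x) - mu(0,x) = -(5 + 3x) varies with x and sigma^2(w,x) = 100.
rng = np.random.default_rng(5)
levels = [0, 1, 2]
sigma2 = 100.0

ests, cond_vars, cond_means = [], [], []
for _ in range(4000):
    n = 400
    w = rng.integers(0, 2, size=n)
    x = rng.choice(levels, size=n, p=[0.5, 0.3, 0.2])
    y = 30 + 10 * x - 5 * w - 3 * w * x + rng.normal(0, 10, size=n)
    est = cv = cm = 0.0
    for lv in levels:
        P1 = ((w == 1) & (x == lv)).sum() / (w == 1).sum()
        gap = y[(w == 1) & (x == lv)].mean() - y[(w == 0) & (x == lv)].mean()
        est += P1 * gap
        for g in (0, 1):
            N = ((w == g) & (x == lv)).sum()
            cv += sigma2 * P1**2 / N  # conditional variance given all (W_i, X_i)
        cm += P1 * (-(5 + 3 * lv))    # conditional mean: sum of alpha * mu
    ests.append(est)
    cond_vars.append(cv)
    cond_means.append(cm)

print("total variance:       ", np.var(ests))
print("decomposition (T1+T2):", np.mean(cond_vars) + np.var(cond_means))
print("second term alone:    ", np.var(cond_means))
```

In this setup the second term comes out much smaller than the first, consistent with the claim above.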
### The Key Insight
The first term is what matters most. \[ \mathop{\mathrm{V}}[\hat\theta] \approx \sum_{w,x} \sigma^2(w,x) \frac{\hat\alpha(w,x)^2}{N_{w,x}} \]
This tells us something important: an estimator is hard to estimate precisely when it puts large weight on small subgroups.
If \(\hat\alpha(w,x)\) is large but \(N_{w,x}\) is small, that term contributes a lot to the variance.
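A term-by-term evaluation of this formula for \(\hat\Delta_1\) makes the point visible. The sketch below uses a simulated sample with covariate shift (all proportions made up): group 1 is concentrated at high \(x\), group 0 at low \(x\):

```python
import numpy as np

# Term-by-term evaluation of the approximate variance formula for Delta_1
# on one simulated sample with covariate shift (made-up proportions):
# group 1 is concentrated at high x, group 0 at low x.
rng = np.random.default_rng(2)
levels = [0, 1, 2]
probs = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.2, 0.7]}
n = 2_000
w = rng.integers(0, 2, size=n)
x = np.empty(n, dtype=int)
for g in (0, 1):
    m = w == g
    x[m] = rng.choice(levels, size=m.sum(), p=probs[g])
y = 30 + 10 * x - 5 * w + rng.normal(0, 10, size=n)

terms = {}
for lv in levels:
    P1 = ((w == 1) & (x == lv)).sum() / (w == 1).sum()
    for g in (0, 1):
        cell = (w == g) & (x == lv)
        # alpha(1,x) = P_{1,x} and alpha(0,x) = -P_{1,x}; squaring drops the sign
        terms[g, lv] = y[cell].var(ddof=1) * P1**2 / cell.sum()

for (g, lv), t in sorted(terms.items(), key=lambda kv: -kv[1]):
    print(f"w={g}, x={lv}: {t:.4f}")
```

With these made-up proportions, the dominant term is the group-0 cell at high \(x\): it gets a large weight \(P_{1,x}^2\) but has few observations.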
### Comparing Estimators
Let’s see how this plays out for our estimators.
For \(\hat\Delta_{\text{raw}}\), the variance is roughly \[ \mathop{\mathrm{V}}[\hat\Delta_{\text{raw}}] \approx \frac{\sigma^2_1}{N_1} + \frac{\sigma^2_0}{N_0} \] where \(\sigma^2_w\) is the variance of income within group \(w\). This is the familiar two-sample formula.
For \(\hat\Delta_1\), the variance involves the within-education-level variances. \[ \mathop{\mathrm{V}}[\hat\Delta_1] \approx \sum_x P_{1,x}^2 \qty{\frac{\sigma^2(1,x)}{N_{1,x}} + \frac{\sigma^2(0,x)}{N_{0,x}}} \]
The problem: if women (group 1) have many people at education level \(x\), then \(P_{1,x}\) is large. But if men have few people at that education level, \(N_{0,x}\) is small. The ratio \(P_{1,x}^2 / N_{0,x}\) can be large.
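A simulation makes the comparison concrete. The sketch below assumes a hypothetical model where the two groups have very different education distributions but the within-\(x\) gap is the same everywhere, and compares the sampling spread of the two estimators:

```python
import numpy as np

# Sampling-spread comparison of the raw and adjusted estimators under
# covariate shift (hypothetical model; the groups differ in their
# x-distributions but the within-x gap is 5 everywhere).
rng = np.random.default_rng(3)
levels = [0, 1, 2]
probs = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.2, 0.7]}

def one_sample(n=2_000):
    w = rng.integers(0, 2, size=n)
    x = np.empty(n, dtype=int)
    for g in (0, 1):
        m = w == g
        x[m] = rng.choice(levels, size=m.sum(), p=probs[g])
    y = 30 + 10 * x - 5 * w + rng.normal(0, 10, size=n)
    raw = y[w == 1].mean() - y[w == 0].mean()
    adj = sum(
        ((w == 1) & (x == lv)).sum() / (w == 1).sum()
        * (y[(w == 1) & (x == lv)].mean() - y[(w == 0) & (x == lv)].mean())
        for lv in levels
    )
    return raw, adj

draws = np.array([one_sample() for _ in range(500)])
print(f"sd(raw) = {draws[:, 0].std():.3f}, "
      f"sd(adjusted) = {draws[:, 1].std():.3f}")
```

The adjusted estimator comes out noisier, as the variance formula predicts: its weights emphasize exactly the cells where the other group is thin.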
### Covariate Shift Hurts Precision
When the two groups have different covariate distributions (covariate shift), the adjusted estimators become harder to estimate. You’re asking: what would men earn if they had women’s education distribution? But there aren’t many men at the education levels where most women are.
This is the price of adjustment. You get a more meaningful comparison, but a noisier estimate.
### Variance Estimation
We can estimate the variance using the same formula with sample quantities. \[ \widehat{\mathop{\mathrm{V}}}[\hat\theta] = \sum_{w,x} \hat\sigma^2(w,x) \frac{\hat\alpha(w,x)^2}{N_{w,x}} \]
This gives us a standard error, which we can use for confidence intervals. \[ \hat\theta \pm 1.96 \times \widehat{\text{se}} \]
Or we can just use the bootstrap, which handles all of this automatically.
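Both constructions look like this in practice. A sketch on one simulated sample (hypothetical model): a plug-in standard error from the variance formula, and a simple nonparametric bootstrap that resamples rows:

```python
import numpy as np

# Two interval constructions for Delta_1 on one simulated sample
# (hypothetical model): a plug-in standard error from the variance
# formula, and a simple nonparametric bootstrap over rows.
rng = np.random.default_rng(4)
levels = [0, 1, 2]
n = 2_000
w = rng.integers(0, 2, size=n)
x = rng.choice(levels, size=n, p=[0.4, 0.35, 0.25])
y = 30 + 10 * x - 5 * w + rng.normal(0, 10, size=n)

def delta_1(w, x, y):
    return sum(
        ((w == 1) & (x == lv)).sum() / (w == 1).sum()
        * (y[(w == 1) & (x == lv)].mean() - y[(w == 0) & (x == lv)].mean())
        for lv in levels
    )

est = delta_1(w, x, y)

# Plug-in variance: sum over cells of sigma_hat^2(w,x) * alpha_hat(w,x)^2 / N_{w,x}
var = 0.0
for lv in levels:
    P1 = ((w == 1) & (x == lv)).sum() / (w == 1).sum()
    for g in (0, 1):
        cell = (w == g) & (x == lv)
        var += y[cell].var(ddof=1) * P1**2 / cell.sum()
se = np.sqrt(var)
ci_formula = (est - 1.96 * se, est + 1.96 * se)

# Bootstrap: resample rows with replacement, recompute the estimator
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boot.append(delta_1(w[idx], x[idx], y[idx]))
ci_boot = tuple(np.percentile(boot, [2.5, 97.5]))

print("formula CI:  ", ci_formula)
print("bootstrap CI:", ci_boot)
```

The two intervals should roughly agree when cells are reasonably large; the bootstrap is more convenient when the estimator's form gets complicated.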
## Confidence Intervals

### Bootstrap Intervals
The intervals tell the story:
- \(\hat\Delta_{\text{raw}}\): Women earn about 12k less than men on average. Relatively precise estimate.
- \(\hat\Delta_1\): Comparing women to similarly-educated men, women earn about 15k less. Wider interval.
- \(\hat\Delta_0\): If women had men’s education distribution, the gap would be about 15k. Similar precision to \(\hat\Delta_1\).
- \(\hat\Delta_{\text{all}}\): The average within-education gap is about 15k.
### The Tradeoff
There’s a tradeoff between meaning and precision.
The raw comparison is easy to estimate but hard to interpret—it conflates the effect of sex with differences in education.
The adjusted comparisons are more meaningful but harder to estimate precisely—they require estimating means for subgroups that may be small.
When you report an adjusted comparison, you should report its uncertainty too. A more meaningful but very uncertain estimate may not be more useful than a less meaningful but precise one.
## Summary

### What We Learned
Adjusted comparisons are unbiased if sample proportions are unbiased for population proportions. They are.
Variance depends on subgroup sizes and weights. An estimator is imprecise when it puts large weight on small subgroups.
Covariate shift hurts precision. When groups have different covariate distributions, adjusted comparisons are harder to estimate.
There’s a tradeoff. More meaningful comparisons (adjusted) are often less precise than simpler ones (raw).
### The Formula
For \(\hat\theta = \sum_{w,x} \hat\alpha(w,x) \hat\mu(w,x)\): \[ \mathop{\mathrm{V}}[\hat\theta] \approx \sum_{w,x} \sigma^2(w,x) \frac{\hat\alpha(w,x)^2}{N_{w,x}} \]
This tells you where the variance is coming from. If one term dominates, that’s the bottleneck for precision.