19 Homework: Variance of Comparisons
Summary
This week’s homework addresses two issues we’ve left hanging.
I know what you’re thinking: we had a whole homework assignment on this. But that’s only half-right. In the Week 2 Homework, we focused on showing that we get near-perfect calibration, not on explaining why. But using normal approximation, we can do the why part pretty easily. This’ll be quick, and it’ll involve a bit of calculus, which makes it a good warm-up for what we’ll do next. It’ll give us an opportunity to revisit some of our Week 2 stuff from a more formula-driven perspective, too.
I said, in our Lecture on Comparing Two Groups, that you’d be calculating the variance of a difference in subsample means in this one. You’ll be doing that and a little more: you’ll be calculating, approximately, the variance of a ratio of subsample means, too. Because sometimes that’s closer to what you want to know. People often say, for example, that women in this country earn 78 cents on the dollar for doing the same work as men. That’s a ratio. Tackling this in addition to the difference won’t be too much additional work. After a little bit of calculus, we basically wind up in the same place as we do for the difference.
Calculus Review: Linear Approximation
We’re going to be using linear approximation to simplify some of our calculations. Given a function \(f(x)\), we can approximate it near any point \(x_0\) like this. \[ f(x) \approx f(x_0) + f'(x_0)(x-x_0) \]
Hopefully you remember that from calculus. If you like, you can call it a first-order Taylor approximation. Most calculus textbooks give a few formulas for the error of this approximation, which is called the remainder in Taylor’s Theorem.
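To get a feel for how good this approximation is, here’s a quick Python sketch. The function \(f(x)=\sqrt{x}\) and the points used are arbitrary choices for illustration, not something from the homework.

```python
import numpy as np

# Linear approximation f(x) ≈ f(x0) + f'(x0)(x - x0),
# illustrated with f(x) = sqrt(x), so f'(x) = 1 / (2 sqrt(x)).
def f(x):
    return np.sqrt(x)

def fprime(x):
    return 1.0 / (2.0 * np.sqrt(x))

x0 = 1.0
for x in [1.1, 1.01, 1.001]:
    approx = f(x0) + fprime(x0) * (x - x0)
    print(f"x={x}: f(x)={f(x):.6f}, approx={approx:.6f}, "
          f"error={abs(f(x) - approx):.2e}")
```

Notice that shrinking the distance to \(x_0\) by a factor of ten shrinks the error by roughly a factor of a hundred. That quadratic behavior is exactly what the remainder formulas in Taylor’s Theorem describe.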
When we’re thinking about functions of multiple variables, we use the multivariate version, which involves partial derivatives. \[ \begin{aligned} f(x,y) &\approx f(x_0,y_0) + \qty[\frac{\partial f}{\partial x}(x_0,y_0)] (x-x_0) \ + \ \qty[\frac{\partial f}{\partial y}(x_0,y_0)] (y-y_0). \end{aligned} \]
Why We Usually Get Near-Perfect Calibration
Suppose we’ve sampled with replacement from a binary population in which \(\theta\) is the proportion of ones. If we use the sample mean \(\hat\theta\) as our point estimate and calibrate a 95% confidence interval around it using normal approximation, this is the interval we get.
\[ \hat\theta \pm 1.96 \hat\sigma / \sqrt{n} \qfor \hat\sigma^2 = \hat\theta(1-\hat\theta) \]
On the other hand, the interval we’d want—assuming we’re still happy to use normal approximation—uses the actual variance of our sample proportion, \(\sigma^2=\theta(1-\theta)\), instead of the estimate \(\hat\sigma^2\). Figure 20.1 is an attempt to convince ourselves that it doesn’t make much of a difference at all. I think I called the difference ‘a fingernail thick’ in lecture. Now you’re going to quantify this difference. We’ll assume that \(\hat\theta\) is one of the ‘good draws’ from its sampling distribution, which for our purposes will mean that it’s in the interval \(\theta \pm 1.96\sigma / \sqrt{n}\), the middle 95% of the sampling distribution’s normal approximation.
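If you’d like to see that ‘fingernail thick’ claim in simulation before quantifying it, here’s a rough Python sketch. The population proportion \(\theta=0.3\) and sample size \(n=100\) are arbitrary choices of mine, not values from the homework.

```python
import numpy as np

# Compare coverage of the 95% interval built with the estimated sd
# (sigma-hat) vs. the true sd (sigma) when sampling n observations
# with replacement from a binary population with proportion theta.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 100, 10_000
sigma = np.sqrt(theta * (1 - theta))

theta_hat = rng.binomial(n, theta, size=reps) / n    # sample proportions
sigma_hat = np.sqrt(theta_hat * (1 - theta_hat))     # plug-in sd estimates

covers_est  = np.abs(theta_hat - theta) <= 1.96 * sigma_hat / np.sqrt(n)
covers_true = np.abs(theta_hat - theta) <= 1.96 * sigma / np.sqrt(n)
print(covers_est.mean(), covers_true.mean())  # both should be near 0.95
```

The two coverage rates come out almost identical, which is the point: using \(\hat\sigma\) in place of \(\sigma\) barely changes the interval.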
Locked (Week 5)
Locked (Week 5)
This should be familiar from the Week 2 Homework.
Variance Calculation for Comparisons
Differences in Means
In our Lecture on Comparing Two Groups, we talked about how to use subsample means to compare two groups. In particular, we talked about the case that we’ve drawn a sample \((X_1,Y_1) \ldots (X_n,Y_n)\) with replacement from a population \((x_1,y_1) \ldots (x_m,y_m)\) in which \(x_j \in \{0,1\}\) indicates membership in one of two groups, e.g. treated and control groups in Figure 23.1. And we talked about using the difference \(\textcolor[RGB]{0,191,196}{\hat\mu(1)}-\textcolor[RGB]{248,118,109}{\hat\mu(0)}\) in the mean of \(Y_i\) for the subsamples in which \(\textcolor[RGB]{0,191,196}{X_i=1}\) and \(\textcolor[RGB]{248,118,109}{X_i=0}\) to estimate the corresponding difference \(\textcolor[RGB]{0,191,196}{\mu(1)}-\textcolor[RGB]{248,118,109}{\mu(0)}\) in the population.
Locked (Week 5)
We also calculated a formula for the variance of a subsample mean \(\hat\mu(x)\). \[ \mathop{\mathrm{V}}[\hat\mu(x)] = \frac{\sigma^2(x)}{N_x} \text{ for } N_x = \sum_{i}1_{=x}(X_i) \qand \sigma^2(x) = \mathop{\mathrm{V}}[Y_i \mid X_i=x] \]
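As a quick numerical check of this formula (not part of the homework), we can compare the Monte Carlo variance of \(\hat\mu(1)\) with the average of \(\sigma^2(1)/N_1\) over draws. The population below, with \(x_j\) a fair coin and \(y_j\) equal to \(x_j\) plus noise, is a made-up example.

```python
import numpy as np

# Numerical check: the variance of the subsample mean mu-hat(1)
# should match the average of sigma^2(1) / N_1 across samples.
rng = np.random.default_rng(1)
m, n, reps = 1_000, 100, 10_000
x_pop = rng.integers(0, 2, size=m)        # group labels in {0, 1}
y_pop = x_pop + rng.normal(size=m)        # outcomes
mu1 = y_pop[x_pop == 1].mean()            # population subgroup mean
sigma2_1 = y_pop[x_pop == 1].var()        # population subgroup variance

errs, inv_n1 = [], []
for _ in range(reps):
    idx = rng.integers(0, m, size=n)      # sample with replacement
    X, Y = x_pop[idx], y_pop[idx]
    errs.append(Y[X == 1].mean() - mu1)
    inv_n1.append(1.0 / (X == 1).sum())
print(np.var(errs), sigma2_1 * np.mean(inv_n1))  # should be close
```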
And I stated without proof a formula for the variance of the difference of two subsample means. \[ \mathop{\mathrm{V}}\qty[\hat{\mu}(1)-\hat{\mu}(0)] = \mathop{\mathrm{E}}\qty[\frac{1}{N_1}\sigma^2(1)+\frac{1}{N_0}\sigma^2(0)] \text{ for } N_x = \sum_{i}1_{=x}(X_i) \]
It’s a simple formula. The variance of the difference in means is the sum of the variances of the two means. Why is that the case? To see why, we can start from the definitions and do a bit of arithmetic.
\[ \begin{aligned} \mathop{\mathrm{V}}\qty[\hat{\mu}(1)-\hat{\mu}(0)] &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \hat{\mu}(0)\} - \{\mu(1)-\mu(0)\})^2 ] \\ &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \mu(1)\} - \{\hat{\mu}(0) -\mu(0)\})^2 ] \\ &= \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(1) - \mu(1)\})^2 ] + \mathop{\mathrm{E}}\qty[ \qty(\{\hat{\mu}(0) -\mu(0)\})^2 ] \\ &- 2\mathop{\mathrm{E}}\qty[ \{\hat{\mu}(1) - \mu(1)\}\{\hat{\mu}(0) -\mu(0)\}] \end{aligned} \tag{23.1}\]
The first two terms here are the ones that appear in our formula above: the variances of the two means. For that formula to be correct, the last term has to be zero. It’s up to you to prove that.
Locked (Week 5)
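If you’d like a numerical sanity check on that cross term before (or after) proving it, here’s a quick simulation sketch. It’s not a proof, and the population, with \(x_j\) a fair coin and \(y_j\) equal to \(x_j\) plus noise, is a made-up example.

```python
import numpy as np

# The cross term E[{mu-hat(1) - mu(1)}{mu-hat(0) - mu(0)}] from the
# expansion above should come out near zero in simulation.
rng = np.random.default_rng(0)
m, n, reps = 1_000, 100, 10_000
x_pop = rng.integers(0, 2, size=m)        # group labels in {0, 1}
y_pop = x_pop + rng.normal(size=m)        # outcomes
mu1 = y_pop[x_pop == 1].mean()
mu0 = y_pop[x_pop == 0].mean()

cross = []
for _ in range(reps):
    idx = rng.integers(0, m, size=n)      # sample with replacement
    X, Y = x_pop[idx], y_pop[idx]
    cross.append((Y[X == 1].mean() - mu1) * (Y[X == 0].mean() - mu0))
print(np.mean(cross))  # near zero
```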
Ratios of Means
If \(\hat\mu(1)-\hat\mu(0)\) is a good estimator of \(\mu(1)-\mu(0)\), then shouldn’t \(\hat\mu(1)/\hat\mu(0)\) be a good estimator of \(\mu(1)/\mu(0)\)? Let’s look into it. To do this, we’ll think of the ratio as a function of the two means. \[ \frac{\hat\mu(1)}{\hat\mu(0)} - \frac{\mu(1)}{\mu(0)} = f(\hat\mu(1), \hat\mu(0)) - f(\mu(1), \mu(0)) \qfor f(x,y) = \frac{x}{y}. \]
And we’ll use a linear approximation to this function to think about this difference.
\[ \begin{aligned} f(\hat\mu(1), \hat\mu(0)) \approx f(\mu(1), \mu(0)) &+ \qty[\frac{\partial f}{\partial x}(\mu(1), \mu(0))](\hat\mu(1) - \mu(1)) \\ &+ \qty[\frac{\partial f}{\partial y}(\mu(1), \mu(0))](\hat\mu(0) - \mu(0)) \end{aligned} \]
This approximation should be good if \(\hat\mu(1)\) and \(\hat\mu(0)\) are close to \(\mu(1)\) and \(\mu(0)\).
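To get a feel for how good this approximation is, here’s a small Python check. The values \(\mu(1)=2\) and \(\mu(0)=4\) are arbitrary stand-ins, and I’m perturbing both coordinates at once.

```python
# Linear approximation of f(x, y) = x / y around (mu1, mu0).
# Partials: df/dx = 1/y and df/dy = -x/y^2, evaluated at (mu1, mu0).
mu1, mu0 = 2.0, 4.0   # arbitrary stand-ins for mu(1), mu(0)
for eps in [0.1, 0.01]:
    x, y = mu1 + eps, mu0 - eps           # perturb both coordinates
    exact = x / y
    approx = (mu1 / mu0
              + (1 / mu0) * (x - mu1)
              + (-mu1 / mu0**2) * (y - mu0))
    print(f"eps={eps}: exact={exact:.6f}, approx={approx:.6f}, "
          f"error={abs(exact - approx):.2e}")
```

As with the univariate case, shrinking the perturbation by a factor of ten shrinks the error by roughly a factor of a hundred: the approximation error is second-order in the distance from the point of expansion.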
Locked (Week 5)
Now that we’ve justified the approximation, let’s use it to analyze our ratio estimator.
Locked (Week 5)
Locked (Week 5)
All of that ignores the error of our linear approximation as a potential problem. We should, if we like, be able to reason about this error using tools from calculus.
You can find the figure in the ‘Variance Calculations for Comparisons’ tab at the top of this page. Draw in your interval however you like. You can print this, draw it on paper, and photograph it for submission. You can right-click on the plot, save it as an image, and draw on that using your favorite image editor. You can sketch what you see in Figure 2 on paper, add your interval, and photograph that. Maybe the easiest thing to do is use the Tldraw Chrome Extension to draw right on top of this webpage and take a screenshot. Don’t work too hard. A rough sketch is fine.