11 Homework: Biased Estimators
$$
\DeclareMathOperator{\E}{E} \DeclareMathOperator{\V}{V} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\bias}{bias} \newcommand{\thetaprior}{\theta^{\text{prior}}} \newcommand{\nprior}{n^{\text{prior}}} \newcommand{\yprior}{y^{\text{prior}}}
$$
Introduction
So far, we’ve exclusively talked about unbiased estimators. That is, estimators with the property that their expected value is equal to the estimation target. \[ \hat\theta \qqtext{ is called unbiased if } \mathop{\mathrm{E}}\qty[\hat{\theta}] = \theta. \] We say an estimator is biased if this isn’t true. In this homework, we’ll work with some biased estimators to get a sense of what bias does to our inference. The punchline: it messes up our interval estimates’ coverage. We’ll see exactly how much in next week’s lecture.
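For instance, the sample mean is unbiased for the population proportion; by linearity of expectation (a standard step, spelled out here for completeness), \[ \mathop{\mathrm{E}}\qty[\bar Y] = \frac{1}{n}\sum_{i=1}^{n}\mathop{\mathrm{E}}\qty[Y_i] = \frac{1}{n} \cdot n\theta = \theta. \]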
Using Prior Information
To get a sense of what bias means, let’s consider a simple example of a biased estimator. Suppose we’re estimating a population proportion \(\theta\) using a sample \(Y_1 \ldots Y_n\) drawn with replacement from a binary population. Instead of using the sample mean \(\bar{Y}\), we use this estimator. \[ \tilde{Y}_1 = \frac{1}{n+1}\cdot \qty{\frac{1}{2} + \sum_{i=1}^{n}Y_i} \]
What’s going on here? We’re mixing in a “prior observation” of \(1/2\) with our sample. It’s as if we’d seen one observation equal to \(1/2\) before collecting any data, and we’re averaging that with what we actually observe.
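To see this bias in a simulation, here's a minimal sketch in Python. The parameter choices (\(\theta = 0.9\), \(n = 10\), and the number of replications) are ours for illustration, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed for illustration: a binary population with proportion theta = 0.9,
# samples of size n = 10, and 100,000 simulated samples.
theta, n, reps = 0.9, 10, 100_000

# Draw many samples and compute tilde_Y_1 = (1/2 + sum of Y) / (n + 1) for each.
samples = rng.binomial(1, theta, size=(reps, n))
estimates = (0.5 + samples.sum(axis=1)) / (n + 1)

print(estimates.mean())             # settles near (1/2 + n*theta)/(n+1) = 0.86...
print((0.5 + n * theta) / (n + 1))  # the exact expectation, not theta = 0.9
```

The average estimate settles near \((1/2 + n\theta)/(n+1)\) rather than \(\theta\): mixing in the prior observation pulls the estimator toward \(1/2\).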
This is a specific example of a more general estimator that incorporates information from a prior study, whether a real one or one we're just imagining. Suppose we have \(\nprior\) observations \(\yprior_1 \ldots \yprior_{\nprior}\) from this study.1 Their mean is \[ \thetaprior = \frac{1}{\nprior}\sum_{i=1}^{\nprior}\yprior_i. \]
Averaging all our observations—from our current study and this prior one—gives us the following estimator. \[ \begin{aligned} \tilde{Y}_{\nprior} &= \frac{1}{\nprior + n} \qty{\sum_{i=1}^{\nprior}\yprior_i + \sum_{i=1}^{n} Y_i} \\ &= \frac{1}{\nprior + n} \qty{ \nprior\thetaprior + n\bar Y } \end{aligned} \]
We treat these prior observations, and therefore their mean \(\thetaprior\), as deterministic. The simple example we started with, \(\tilde{Y}_1\), is a special case where we have a single prior observation \(\yprior_1 = \frac{1}{2}\).
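Because the prior observations are deterministic and \(\mathop{\mathrm{E}}\qty[\bar Y] = \theta\), a one-line calculation (spelled out here, though not in the text above) shows exactly where the bias comes from. \[ \mathop{\mathrm{E}}\qty[\tilde{Y}_{\nprior}] = \frac{\nprior \thetaprior + n \mathop{\mathrm{E}}\qty[\bar Y]}{\nprior + n} = \theta + \frac{\nprior}{\nprior + n}\qty(\thetaprior - \theta). \] The estimator is unbiased only if our prior guess is exactly right, i.e. \(\thetaprior = \theta\), and the bias shrinks as \(n\) grows relative to \(\nprior\).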
Visualizing the Impact of Prior Information
Let’s get a sense of how using prior information like this impacts our inference. The plot below shows the sampling distributions of three estimators of a population proportion \(\theta\) at three sample sizes \(n\): 10, 40, and 160. These are the estimators.
- The estimator \(\hat\theta_1 = \tilde{Y}_{\nprior}\) for \(\thetaprior=3/4\) and \(\nprior=10\) prior observations.
- The estimator \(\hat\theta_2 = \tilde{Y}_{n}\) for \(\thetaprior=3/4\) and \(\nprior=n\) prior observations. The bigger the sample, the more prior observations we use.
- The sample mean, \(\hat\theta_3 = \bar Y\).
As usual, the estimation target \(\theta\) is indicated by a green line, the sampling distribution’s mean by a solid blue line, and its mean plus and minus two standard deviations by dotted blue lines.
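If you'd like to reproduce a plot along these lines, here is a rough simulation sketch in Python. The true proportion \(\theta = 1/2\), the number of replications, and all plotting details are assumptions of ours; only \(\thetaprior = 3/4\) and the sample sizes come from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
theta = 0.5          # assumed estimation target; the text doesn't specify it
theta_prior = 3 / 4  # the prior guess used by the first two estimators
reps = 10_000        # simulated samples per sampling distribution

fig, axes = plt.subplots(3, 3, figsize=(9, 7), sharex=True)
for row, n in enumerate([10, 40, 160]):
    # Sample means of `reps` samples of size n, drawn with replacement.
    ybar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)
    estimates = {
        r"$\tilde Y_{10}$": (10 * theta_prior + n * ybar) / (10 + n),
        r"$\tilde Y_{n}$": (n * theta_prior + n * ybar) / (n + n),
        r"$\bar Y$": ybar,
    }
    for col, (name, est) in enumerate(estimates.items()):
        ax = axes[row, col]
        ax.hist(est, bins=30, density=True)
        ax.axvline(theta, color="green")  # estimation target
        m, s = est.mean(), est.std()
        ax.axvline(m, color="blue")       # sampling distribution's mean
        ax.axvline(m - 2 * s, color="blue", linestyle="dotted")
        ax.axvline(m + 2 * s, color="blue", linestyle="dotted")
        ax.set_title(f"{name}, n={n}")
plt.tight_layout()
plt.show()
```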
Sometimes we call these pseudo-observations. This particular interpretation, in which we think of them as coming from a prior study, can be used to derive this estimator from Bayesian principles.