11 Homework: Biased Estimators
$$
\DeclareMathOperator{\E}{E} \DeclareMathOperator{\V}{V} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\bias}{bias} \newcommand{\thetaprior}{\theta^{\text{prior}}} \newcommand{\nprior}{n^{\text{prior}}} \newcommand{\yprior}{y^{\text{prior}}}
$$
Introduction
So far, we’ve exclusively talked about unbiased estimators. That is, estimators with the property that their expected value is equal to the estimation target. \[ \hat\theta \qqtext{ is called unbiased if } \mathop{\mathrm{E}}\qty[\hat{\theta}] = \theta. \] We say an estimator is biased if this isn’t true. In this homework, we’ll work with some biased estimators to get a sense of what bias does to our inference. The punchline: it messes up our interval estimates’ coverage. We’ll see exactly how much in next week’s lecture.
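For instance, the sample mean is unbiased for the population proportion; by linearity of expectation (a standard step, spelled out here for completeness), \[ \mathop{\mathrm{E}}\qty[\bar Y] = \frac{1}{n}\sum_{i=1}^{n}\mathop{\mathrm{E}}\qty[Y_i] = \frac{1}{n} \cdot n\theta = \theta. \]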
Using Prior Information
To get a sense of what bias means, let’s consider a simple example of a biased estimator. Suppose we’re estimating a population proportion \(\theta\) using a sample \(Y_1 \ldots Y_n\) drawn with replacement from a binary population. Instead of using the sample mean \(\bar{Y}\), we use this estimator. \[ \tilde{Y}_1 = \frac{1}{n+1}\cdot \qty{\frac{1}{2} + \sum_{i=1}^{n}Y_i} \]
What’s going on here? We’re mixing in a “prior observation” of \(1/2\) with our sample. It’s as if we’d seen one observation equal to \(1/2\) before collecting any data, and we’re averaging that with what we actually observe.
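To see this bias in a simulation, here's a minimal sketch in Python. The parameter choices (\(\theta = 0.9\), \(n = 10\), and the number of replications) are ours for illustration, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed for illustration: a binary population with proportion theta = 0.9,
# samples of size n = 10, and 100,000 simulated samples.
theta, n, reps = 0.9, 10, 100_000

# Draw many samples and compute tilde_Y_1 = (1/2 + sum of Y) / (n + 1) for each.
samples = rng.binomial(1, theta, size=(reps, n))
estimates = (0.5 + samples.sum(axis=1)) / (n + 1)

print(estimates.mean())             # settles near (1/2 + n*theta)/(n+1) = 0.86...
print((0.5 + n * theta) / (n + 1))  # the exact expectation, not theta = 0.9
```

The average estimate settles near \((1/2 + n\theta)/(n+1)\) rather than \(\theta\): mixing in the prior observation pulls the estimator toward \(1/2\).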
This is a specific example of a more general estimator that incorporates information from a prior study, whether a real one or one we're just imagining. Suppose we have \(\nprior\) observations \(\yprior_1 \ldots \yprior_{\nprior}\) from this study.1 Their mean is \[ \thetaprior = \frac{1}{\nprior}\sum_{i=1}^{\nprior}\yprior_i. \]
Averaging all our observations—from our current study and this prior one—gives us the following estimator. \[ \begin{aligned} \tilde{Y}_{\nprior} &= \frac{1}{\nprior + n} \qty{\sum_{i=1}^{\nprior}\yprior_i + \sum_{i=1}^{n} Y_i} \\ &= \frac{1}{\nprior + n} \qty{ \nprior\thetaprior + n\bar Y } \end{aligned} \]
We treat these prior observations, and therefore their mean \(\thetaprior\), as deterministic. The simple example we started with, \(\tilde{Y}_1\), is a special case where we have a single prior observation \(\yprior_1 = \frac{1}{2}\).
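Because the prior observations are deterministic and \(\mathop{\mathrm{E}}\qty[\bar Y] = \theta\), a one-line calculation (spelled out here, though not in the text above) shows exactly where the bias comes from. \[ \mathop{\mathrm{E}}\qty[\tilde{Y}_{\nprior}] = \frac{\nprior \thetaprior + n \mathop{\mathrm{E}}\qty[\bar Y]}{\nprior + n} = \theta + \frac{\nprior}{\nprior + n}\qty(\thetaprior - \theta). \] The estimator is unbiased only if our prior guess is exactly right, i.e. \(\thetaprior = \theta\), and the bias shrinks as \(n\) grows relative to \(\nprior\).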
Visualizing the Impact of Prior Information
Let’s get a sense of how using prior information like this impacts our inference. The plot below shows the sampling distributions of three estimators of a population proportion \(\theta\) at three sample sizes \(n\): 10, 40, and 160. These are the estimators.
- The estimator \(\hat\theta_1 = \tilde{Y}_{\nprior}\) for \(\thetaprior=3/4\) and \(\nprior=10\) prior observations.
- The estimator \(\hat\theta_2 = \tilde{Y}_{n}\) for \(\thetaprior=3/4\) and \(\nprior=n\) prior observations. The bigger the sample, the more prior observations we use.
- The sample mean, \(\hat\theta_3 = \bar Y\).
As usual, the estimation target \(\theta\) is indicated by a green line, the sampling distribution’s mean by a solid blue line, and its mean plus and minus two standard deviations by dotted blue lines.
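If you'd like to reproduce a plot along these lines, here is a rough simulation sketch in Python. The true proportion \(\theta = 1/2\), the number of replications, and all plotting details are assumptions of ours; only \(\thetaprior = 3/4\) and the sample sizes come from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
theta = 0.5          # assumed estimation target; the text doesn't specify it
theta_prior = 3 / 4  # the prior guess used by the first two estimators
reps = 10_000        # simulated samples per sampling distribution

fig, axes = plt.subplots(3, 3, figsize=(9, 7), sharex=True)
for row, n in enumerate([10, 40, 160]):
    # Sample means of `reps` samples of size n, drawn with replacement.
    ybar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)
    estimates = {
        r"$\tilde Y_{10}$": (10 * theta_prior + n * ybar) / (10 + n),
        r"$\tilde Y_{n}$": (n * theta_prior + n * ybar) / (n + n),
        r"$\bar Y$": ybar,
    }
    for col, (name, est) in enumerate(estimates.items()):
        ax = axes[row, col]
        ax.hist(est, bins=30, density=True)
        ax.axvline(theta, color="green")  # estimation target
        m, s = est.mean(), est.std()
        ax.axvline(m, color="blue")       # sampling distribution's mean
        ax.axvline(m - 2 * s, color="blue", linestyle="dotted")
        ax.axvline(m + 2 * s, color="blue", linestyle="dotted")
        ax.set_title(f"{name}, n={n}")
plt.tight_layout()
plt.show()
```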
Sometimes we call these pseudo-observations. This particular interpretation, in which we think of them as coming from a prior study, can be used to derive this estimator from Bayesian principles.