6 Calibrating Interval Estimates with Binary Observations
Sampling and Calibration Review
A Polling Nightmare
Before the Election
Your Sample
\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625} \\ Y_i & 1 & 1 & \dots & 0 & 0.72 \\ \end{array} \]
The Population
\[ \color{lightgray} \begin{array}{r|rrrrrr|r} j & 1 & 2 & 3 & 4 & \dots & 7.23M & \bar{y}_{7.23M} \\ y_{j} & 1 & 1 & 1 & 0 & \dots & 1 & 0.70 \\ \end{array} \]
- You want to estimate the proportion of all registered voters — the population — who will vote.
- To do this, you use the proportion of polled voters — your sample — who said they would.
- You’re probably not going to match the population proportion exactly …
- … so you report an interval estimate—a range of values we claim the population proportion is in.
- You know you’re not going to be right 100% of the time you make claims like this …
- … so you state a nominal coverage probability—how often you’re right about claims like this.1
- Without thinking too hard about it, you …
- … report, as your interval, your sample proportion (72%) ± 1%. Because it sounds good.
- … and say your coverage probability is 95%. Because that’s what everybody else says.
After the Election
Your Sample
\[ \begin{array}{r|rrrr|r} i & 1 & 2 & \dots & 625 & \bar{Y}_{625} \\ Y_i & 1 & 1 & \dots & 0 & 0.72 \\ \end{array} \]
The Population
\[ \begin{array}{r|rrrrrr|r} j & 1 & 2 & 3 & 4 & \dots & 7.23M & \bar{y}_{7.23M} \\ y_{j} & 1 & 1 & 1 & 0 & \dots & 1 & 0.70 \\ \end{array} \]
- When the election occurs, we get to see who turns out to vote.
- 5.05M people, or roughly 70% of registered voters, actually vote.
- You — and future employers — can see how well you did. And how well everybody else did.
- Your interval missed the target. It doesn’t contain the turnout proportion.
- Your point estimate is only off by 2%. But you overstated your precision.
- Now you’re kicking yourself. You’d briefly considered saying ± 3% or ± 4%.
- That would’ve done it. But that didn’t sound as good, so you went for ± 1%.
- You hope you’re not the only one who missed. So you check out the competition.
The Competition
\[ \begin{array}{r|rr|rr|r|rr|r} \text{call} & 1 & & 2 & & \dots & 625 & & \\ \text{poll} & J_1 & Y_1 & J_2 & Y_2 & \dots & J_{625} & Y_{625} & \overline{Y}_{625} \\ \hline \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}869369 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}4428455 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}\dots & \color[RGB]{7,59,76}1268868 & \color[RGB]{7,59,76}1 & \color[RGB]{7,59,76}0.68 \\ \color[RGB]{239,71,111}2 & \color[RGB]{239,71,111}600481 & \color[RGB]{239,71,111}0 & \color[RGB]{239,71,111}6793745 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}\dots & \color[RGB]{239,71,111}1377933 & \color[RGB]{239,71,111}1 & \color[RGB]{239,71,111}0.71 \\ \color[RGB]{17,138,178}3 & \color[RGB]{17,138,178}3830847 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}5887416 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}\dots & \color[RGB]{17,138,178}4706637 & \color[RGB]{17,138,178}1 & \color[RGB]{17,138,178}0.70 \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \color[RGB]{6,214,160}1M & \color[RGB]{6,214,160}1487507 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}393580 & \color[RGB]{6,214,160}1 & \color[RGB]{6,214,160}\dots & \color[RGB]{6,214,160}1247545 & \color[RGB]{6,214,160}0 & \color[RGB]{6,214,160}0.72 \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ \end{array} \]
- It turns out that everyone got their point estimate exactly like you did.
- Each of them rolled their 7.23M-sided die 625 times.
- And after making their 625 calls, they reported their sample proportion.
- You’re not the only one who missed, but you’re part of a pretty small club.
- 5% of the other polls got it wrong.
- But you are the only one who claimed they’d get within 1%.
- Everyone else claimed they’d get within roughly 4%.
- Most were right. About 95%. And a lot of them had worse point estimates than you.
- What you have is a calibration problem.
Calibration using the Sampling Distribution
- You’d have seen that your interval was miscalibrated if you’d known then what you know now.
- Your competitors’ point estimates, which they got exactly like you got yours.
- Who actually turned out, so you could simulate even more polls if you wanted to.
- You’d have drawn ± 1% intervals around all these point estimates.2
- And noticed that many of these intervals cover the target. About 43% do.
- Or, if leaving the target out of it, that only 43% of them cover the mean of all these estimates.
- Same thing. The mean and target are right on top of one another. Like this: |.
- And you’d have known how to choose the right interval width. One that makes 95% of these intervals cover. Right?
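Here’s a minimal R sketch of that hindsight check. The population vector `y` below is a made-up stand-in for the 7.23M turnout indicators (I just generate one with turnout near 70%); with the real vector, the two numbers it prints should land near the 43% coverage and the roughly 4% width described above.

```r
# A hindsight calibration check. The vector y is a made-up stand-in for the
# 7.23M turnout indicators you'd use after the election.
set.seed(1)
m <- 7.23e6
y <- rbinom(m, 1, 0.70)                 # stand-in population; turnout near 70%
n <- 625
target <- mean(y)                       # the turnout proportion

# Simulate many polls run the way everyone ran theirs: n calls with replacement.
polls <- replicate(10000, mean(sample(y, n, replace = TRUE)))

# Coverage of the +/- 1% intervals you used: about 43%.
mean(abs(polls - target) <= 0.01)

# The half-width that makes 95% of these intervals cover: roughly the 4% everyone else used.
quantile(abs(polls - target), 0.95)
```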
Quiz
- I’ve drawn the sampling distribution of an estimator, 100 draws from it as ●s, and 3 interval estimates.
- One of these interval estimates is calibrated to have exactly 95% coverage. Which is it?
- A. The one around the ● on top.
- B. The one around the ● in the middle.
- C. The one around the ● on the bottom.
- Hint. You can reach your twin with your arms if your twin can reach you with theirs.
- Hint. Maybe you look like this ● and your twin like this |.
Calibration in Reality
- But you didn’t know any of this. You just knew your own point estimate.
- Without all that post-election information, you didn’t know how to calibrate an interval estimate.
- But everyone else did. Their widths were almost the same as the width you’d choose now.
- They didn’t know the point estimator’s actual sampling distribution…
- … but they knew enough about it to choose the right interval width.
- That’s what we’re going to look into now. Over the course of this lecture and the next, we’ll work out what the sampling distribution of our estimator can look like and how what it does look like depends on the population.
- And we’ll see how to use that information to calibrate our intervals.
Probability Review
Sampling Distributions of Sums of Binary Random Variables
This’ll be a slog. I’ll explain why we’ve bothered when we get to the end.
Starting Simple
\(j\) | name |
---|---|
\(1\) | Rush |
\(2\) | Mitt |
\(3\) | Al |
- Let’s start with the distribution of \(Y_1\), the response to one call, when we sample with replacement.
- And to keep things concrete, let’s suppose we’re polling a population of size \(m=3\). We’re rolling a 3-sided die.3
- You can see the population’s responses above in two formats: a table and a plot.
- We have two Nos (0s) and one Yes (1).
- You could say the ‘Yes’ frequency in the population is \(\theta_1 = 1/3\).
- We’ll walk through the process of finding the distribution of \(Y_1\).
Finding the Distribution of \(Y_1\) in a Small Population
\(j\) | name | \(y_j\) |
---|---|---|
\(1\) | Rush | \(0\) |
\(2\) | Mitt | \(0\) |
\(3\) | Al | \(1\) |
\(p\) | \(J_1\) | \(Y_1\) |
---|---|---|
\(\frac13\) | \(1\) | \(0\) |
\(\frac13\) | \(2\) | \(0\) |
\(\frac13\) | \(3\) | \(1\) |
\(p\) | \(J_1\) | \(Y_1\) |
---|---|---|
\(\frac13\) | \(\color[RGB]{64,64,64}1\) | \(0\) |
\(\frac13\) | \(\color[RGB]{64,64,64}2\) | \(0\) |
\(\frac13\) | \(\color[RGB]{64,64,64}3\) | \(1\) |
\(p\) | \(\color[RGB]{64,64,64}J_1\) | \(Y_1\) |
---|---|---|
\(\color[RGB]{239,71,111}\frac13\) | \(\color[RGB]{64,64,64}1\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{239,71,111}\frac13\) | \(\color[RGB]{64,64,64}2\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\frac13\) | \(\color[RGB]{64,64,64}3\) | \(\color[RGB]{17,138,178}1\) |
\(p\) | \(Y_1\) |
---|---|
\(\frac23\) | \(\color[RGB]{239,71,111}0\) |
\(\frac13\) | \(\color[RGB]{17,138,178}1\) |
\(p\) | \(Y_1\) |
---|---|
\(\color[RGB]{239,71,111}\frac23\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\frac13\) | \(\color[RGB]{17,138,178}1\) |
- We write the probability distribution of our die roll in a table.
- It has two columns: one for the roll \(J_1\) and another for its probability \(p\).
- It has \(m=3\) rows, one for each possible outcome of the roll.
- Each roll is equally likely, so the probability in each row is \(1/3\).
- We add a column for the response to our call, \(Y_1\).
- Adding this column is ‘free’, i.e. it doesn’t change the row’s probability, because what we hear is determined by the roll.
- E.g. if we roll a 1, we’re going to hear Rush say ‘No’.
- So the probability we roll a 1 and hear a ‘No’ is the same as the probability we roll a 1.
- We marginalize over rolls to find the distribution of our call’s outcome.
- We can color each row according to the call’s outcome \(Y_1\), then sum the probabilities in each color.
- The probability of hearing ‘No’ is the probability that we roll a 1 or 2: \(1/3+1/3 = 2/3\).
- The probability of hearing ‘Yes’ is the probability that we roll a 3: \(1/3\).
- These probabilities match the frequencies of ‘Yes’ and ‘No’ in our population.
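If you want to see the same table manipulation in code, here’s a small R sketch: build the joint distribution of \(J_1\) and \(Y_1\) for this 3-person population, then marginalize by summing the probabilities within each value of \(Y_1\).

```r
# The same table manipulation in R: the joint distribution of (J1, Y1) for the
# 3-person population above, marginalized over the roll J1.
y <- c(Rush = 0, Mitt = 0, Al = 1)
m <- length(y)

joint <- data.frame(p = rep(1/m, m), J1 = 1:m, Y1 = unname(y))

aggregate(p ~ Y1, data = joint, FUN = sum)   # P(Y1 = 0) = 2/3, P(Y1 = 1) = 1/3
```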
Generalizing to Larger Populations
- Is this probabilities-match-frequencies phenomenon a fluke? Or is it a general rule?
- To find out, we’ll work through the same steps in abstract, i.e. for a binary population \(y_1 \ldots y_m\) of arbitrary size \(m\).
- To ground us while we do it, we’ll think about our population of \(m=7.23M\) registered voters in Georgia.
- But we’ll handle the general case. Plugging in 7.23M ones and zeros wouldn’t exactly make things easier.
Finding the Distribution of \(Y_1\) in a Large Population
The Population
\(j\) | \(y_j\) |
---|---|
\(1\) | 1 |
\(2\) | 1 |
\(3\) | 1 |
\(4\) | 0 |
⋮ | ⋮ |
\(m\) | 1 |
The Joint
\(p\) | \(J_1\) | \(Y_1\) |
---|---|---|
\(1/m\) | \(1\) | \(1\) |
\(1/m\) | \(2\) | \(1\) |
\(1/m\) | \(3\) | \(1\) |
\(1/m\) | \(4\) | \(0\) |
⋮ | ⋮ | ⋮ |
\(1/m\) | \(m\) | \(1\) |
The Marginal
\(p\) | \(Y_1\) |
---|---|
\(\underset{\color{gray}\approx 0.30}{\sum\limits_{j:y_j=0} \frac{1}{m}}\) | \(0\) |
\(\underset{\color{gray}\approx 0.70}{\sum\limits_{j:y_j=1} \frac{1}{m}}\) | \(1\) |
Generalizing to Larger Populations
- We start by writing the probability distribution of our die roll.
- The person we call, \(J_1\), is equally likely to be anyone in the population of \(m \approx 7.23M\) people.
- It takes on each value \(1 \ldots m\) with probability \(1/m\).
- We add a column for the response to our call, \(Y_1\).
- This is ‘for free’. The roll determines what we hear, so the probabilities don’t change. Still \(1/m\).
- And we marginalize to find the distribution of \(Y_1\).
- We sum the probabilities in rows where \(Y_1=0\) and \(Y_1=1\).
- What we saw in our small population does generalize.
- The probability of hearing a response \(y \in \{0,1\}\) is the frequency of that response in the population.
- We’ll call these frequencies \(\theta_0\) and \(\theta_1\). This is a little redundant because \(\theta_0 = 1 - \theta_1\).
- Our estimation target is just the ‘Yes’ frequency \(\theta_1\). Some people call it the success rate.
- To find the distribution of the sum \(Y_1+Y_2\), we can follow the same steps.
- There’s a bit more to it, so we’ll break down the marginalization step to make it a little more tractable.
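Before we move on to sums, here’s a quick simulation sanity-check of the single-call result. The population below is made up (100,000 people with a 70% ‘Yes’ frequency), not the 7.23M voters; the point is just that the fraction of simulated calls that hear ‘Yes’ matches \(\theta_1\).

```r
# Simulation check: the probability of hearing 'Yes' on one call should equal
# the population 'Yes' frequency. The population here is made up.
set.seed(1)
m <- 1e5
y <- rbinom(m, 1, 0.70)                  # a population with ~70% Yes responses
theta_1 <- mean(y)                       # the 'Yes' frequency in this population

J1 <- sample(1:m, 1e6, replace = TRUE)   # a million simulated single calls
Y1 <- y[J1]

c(simulated = mean(Y1), theta_1 = theta_1)   # the two should match closely
```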
Exercise: Two Calls to A Small Population
\(j\) | name | \(y_j\) |
---|---|---|
\(1\) | Rush | \(0\) |
\(2\) | Mitt | \(0\) |
\(3\) | Al | \(1\) |
- Make a table for the joint distribution of two rolls of a 3-sided die. Add columns for the responses.
- Partially marginalize to get the joint distribution of the responses. Add a column for their sum.
- Marginalize to get the distribution of the sum.
Aren’t you happy that, even though you can’t make a 3-sided die, I didn’t ask you to use a 4-sided one?
\(p\) | \(J_1\) | \(J_2\) | \(Y_1\) | \(Y_2\) |
---|---|---|---|---|
\(\color[RGB]{239,71,111}1/9\) | \(1\) | \(1\) | \(0\) | \(0\) |
\(\color[RGB]{239,71,111}1/9\) | \(1\) | \(2\) | \(0\) | \(0\) |
\(\color[RGB]{17,138,178}1/9\) | \(1\) | \(3\) | \(0\) | \(1\) |
\(\color[RGB]{239,71,111}1/9\) | \(2\) | \(1\) | \(0\) | \(0\) |
\(\color[RGB]{239,71,111}1/9\) | \(2\) | \(2\) | \(0\) | \(0\) |
\(\color[RGB]{17,138,178}1/9\) | \(2\) | \(3\) | \(0\) | \(1\) |
\(\color[RGB]{17,138,178}1/9\) | \(3\) | \(1\) | \(1\) | \(0\) |
\(\color[RGB]{17,138,178}1/9\) | \(3\) | \(2\) | \(1\) | \(0\) |
\(\color[RGB]{6,214,160}1/9\) | \(3\) | \(3\) | \(1\) | \(1\) |
\(p\) | \(Y_1\) | \(Y_2\) | \(Y_1 + Y_2\) |
---|---|---|---|
\(\color[RGB]{239,71,111}\frac{4}{9}\) | \(0\) | \(0\) | \(0\) |
\(\color[RGB]{17,138,178}\frac{2}{9}\) | \(0\) | \(1\) | \(1\) |
\(\color[RGB]{17,138,178}\frac{2}{9}\) | \(1\) | \(0\) | \(1\) |
\(\color[RGB]{6,214,160}\frac{1}{9}\) | \(1\) | \(1\) | \(2\) |
\(p\) | \(Y_1 + Y_2\) |
---|---|
\(\color[RGB]{239,71,111}\frac{4}{9}\) | \(0\) |
\(\color[RGB]{17,138,178}\frac{4}{9}\) | \(1\) |
\(\color[RGB]{6,214,160}\frac{1}{9}\) | \(2\) |
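If you’d like to check the solution by brute force, here’s one way to do it in R: enumerate all 9 equally likely pairs of rolls, attach the responses, and aggregate. Nothing beyond the 3-person population above is assumed.

```r
# Brute-force check of the exercise: enumerate the joint distribution of two rolls
# of the 3-sided die, then marginalize down to the distribution of Y1 + Y2.
y <- c(0, 0, 1)                                   # Rush, Mitt, Al
rolls <- expand.grid(J1 = 1:3, J2 = 1:3)          # all 9 equally likely pairs of rolls
joint <- data.frame(p  = rep(1/9, 9),
                    Y1 = y[rolls$J1],
                    Y2 = y[rolls$J2])
joint$total <- joint$Y1 + joint$Y2

aggregate(p ~ Y1 + Y2, data = joint, FUN = sum)   # the partially marginalized table
aggregate(p ~ total,   data = joint, FUN = sum)   # the distribution of Y1 + Y2: 4/9, 4/9, 1/9
```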
From Sampling to Coin Flip Sums: Two Responses
The Joint
\(p\) | \(J_1\) | \(J_2\) | \(Y_1\) | \(Y_2\) | \(Y_1 + Y_2\) |
---|---|---|---|---|---|
\(\frac{1}{m^2}\) | \(1\) | \(1\) | \(y_1\) | \(y_1\) | \(y_1+y_1\) |
\(\frac{1}{m^2}\) | \(1\) | \(2\) | \(y_1\) | \(y_2\) | \(y_1+y_2\) |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\(\frac{1}{m^2}\) | \(1\) | \(m\) | \(y_1\) | \(y_m\) | \(y_1+y_m\) |
\(\frac{1}{m^2}\) | \(2\) | \(1\) | \(y_2\) | \(y_1\) | \(y_2+y_1\) |
\(\frac{1}{m^2}\) | \(2\) | \(2\) | \(y_2\) | \(y_2\) | \(y_2+y_2\) |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\(\frac{1}{m^2}\) | \(m\) | \(m\) | \(y_m\) | \(y_m\) | \(y_m+y_m\) |
Partially Marginalized
\(p\) | \(Y_1\) | \(Y_2\) | \(Y_1 + Y_2\) |
---|---|---|---|
\(\textcolor[RGB]{239,71,111}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1},y_{j_2} = 0,0}} \frac{1}{m^2}}\) | \(0\) | \(0\) | \(\textcolor[RGB]{239,71,111}{0}\) |
\(\textcolor[RGB]{17,138,178}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1},y_{j_2} = 0,1}} \frac{1}{m^2}}\) | \(0\) | \(1\) | \(\textcolor[RGB]{17,138,178}{1}\) |
\(\textcolor[RGB]{17,138,178}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1},y_{j_2} = 1,0}} \frac{1}{m^2}}\) | \(1\) | \(0\) | \(\textcolor[RGB]{17,138,178}{1}\) |
\(\textcolor[RGB]{6,214,160}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1},y_{j_2} = 1,1}} \frac{1}{m^2}}\) | \(1\) | \(1\) | \(\textcolor[RGB]{6,214,160}{2}\) |
Fully Marginalized
\(p\) | \(Y_1 + Y_2\) |
---|---|
\(\textcolor[RGB]{239,71,111}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1}+y_{j_2} = 0}} \frac{1}{m^2}}\) | \(\textcolor[RGB]{239,71,111}{0}\) |
\(\textcolor[RGB]{17,138,178}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1}+y_{j_2} = 1}} \frac{1}{m^2}}\) | \(\textcolor[RGB]{17,138,178}{1}\) |
\(\textcolor[RGB]{6,214,160}{\sum\limits_{\substack{j_1,j_2 \\ y_{j_1}+y_{j_2} = 2}} \frac{1}{m^2}}\) | \(\textcolor[RGB]{6,214,160}{2}\) |
To find the distribution of a sum of two responses, \(Y_1+Y_2\), we do the same thing.
We start with the joint distribution of two dice rolls.
- Our rolls are equally likely to be any pair of numbers in \(1\ldots m\).
- And there are \(m^2\) pairs, so the probability is \(1/m^2\) for each pair.
Then we add columns that are functions of the rolls: \(Y_1\), \(Y_2\), and \(Y_1+Y_2\).
- This is ‘for free’ as before.
- If we know who we’re calling, we know the responses we’ll hear and their sum.
Then we marginalize to find the distribution of the sum. We’ll do this in two steps.
- We partially marginalize over \(J_1,J_2\), to find the joint distribution of the response sequence \(Y_1, Y_2\).
- We fully marginalize the result to find the distribution of \(Y_1+Y_2\).
- To sum over pairs of calls with a given pair of responses, we sum over one call, then the other.
- This is exactly like what we did for one response. Except twice.
- And what we get is pretty similar too.
- The probability of observing the pair of responses is determined by their frequencies in the population.
- It’s the product of the frequencies of those responses.
- For ‘Yes,Yes’ it’s \(\theta_1^2\). For ‘No,No’ it’s \(\theta_0^2\).
- And for ‘Yes,No’ and ‘No,Yes’ it’s \(\theta_1\theta_0\).
- This is, for what it’s worth, the product of the marginal probabilities of those responses.
- When this happens for any pair of random variables, we say they are independent. More on that soon.
- And the probability of any response pair \(a,b\) depends only on its sum \(a+b\).
\[ \begin{aligned} \sum\limits_{\substack{j_1,j_2\\ y_{j_1},y_{j_2} = a,b}} \frac{1}{m^2} &= \sum\limits_{\substack{j_1 \\ y_{j_1}=a}} \qty{ \sum\limits_{\substack{j_2 \\ y_{j_2}=b}} \frac{1}{m^2} } \\ &= \sum\limits_{\substack{j_1 \\ y_{j_1}=a}} \qty{ m_b \times \frac{1}{m^2} } \qfor m_y = \sum\limits_{j:y_j=y} 1 \\ &= m_a \times m_b \times \frac{1}{m^2} = \theta_a \times \theta_b \qfor \theta_y = \frac{m_y}{m} \\ &= \theta_1^{s} \theta_0^{2-s} \qfor s = a+b \end{aligned} \]
- To calculate the marginal probability that \(Y_1+Y_2=s\), we sum the probabilities of all pairs with that sum.
- This is pretty easy because there aren’t too many pairs.
- And I’ve color coded the pairs to make it even easier.
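Here’s a quick numerical check of that product formula in R, using a made-up population of 10 people with three Yeses: the enumerated probability of each response pair should equal \(\theta_a \times \theta_b\).

```r
# Check that P(Y1 = a, Y2 = b) equals theta_a * theta_b by enumeration,
# for a made-up population of 10 people with three Yeses.
y <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
m <- length(y)
theta <- c(`0` = mean(y == 0), `1` = mean(y == 1))

pairs <- expand.grid(j1 = 1:m, j2 = 1:m)         # all m^2 equally likely pairs of calls
joint <- data.frame(p = 1 / m^2, a = y[pairs$j1], b = y[pairs$j2])

enumerated <- aggregate(p ~ a + b, data = joint, FUN = sum)
enumerated$product <- theta[as.character(enumerated$a)] * theta[as.character(enumerated$b)]
enumerated    # the p column and the product column match
```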
From Sampling to Coin Flip Sums: \(n\) Responses.
The Joint

\(p\) | \(J_1\) | … | \(J_n\) | \(Y_1\) | … | \(Y_n\) |
---|---|---|---|---|---|---|
\(\frac{1}{m^n}\) | \(1\) | … | \(1\) | \(y_1\) | … | \(y_1\) |
\(\frac{1}{m^n}\) | \(1\) | … | \(2\) | \(y_1\) | … | \(y_2\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
\(\frac{1}{m^n}\) | \(1\) | … | \(m\) | \(y_1\) | … | \(y_m\) |
\(\frac{1}{m^n}\) | \(2\) | … | \(1\) | \(y_2\) | … | \(y_1\) |
\(\frac{1}{m^n}\) | \(2\) | … | \(2\) | \(y_2\) | … | \(y_2\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
\(\frac{1}{m^n}\) | \(m\) | … | \(m\) | \(y_m\) | … | \(y_m\) |
Fully Marginalized

\(p\) | \(Y_1 + \ldots + Y_n\) |
---|---|
\(\color[RGB]{239,71,111}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = 0}} \sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n}\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = 1}} \sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n}\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ |
\(\color[RGB]{6,214,160}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = n-1}} \sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n}\) | \(\color[RGB]{6,214,160}n-1\) |
\(\color[RGB]{255,209,102}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = n}} \sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n}\) | \(\color[RGB]{255,209,102}n\) |
Partially Marginalized
\(p\) | \(Y_1\) | \(Y_2\) | … | \(Y_n\) | \(Y_1 + \ldots + Y_n\) |
---|---|---|---|---|---|
\(\color[RGB]{239,71,111}\sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 0 \ldots 0}} \frac{1}{m^n}\) | 0 | 0 | … | \(0\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 0 \ldots 1}} \frac{1}{m^n}\) | 0 | 0 | … | \(1\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 1 \ldots 0}} \frac{1}{m^n}\) | 0 | 1 | … | \(0\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ |
\(\color[RGB]{6,214,160}\sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 1 \ldots 1}} \frac{1}{m^n}\) | 0 | 1 | … | \(1\) | \(\color[RGB]{6,214,160}n-1\) |
\(\color[RGB]{255,209,102}\sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 1, 1 \ldots 1}} \frac{1}{m^n}\) | 1 | 1 | … | \(1\) | \(\color[RGB]{255,209,102}n\) |
To find the distribution of a sum \(Y_1 + \ldots + Y_n\), we do the same thing.
We start by writing out the joint distribution of \(n\) dice rolls.
- We’re equally likely to roll any sequence of \(n\) numbers in \(1\ldots m\).
- And there are \(m^n\) sequences, so the probability is \(1/m^n\) for each sequence.
- Then we add columns for the corresponding responses ‘for free’. And the sums.
Then we marginalize in two steps.
- Sum over roll sequences leading to the same response sequence \(a_1 \ldots a_n\).
- Sum over response sequences leading to the same sum \(s=a_1+\ldots+a_n\).
This isn’t a class about counting, so this stuff won’t be on the exam.
- To sum over roll sequences, we sum over one roll after another.
- This is exactly like what we did for one response. Except over and over.
- You can think of this sum ‘inside out’.
- Suppose we’re calculating the probability of the response sequence \(a_1 \ldots a_n = 0,0,...,0,0\).
- We consider any ‘head’ \(j_1\ldots j_{n-1}\) where \(y_{j_1} \ldots y_{j_{n-1}}=0,0,...0\) and think about choosing the last element. No matter what the head is, there are \(m_0\) ways to complete it by choosing \(j_{n}\) with \(y_{j_n}=0\).
- We repeat with shortened ‘head’ \(j_1 \ldots j_{n-2}\) to consider the choice of \(j_{n-1}\). We have \(m_0\) choices and our last step tells us that each choice results in \(m_0\) call sequences, so there are \(m_0^2\) sequences with head \(j_1 \ldots j_{n-2}\) ending in \(0,0\).
- Repeat \(n-2\) more times.
- What we get is pretty similar to the two-response case.
- The probability of observing the sequence is determined by the frequency of Yeses and Nos in the population.
- It’s the product of the frequencies of those responses.
- For ‘Yes,Yes,…,Yes’ it’s \(\theta_1^n\). For ‘No,No,…,No’ it’s \(\theta_0^n\).
- And if we have \(s\) Yeses and therefore \(n-s\) Nos, it’s \(\theta_1^s\theta_0^{n-s}\).
- This is, again, the product of the marginal probabilities of those \(n\) responses.
\[ \begin{aligned} \sum_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n} &= \sum_{\substack{j_1 \ldots j_{n-1} \\ y_{j_1} \ldots y_{j_{n-1}} = a_1 \ldots a_{n-1}}} \ \sum_{\substack{j_n \\ y_{j_n}=a_n}} \frac{1}{m^n} \\ &= \sum_{\substack{j_1 \ldots j_{n-1} \\ y_{j_1} \ldots y_{j_{n-1}} = a_1 \ldots a_{n-1}}} m_{a_n} \times \frac{1}{m^n} \\ &= \sum_{\substack{j_1 \ldots j_{n-2} \\ y_{j_1} \ldots y_{j_{n-2}} = a_1 \ldots a_{n-2}}} \ \sum_{\substack{j_{n-1} \\ y_{j_{n-1}}=a_{n-1}}} m_{a_n} \times \frac{1}{m^n} \\ &= \sum_{\substack{j_1 \ldots j_{n-2} \\ y_{j_1} \ldots y_{j_{n-2}} = a_1 \ldots a_{n-2}}} m_{a_{n-1}} m_{a_n} \times \frac{1}{m^n} \\ &= m_{a_1} \ldots m_{a_{n}} \times \frac{1}{m^n} \qqtext{after repeating $n-2$ more times} \\ &= \theta_{a_1} \ldots \theta_{a_n} \qfor \theta_y = \frac{m_y}{m} \\ &= \prod_{i:a_i=1} \theta_1 \prod_{i:a_i=0} \theta_0 = \theta_1^{s} \theta_0^{n-s} \qfor s = a_1 + \ldots + a_n \end{aligned} \]
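This is easier to believe after grinding through a small case, so here’s a brute-force check in R for a made-up case with \(m=3\) and \(n=4\): every response sequence’s enumerated probability should equal \(\theta_1^{s}\theta_0^{n-s}\).

```r
# Brute-force check for m = 3 and n = 4: every response sequence a_1...a_n should
# have probability theta_1^s * theta_0^(n - s), where s is its number of ones.
y <- c(0, 0, 1)                          # the small population again
m <- length(y); n <- 4
theta_1 <- mean(y); theta_0 <- 1 - theta_1

calls <- expand.grid(rep(list(1:m), n))  # all m^n equally likely call sequences
responses <- apply(calls, 1, function(j) paste(y[j], collapse = ""))

enumerated <- tapply(rep(1 / m^n, m^n), responses, sum)  # P(each response sequence)
s <- sapply(strsplit(names(enumerated), ""), function(a) sum(a == "1"))
by_formula <- theta_1^s * theta_0^(n - s)

all.equal(unname(enumerated), unname(by_formula))        # TRUE
```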
- The full marginalization step is a bit trickier. But we can take advantage of our partial marginalization.
- To calculate the probability that the sum takes on the value \(s\), we order our sum of probabilities thoughtfully.
- We sum over the response sequences \(a_1 \ldots a_n\) with \(s\) Yeses and \(n-s\) Nos.
- And within that, sum over calls where we’d hear those responses.
- The inner sum we’ve done. That was our partial marginalization.
- And the outer sum boils down to counting the number of sequences with \(s\) Yeses.
- The probability of all these sequences is the same: \(\theta_1^s\theta_0^{n-s}\). So summing is counting and multiplying.
- You may have done the counting part in high school in a unit on ‘permutations and combinations’.
- In any case, that count has a name. It’s spoken ‘\(n\) choose \(s\)’ and written \(\binom{n}{s}\).
\[ \begin{aligned} \sum\limits_{\substack{j_1 \ldots j_n \\ y_{j_1} + \ldots + y_{j_n} = s}} \frac{1}{m^n} &= \sum\limits_{\substack{a_1 \dots a_n \\ a_1 + \ldots + a_n = s}} \ \ \sum_{\substack{j_1 \ldots j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{1}{m^n} \\ & =\sum\limits_{\substack{a_1 \dots a_n \\ a_1 + \ldots + a_n = s}} \theta_1^{s}\theta_0^{n-s} \\ &= \binom{n}{s} \theta_1^{s} \theta_0^{n-s} \qqtext{ where } \binom{n}{s} = \sum\limits_{\substack{a_1 \dots a_n \\ a_1 + \ldots + a_n = s}} 1 \end{aligned} \]
\[ P\qty(\sum_{i=1}^n Y_i = s) = \binom{n}{s} \theta_1^{s}\theta_0^{n-s} \ \ \text{ where } \ \ \binom{n}{s} \text{ is the number of binary sequences $a_1 \ldots a_n$ summing to $s$.} \]
- We don’t actually have to do all this sequence-summing-to-\(s\) counting ourselves.
- The `choose` function in R will do it for us: \(\binom{n}{s}\) is `choose(n, s)`.
- The `dbinom` function will give us the whole probability: \(\binom{n}{s}\theta_1^s\theta_0^{n-s}\) is `dbinom(s, n, theta_1)`.
- The `rbinom` function will draw samples from this distribution: `rbinom(10000, n, theta_1)` gives us 10,000.
```r
theta_1 = .7
n = 625
p = dbinom(0:n, n, theta_1)
S = rbinom(10000, n, theta_1)

ggplot() + geom_area(aes(x=0:n, y=p), color=pollc, alpha=.2) +
  geom_bar(aes(x=S, y=after_stat(prop)), alpha=.3) +
  geom_point(aes(x=S[1:100], y=max(p)*seq(0,1,length.out=100)),
             color='purple', alpha=.2)
```
The Binomial Distribution
\[ \begin{aligned} &\overset{\color{gray}=P\qty(\frac{1}{n}\sum_{i=1}^n Y_i = \frac{s}{n})}{P\qty(\sum_{i=1}^n Y_i = s)} = \binom{n}{s} \theta_1^{s}\theta_0^{n-s} \\ &\qfor n =625 \\ &\qand \theta_1 \in \{\textcolor[RGB]{239,71,111}{0.68}, \textcolor[RGB]{17,138,178}{0.7}, \textcolor[RGB]{6,214,160}{0.72} \} \end{aligned} \]
- These functions have ‘binom’ in their name because we call this distribution the Binomial distribution.
- The Binomial distribution on \(n\) trials with success probability \(\theta\) is our name for …
- the distribution of the sum of \(n\) independent binary random variables with probability \(\theta\) of being \(1\).
- E.g. the number of heads in \(n\) coin flips is Binomial with \(n\) trials and success probability \(1/2\).
- We’ve shown that’s the sampling distribution of the sum of responses, \(Y_1 + \ldots + Y_n\), when we sample …
- … with replacement from a population of binary responses \(y_1 \ldots y_m\) in which \(\theta\) is the frequency of ones.
- We’re interested in the mean of responses, so we divide by \(n\): \(\color{gray} \sum_{i=1}^n Y_i = s \ \text{ when } \ \frac{1}{n}\sum_{i=1}^n Y_i = s/n\).
- It’s easy to estimate — and talk about estimating — this sampling distribution.
- It depends only on one thing we don’t know — the population frequency \(\theta\).
- And that’s exactly the thing we’re trying to estimate anyway.
- That’s why we’ve started here—the case of sampling with replacement from a population of binary responses.
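Here’s a short R sketch in case you want to reproduce a plot like the one above: it places the binomial probabilities at the points \(s/n\) for the three values of \(\theta_1\) shown. The plotting details are my own choices, not the ones used elsewhere in these notes.

```r
# The sampling distribution of the sample proportion for n = 625 calls,
# i.e. binomial probabilities placed at s/n, for three candidate populations.
library(ggplot2)

n <- 625
thetas <- c(0.68, 0.70, 0.72)
grid <- expand.grid(s = 0:n, theta_1 = thetas)
grid$p <- dbinom(grid$s, n, grid$theta_1)

ggplot(grid) +
  geom_line(aes(x = s / n, y = p, color = factor(theta_1))) +
  labs(x = "sample proportion", y = "probability", color = "theta_1")
```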
Payoff
To estimate this sampling distribution, you plug your point estimate \(\hat\theta\) into the Binomial formula. \[ \hat P\qty(\sum_{i=1}^n Y_i = s) = \binom{n}{s} \hat\theta^{s} (1-\hat\theta)^{n-s} \qqtext{ estimates } P\qty(\sum_{i=1}^n Y_i = s) = \binom{n}{s} \theta^{s} (1-\theta)^{n-s} \]
To calibrate your interval estimate, you …

- use `rbinom` to draw 10,000 samples from this estimate of the sampling distribution.
  - Even remembering to divide by \(n\).
- And use the function `width` from the Week 1 Homework to find an interval that covers 95% of them.

```r
theta.hat = mean(Y)
samples = rbinom(10000, n, theta.hat) / n
interval.width = width(samples, alpha=.05)
```
- You nail it. Your interval covers the estimation target \(\theta\) just like 95% of your competitors’ do.
- It’s not just that you’ve widened your interval enough.
- You’ve widened it almost exactly the right amount.
- Just like your competitors. Almost as if you all knew …
- how to estimate your estimator’s sampling distribution.
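If you want to convince yourself that this calibration really does deliver roughly 95% coverage, here’s a sketch that repeats the whole procedure over many simulated polls. The `width` function from the Week 1 Homework isn’t reproduced in these notes, so the version below (the width of the central 95% of the samples, with the reported interval taken to be \(\hat\theta\) plus or minus half that width) is a stand-in with the same interface; the population frequency is set to 70% to match the election.

```r
# Does the plug-in calibration deliver ~95% coverage? Repeat the whole procedure
# over many simulated polls. `width` below is a stand-in for the Week 1 Homework
# function: it returns the width of the central 1 - alpha share of the samples.
width <- function(samples, alpha = .05) {
  unname(diff(quantile(samples, c(alpha / 2, 1 - alpha / 2))))
}

set.seed(1)
n <- 625
theta <- 0.70                            # the population 'Yes' frequency

one_poll_covers <- function() {
  Y <- rbinom(n, 1, theta)               # n calls, sampling with replacement
  theta.hat <- mean(Y)
  samples <- rbinom(10000, n, theta.hat) / n
  interval.width <- width(samples, alpha = .05)
  abs(theta.hat - theta) <= interval.width / 2   # does theta.hat +/- width/2 cover?
}

mean(replicate(1000, one_poll_covers()))  # roughly 0.95
```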
Sampling without Replacement
Another Example. This won’t be on the Exam.
Marginalization
\(p\) | \(J_1\) | … | \(J_n\) | \(Y_1\) | … | \(Y_n\) |
---|---|---|---|---|---|---|
\(\frac{(m-n)!}{m!}\) | \(1\) | … | \(1\) | \(y_1\) | … | \(y_1\) |
\(\frac{(m-n)!}{m!}\) | \(1\) | … | \(2\) | \(y_1\) | … | \(y_2\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
\(\frac{(m-n)!}{m!}\) | \(1\) | … | \(m\) | \(y_1\) | … | \(y_m\) |
\(\frac{(m-n)!}{m!}\) | \(2\) | … | \(1\) | \(y_2\) | … | \(y_1\) |
\(\frac{(m-n)!}{m!}\) | \(2\) | … | \(2\) | \(y_2\) | … | \(y_2\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
\(\frac{(m-n)!}{m!}\) | \(m\) | … | \(m\) | \(y_m\) | … | \(y_m\) |
\(p\) | \(Y_1 + \ldots + Y_n\) |
---|---|
\(\color[RGB]{239,71,111}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = 0}} \sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{(m-n)!}{m!}\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = 1}} \sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{(m-n)!}{m!}\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ |
\(\color[RGB]{6,214,160}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = n-1}} \sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{(m-n)!}{m!}\) | \(\color[RGB]{6,214,160}n-1\) |
\(\color[RGB]{255,209,102}\sum\limits_{\substack{a_1 \ldots a_n \\ a_1 + \ldots + a_n = n}} \sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1} \ldots y_{j_n} = a_1 \ldots a_n}} \frac{(m-n)!}{m!}\) | \(\color[RGB]{255,209,102}n\) |
Partially Marginalized
\(p\) | \(Y_1\) | \(Y_2\) | … | \(Y_n\) | \(Y_1 + \ldots + Y_n\) |
---|---|---|---|---|---|
\(\color[RGB]{239,71,111}\sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 0 \ldots 0}} \frac{(m-n)!}{m!}\) | 0 | 0 | … | \(0\) | \(\color[RGB]{239,71,111}0\) |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 0 \ldots 1}} \frac{(m-n)!}{m!}\) | 0 | 0 | … | \(1\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ |
\(\color[RGB]{17,138,178}\sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 1 \ldots 0}} \frac{(m-n)!}{m!}\) | 0 | 1 | … | \(0\) | \(\color[RGB]{17,138,178}1\) |
⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ |
\(\color[RGB]{6,214,160}\sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 0, 1 \ldots 1}} \frac{(m-n)!}{m!}\) | 0 | 1 | … | \(1\) | \(\color[RGB]{6,214,160}n-1\) |
\(\color[RGB]{255,209,102}\sum\limits_{\substack{j_1 \neq \ldots \neq j_n \\ y_{j_1}, y_{j_2} \ldots y_{j_n} = 1, 1 \ldots 1}} \frac{(m-n)!}{m!}\) | 1 | 1 | … | \(1\) | \(\color[RGB]{255,209,102}n\) |
To find the distribution of a sum \(Y_1 + \ldots + Y_n\) when we sample without replacement we do the same thing.
We start by writing out the joint distribution of \(n\) dice rolls.
- We’re equally likely to pull any sequence of \(n\) distinct numbers in \(1\ldots m\).
- There are \(m \times (m-1) \times \ldots \times (m-n+1)=m!/(m-n)!\) sequences…
- … so the probability is \((m-n)!/m!\) for each sequence.
- Then we add the responses and sums as columns. That doesn’t change.
Then we marginalize in two steps.
- Sum over roll sequences leading to the same response sequence \(a_1 \ldots a_n\).
- Sum over response sequences leading to the same sum \(s=a_1+\ldots+a_n\).
- To sum over roll sequences, we sum over one roll after another.
- Suppose we’re calculating the probability of the response sequence \(a_1 \ldots a_n = 0,0,...,0,0\).
- We consider any ‘head’ \(j_1\ldots j_{n-1}\) where \(y_{j_1} \ldots y_{j_{n-1}}=0,0,...0\) and think about choosing the last element. No matter what the head is, we’ve ‘used up’ \(n-1\) people with response \(y_j=0\). So there are \(m_0-(n-1)=m_0-n+1\) ways to complete the sequence by choosing an as-yet unused \(j_{n}\) with \(y_{j_n}=0\).
- We repeat with shortened ‘head’ \(j_1 \ldots j_{n-2}\) to consider the choice of \(j_{n-1}\). We have \(m_0-n+2\) as-yet unused choices and our last step tells us that each choice results in \(m_0-(n-1)\) call sequences, so there are \((m_0 - n + 1)(m_0 - n + 2)\) sequences with head \(j_1 \ldots j_{n-2}\) ending in \(0,0\).
- Repeat \(n-2\) more times. We get \((m_0 - n + 1) \times \ldots \times m_0 = m_0!/(m_0-n)!\) rows.
- When we work through the same argument for sequences with a mix of 0s and 1s, we see that a call in the head forecloses possibilities only for later calls with the same response. Thus, when we have \(s_1\) ones and \(s_0\) zeros, we can think of ourselves as choosing the calls resulting in response zero and those resulting in response one separately. We have \(m_0!/(m_0-s_0)! \times m_1! / (m_1-s_1)!\) rows.
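That counting claim is easy to check by brute force for a tiny population. The sketch below, using a made-up population with \(m_1=2\) ones, \(m_0=3\) zeros, and \(n=3\) calls, enumerates all ordered samples without replacement and compares the number of call sequences behind each response sequence to \(m_1!/(m_1-s_1)! \times m_0!/(m_0-s_0)!\).

```r
# Brute-force check of the counting argument. Made-up population: m_1 = 2 ones,
# m_0 = 3 zeros, and n = 3 calls without replacement.
y <- c(1, 1, 0, 0, 0)
m <- length(y); n <- 3
m_1 <- sum(y == 1); m_0 <- sum(y == 0)

# All ordered samples of n distinct indices.
calls <- subset(expand.grid(j1 = 1:m, j2 = 1:m, j3 = 1:m),
                j1 != j2 & j1 != j3 & j2 != j3)

responses <- apply(calls, 1, function(j) paste(y[j], collapse = ""))
observed <- table(responses)             # rows of the joint per response sequence

s_1 <- sapply(strsplit(names(observed), ""), function(a) sum(a == "1"))
s_0 <- n - s_1
predicted <- factorial(m_1) / factorial(m_1 - s_1) *
             factorial(m_0) / factorial(m_0 - s_0)

cbind(observed = as.vector(observed), predicted)   # the two columns agree
```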