# Load the NBA datasets: the prior observations, the sample, and the full population.
prior.obs = read.csv("https://qtm285-1.github.io/assets/data/nba_sample_2.csv")
sam = read.csv("https://qtm285-1.github.io/assets/data/nba_sample_1.csv")
pop = read.csv("https://qtm285-1.github.io/assets/data/nba_population.csv")

# Indicator: did the player's team win more than half the games it played?
indicator = function(W, L, ...) { W / (W + L) > 1/2 }

library(purrr)
Y.prior = prior.obs |> pmap_vec(indicator)  # prior observations
Y = sam |> pmap_vec(indicator)              # sample
y = pop |> pmap_vec(indicator)              # population
n = length(Y)    # sample size
m = length(y)    # population size
theta = mean(y)  # the population mean we're estimating

17 Enrichment: Probability Tools
Things you should know that aren’t on the immediate path
This is optional enrichment material—probability tools that are good to know but not directly used in the main course content.
The Union Bound
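The union bound (Boole's inequality) says that the probability that at least one of several events occurs is at most the sum of their individual probabilities: for any events \(A_1, \ldots, A_n\),

\[ P\qty(\bigcup_{i=1}^n A_i) \le \sum_{i=1}^n P(A_i). \]

It holds whether or not the events overlap; when they do, the right side double-counts the overlap, which only makes the bound looser.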
Markov’s Inequality and Consistency
Convergence in mean square implies convergence in probability. Let’s use Markov’s inequality to see why.
Markov’s Inequality
Markov’s inequality says that for a non-negative random variable \(X\) and any \(t > 0\), \[ P(X \ge t) \le \frac{\mathop{\mathrm{E}}[X]}{t}. \]
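As a sanity check, we can compare both sides of Markov's inequality by simulation. The exponential distribution below is just an illustrative choice of non-negative random variable, not something from the course data.

```r
# Compare P(X >= t) to Markov's bound E[X]/t for a non-negative X.
set.seed(1)
X = rexp(1e5, rate = 1)    # exponential draws, E[X] = 1
t = 3
empirical = mean(X >= t)   # simulation estimate of P(X >= t)
bound     = mean(X) / t    # Markov's bound E[X]/t
c(empirical = empirical, bound = bound)
```

Here the bound is far from tight (the true probability is \(e^{-3} \approx .05\), the bound \(1/3\)), which is typical: Markov's inequality trades sharpness for generality.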
The usual proof of Markov’s inequality is based on a few simple observations.
- The expectation of the indicator variable \(1_{\ge t}(X)\) is the probability that \(X\) is at least \(t\), i.e., \(P(X \ge t)\).
- If we have a function \(u_t\) that's always larger than \(1_{\ge t}\), i.e., one satisfying \(u_t(x) \ge 1_{\ge t}(x)\) for all \(x\), then \(\mathop{\mathrm{E}}[u_t(X)] \ge \mathop{\mathrm{E}}[1_{\ge t}(X)]\) for any random variable \(X\). If it's larger only for non-negative \(x\), then \(\mathop{\mathrm{E}}[u_t(X)] \ge \mathop{\mathrm{E}}[1_{\ge t}(X)]\) for any non-negative random variable \(X\).
- The function \(u_t(x)=x/t\) is such a function.1
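Chaining these observations together gives the inequality: for non-negative \(X\),

\[ P(X \ge t) = \mathop{\mathrm{E}}[1_{\ge t}(X)] \le \mathop{\mathrm{E}}[X/t] = \frac{\mathop{\mathrm{E}}[X]}{t}. \]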
Often, instead of using this to bound the random variable we’re interested in directly, e.g. \(X=|\hat\theta - \theta|\), we use it to bound the random variable’s square. \(|X| \ge \epsilon\) if and only if \(X^2 \ge \epsilon^2\), so the probability that \(|X| \ge \epsilon\) is the same as the probability that \(X^2 \ge \epsilon^2\). Applying Markov’s inequality to the random variable \(X^2\) gives us a bound in terms of \(X\)’s mean square which, in the specific case that \(X\) is \(|\hat\theta-\theta|\), is the mean squared error of the estimator \(\hat\theta\), \(\text{RMSE}(\hat\theta)^2=\mathop{\mathrm{E}}[(\hat\theta-\theta)^2]\).
\[ P(X \ge \epsilon) = P(X^2 \ge \epsilon^2) \leq \frac{\mathop{\mathrm{E}}[X^2]}{\epsilon^2} \qqtext{ e.g.} P(|\hat\theta - \theta| \ge \epsilon) = P((\hat\theta - \theta)^2 \ge \epsilon^2) \leq \frac{\mathop{\mathrm{E}}[(\hat\theta - \theta)^2]}{\epsilon^2} \]
This tells us that, if the root-mean-squared error of \(\hat\theta\) goes to zero, then the probability that \(\hat\theta\) is any distance \(\epsilon\) away from \(\theta\) goes to zero, i.e., consistency in mean-square implies consistency in probability.
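We can watch this happen with the sample mean of coin flips: its mean squared error is \(p(1-p)/N\), so the bound on \(P(|\bar Y - p| \ge \epsilon)\) shrinks like \(1/N\). This is a standalone simulation, separate from the course code.

```r
# How Markov's bound on P(|Ybar - p| >= eps) shrinks as the sample size N grows.
set.seed(1)
p = 0.5; eps = 0.1
for (N in c(100, 1000, 10000)) {
  deviations = replicate(2000, abs(mean(rbinom(N, 1, p)) - p))
  empirical = mean(deviations >= eps)  # estimated P(|Ybar - p| >= eps)
  bound = (p * (1 - p) / N) / eps^2    # E[(Ybar - p)^2] / eps^2
  cat(sprintf("N = %5d   empirical = %.4f   bound = %.4f\n", N, empirical, bound))
}
```

Both columns go to zero, with the empirical probability sitting below the bound at every sample size.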
Markov’s Inequality and Interval Estimation
So far, when we’ve calibrated interval estimates using our estimator’s standard deviation, we’ve relied on normal approximation. In effect, we’ve been using a formula for \(P(\lvert\hat\theta - \theta\rvert \le \epsilon)\) that’s accurate when \(\hat\theta\) has a normal distribution and close enough when its distribution is close enough to normal. In this problem, we’re going to think about doing without this reliance on approximate normality.
Let’s consider \(\hat\theta\), an unbiased estimator of \(\theta\) with standard deviation \(\sigma\), so the normal approximation to the distribution of \(\hat\theta-\theta\) has the density \(f_{0,\sigma}(x)\) below.
\[ P\qty(|\hat\theta - \theta| \le \epsilon) \approx \int_{-\epsilon}^{\epsilon} f_{0,\sigma}(x) dx \qfor f_{0, \sigma}(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-x^2/2\sigma^2} \]
The reason we’ve been talking about interval estimators of the form \(\hat\theta \pm 1.96 \sigma\) is that, if this approximation were perfect, these interval estimators would have 95% coverage. That is, it’d be true that \(P(|\hat\theta-\theta| \le 1.96 \sigma) = .95\). And if the approximation is pretty good, we should still expect coverage close to that. But suppose we’re not confident that it is. Markov’s inequality allows us to calibrate interval estimates in terms of our estimator’s standard deviation without any caveats about its sampling distribution being approximately normal. Let’s give it a shot.
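One way this can go: applying the squared version of Markov's inequality with \(\epsilon = k\sigma\) gives \(P(|\hat\theta-\theta| \ge k\sigma) \le 1/k^2\), so choosing \(1/k^2 = .05\), i.e. \(k = \sqrt{20} \approx 4.47\), yields an interval with at least 95% coverage regardless of the sampling distribution. A quick comparison of the two half-widths, in units of \(\sigma\), as a sketch:

```r
# Interval half-widths (in units of sigma) calibrated two ways.
k.normal = qnorm(0.975)    # ~1.96: exact coverage if theta-hat is normal
k.markov = sqrt(1 / 0.05)  # ~4.47: valid with no distributional assumptions
c(normal = k.normal, markov = k.markov)
```

The assumption-free interval is more than twice as wide, which is the price we pay for dropping the normality caveat.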
Now let’s try this out on the NBA data we’ve been working with recently.
The sample \(Y_1 \ldots Y_{100}\) we’ll use is drawn with replacement from a population \(y_1 \ldots y_{539}\) of indicators. These indicators—one for each of the 539 players who played in the NBA in 2023—are one if the player’s team won more than half the games they played in and zero otherwise.
We’ll consider two point estimators. The first is the sample mean, \(\hat\theta=\bar{Y}\). And the second is the mean-with-prior-observations estimator \(\tilde{Y}_{100}\) we talked about in the last homework, using 100 prior observations from what we called ‘your sample’ in the Week 1 Homework.
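Using the objects loaded at the top, the two point estimators could be computed as below. The pooled form of \(\tilde{Y}_{100}\) here, treating the 100 prior observations as if they were additional draws, is an assumption about the homework's definition, not something stated in this section.

```r
# Two point estimates of theta (assumed pooled form for the second).
theta.hat = mean(Y)                                    # the sample mean
k = 100                                                # number of prior observations used
theta.tilde = (sum(Y) + sum(Y.prior[1:k])) / (n + k)   # mean with prior observations
c(sample.mean = theta.hat, with.prior = theta.tilde)
```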