23  Why Do Models Work?

Enrichment Content

This is a pre-exam-2 “breather” lecture. It steps back from mechanics to ask: why do the statistical models we write down work when they’re so obviously wrong?

Overview

Throughout this course, we’ve written things like: \[ Y_i = \mu + \epsilon_i \quad\text{or}\quad Y_i = \mu(W_i) + \epsilon_i \] where \(\epsilon_i\) is “noise” with mean zero. Sometimes we assume \(\epsilon_i \sim N(0, \sigma^2)\). Sometimes we let \(\sigma\) depend on \(W\): \(\epsilon_i \sim N(0, \sigma^2(W_i))\).

But real data doesn’t come from normal distributions. The “errors” aren’t really draws from some probability distribution—they’re just the difference between what we observe and what our model predicts. So why does any of this work?
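To make the point concrete, here is a minimal sketch (using numpy and an arbitrary illustrative dataset) of what the “errors” actually are in practice: we fit \(\hat\mu\) and define \(\epsilon_i = Y_i - \hat\mu\). Nothing about that definition makes them normal draws.

```python
import numpy as np

rng = np.random.default_rng(42)
# "data" that certainly isn't normal: binary outcomes
y = rng.binomial(1, 0.3, size=12).astype(float)

mu_hat = y.mean()
eps = y - mu_hat  # the "errors" are just observed minus fitted
# eps takes at most two distinct values here -- nothing normal about it,
# yet it has mean exactly zero by construction
print(mu_hat, eps)
```

The residuals average to zero because that is how \(\hat\mu\) was chosen, not because nature drew them from \(N(0, \sigma^2)\).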

Planned Content

The Two Roles of Models

  • Population description: The model is the population (or describes it exactly)
  • Approximation for calibration: The model is a tool for deriving standard errors, even if it’s “wrong”

Homoskedastic vs Heteroskedastic

  • \(Y = m(X) + \sigma\epsilon\) vs \(Y = m(X) + \sigma(X)\epsilon\)
  • When does the difference matter?
  • Robust standard errors: getting calibration right without getting the model right
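A small simulation can preview this point. The sketch below (my own illustrative setup, not from the lecture) generates heteroskedastic regression data, then computes both the classic standard errors, which assume constant \(\sigma\), and HC0 “sandwich” robust standard errors, which do not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 2, n)
# heteroskedastic noise: the spread of Y grows with x
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# classic standard errors: assume one common sigma^2
s2 = resid @ resid / (n - 2)
se_classic = np.sqrt(np.diag(s2 * XtX_inv))

# HC0 "sandwich" robust standard errors: use each squared residual
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classic, se_robust)
```

When the noise really is heteroskedastic, the two answers diverge: calibration (the robust version) can be right even though the homoskedastic model is wrong.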

What the Bootstrap Is Actually Doing

  • The bootstrap doesn’t assume normality
  • It uses the sample as a model of the population
  • Connection to “the model is an approximation”
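As a sketch of the second bullet, assuming a skewed illustrative sample: the bootstrap treats the empirical distribution of the sample as if it were the population, resamples from it, and reads the standard error off the resampled statistics. No normality assumption appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=200)  # skewed, clearly non-normal

# resample from the sample itself: the empirical distribution
# plays the role of the "population"
B = 2000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(B)
])
se_boot = boot_means.std(ddof=1)

# compare with the textbook formula sd / sqrt(n)
se_formula = sample.std(ddof=1) / np.sqrt(sample.size)
print(se_boot, se_formula)
```

The two standard errors come out close, which is the sense in which “the sample as a model of the population” is a good approximation here.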

When Models Break Down

  • Heavy tails
  • Dependence
  • Small samples
  • What to watch out for
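Heavy tails and small samples can be previewed with a coverage simulation. The sketch below (illustrative choices: \(n = 10\), a lognormal as the heavy-tailed example, the usual \(\pm 1.96\,\widehat{\mathrm{SE}}\) interval) checks how often a nominal-95% interval actually covers the true mean:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 4000

def coverage(draw, true_mean):
    """Fraction of nominal-95% intervals that cover the true mean."""
    hits = 0
    for _ in range(reps):
        y = draw(n)
        se = y.std(ddof=1) / np.sqrt(n)
        hits += abs(y.mean() - true_mean) <= 1.96 * se
    return hits / reps

cov_normal = coverage(lambda n: rng.normal(size=n), 0.0)
cov_heavy = coverage(lambda n: rng.lognormal(size=n), np.exp(0.5))
print(cov_normal, cov_heavy)
```

With normal data the interval is close to its advertised level even at \(n = 10\); with heavy-tailed data, coverage drops noticeably below 95%. That gap is the breakdown the bullets above are pointing at.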

The Pragmatic View

  • Models are tools, not truths
  • “All models are wrong, but some are useful” (Box)
  • What matters: does calibration work? Is coverage ~95%?
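The pragmatic criterion is checkable by simulation. Here is a sketch, with illustrative choices (uniform data, \(n = 200\)), of a “wrong” model doing its job: the data are nowhere near normal, yet the normal-theory interval for the mean covers at close to its nominal rate.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 4000
true_mean = 0.5  # mean of Uniform(0, 1) -- the data are flatly non-normal

hits = 0
for _ in range(reps):
    y = rng.uniform(size=n)
    se = y.std(ddof=1) / np.sqrt(n)
    hits += abs(y.mean() - true_mean) <= 1.96 * se
print(hits / reps)  # close to the nominal 95% despite the "wrong" model
```

By Box’s standard, this wrong model is useful: its calibration works.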

Why This Matters for Next Semester

In regression (Semester 2), you’ll write models like: \[ Y_i = \alpha + \beta X_i + \epsilon_i \] constantly. Understanding when and why these approximations work—and when they don’t—will be essential.