24 Least Squares in R
Setup
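The setup chunk isn’t shown in this excerpt. Here is a minimal stand-in, assuming only that `unisam` is a data frame with columns `x` and `y`; the simulated values are placeholders, not the course’s actual data.

```r
# Placeholder setup: simulate a small univariate sample (x, y).
# The real course data is not shown in this excerpt.
set.seed(1)
n <- 100
unisam <- data.frame(x = runif(n, min = 0, max = 20))
unisam$y <- 2 + 0.5 * unisam$x + rnorm(n)
```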
Least Squares in Linear Models
- The estimation targets we’ve talked about are all summaries of column means in the population.
\[ \begin{aligned} \mu(w,x) &= \frac{1}{m_{w,x}}\sum_{j:w_j=w,x_j=x} y_j \qfor m_{w,x} = \sum_{j: w_j=w, x_j=x} 1 \\ &= \mathop{\mathrm{E}}[Y_i \mid W_i=w, X_i=x] \qfor (W_i,X_i,Y_i) \qqtext{sampled uniformly-at-random} \end{aligned} \]
- Least Squares Regression is a method for choosing a function to estimate these column means.
  - When we choose from the set of all functions of \((w,x)\) …
  - … we get the mean in the corresponding column of the sample.
\[ \begin{aligned} \hat\mu(w,x) &= \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(W_i,X_i)}^2 \\ &= \frac{1}{N_{w,x}}\sum_{i:W_i=w,X_i=x} Y_i \qqtext{ when } \mathcal{M}= \{ \text{all functions of } (w,x) \} \end{aligned} \]
- The set of functions we choose from is our model.
  - Or regression model if we want to be verbose.
- And there are, of course, many other models.
- When we don’t have a lot of data in some columns in the sample …
  - Column means are really sensitive to who we happened to sample.
  - And we’re probably better off using a smaller model, i.e., one that includes fewer functions.
- The models we’ll focus on in this class are linear models.
  - They’re closed under addition, subtraction, and scalar multiplication.
  - That means when you add, subtract, or scale any functions in the model …
  - … you get another function in the model. For example, if \(m_1(x)=x\) and \(m_2(x)=1\) are in the model, so is \(3m_1(x) - 2m_2(x) = 3x - 2\).
Least Squares with Univariate Data
Models
\[ \begin{aligned} \mathcal{M}&= \{ \text{all functions } \ m(x) \} && \text{all functions} \\ \mathcal{M}&= \{ \text{all functions } \ m(c(x)) \} && \text{all functions of $c(x)$, a coarsened version of $x$} \\ \mathcal{M}&= \{ m(x) = a + bx \ : \ a,b \in \mathbb{R} \} && \text{all lines} \end{aligned} \]
- We use R’s built-in function `lm`[^1] to do least squares with linear models.
- To tell it which model to use, we use what R calls a ‘formula’.
- What a formula does is describe a basis for the model.
- i.e. a set of functions that we can take linear combinations of to get any function in the model. \[ \begin{aligned} \mathcal{M}&= \{ m(x) = a_0 b_0(x) + a_1 b_1(x) + \ldots + a_k b_k(x) \ \ : \ a_0 \ldots a_k \in \mathbb{R} \} \\ & \qif b_0(x),b_1(x),\ldots,b_k(x) \ \text{ is a basis for } \ \mathcal{M}. \end{aligned} \]
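The code chunk the next bullet refers to isn’t shown in this excerpt; a minimal reconstruction of the call it describes:

```r
# Least squares over the set of lines: basis functions 1 and x.
fitted.model <- lm(y ~ 1 + x, data = unisam)
```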
- Where it says `lm(y~1+x, data=unisam)` above, `y~1+x` is the formula and `unisam` is the data.
  - It’s saying that we want to choose a function \(m\) from our model for which \(m(X_i) \approx Y_i\) …
  - … where \((X_i,Y_i)\) are `unisam$x[i]` and `unisam$y[i]` for \(i=1 \ldots n\).
  - And our model is the set of all linear combinations of the functions \(b_0(x) = 1\) and \(b_1(x) = x\). \[ \hat \mu = \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(X_i)}^2 \qfor \mathcal{M}= \{m(x) = a_0 \ 1 + a_1 \ x \ \ : \ \ a_0,a_1 \in \mathbb{R} \} \]
- What `lm` gives you is not really the function \(\hat\mu\).
  - It’s something else. But whatever it is, you can calculate \(\hat\mu(x)\) by calling `predict`.
  - The code below uses `predict` to calculate \(\hat\mu(8)\), \(\hat\mu(10)\), and \(\hat\mu(12)\).
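That code chunk isn’t shown in this excerpt; a sketch of what it does, using the `fitted.model` object from above:

```r
# predict takes a data frame whose columns match the variables in the formula
# and returns the fitted function's values at those points.
predict(fitted.model, newdata = data.frame(x = c(8, 10, 12)))
```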
- I like to write a function `muhat` that calls `predict` on its argument, as in the sketch below.
  - This makes the code shorter and more like the mathematical notation.
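A minimal version of that wrapper; handling the argument as a one-column data frame is one plausible way to write it.

```r
# muhat(x) evaluates the least-squares fit at x, so code reads like math.
muhat <- function(x) {
  predict(fitted.model, newdata = data.frame(x = x))
}

muhat(c(8, 10, 12))  # same values as the predict call above
```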
- If you’re familiar with object-oriented programming …
  - `lm` returns an object we call `fitted.model`. `predict` is one of its methods.
  - Method calls in R look a little different than they do in Python, C++, or Java.
  - But they work pretty much the same way. Almost.
    - R, like Common Lisp and Julia, has an object system based on multimethods.
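A quick way to see the dispatch in action; this check is an illustration, not part of the original notes.

```r
# fitted.model carries a class tag; predict(fitted.model, ...) dispatches
# on that tag to the method predict.lm.
class(fitted.model)           # "lm"
getS3method("predict", "lm")  # the function that actually runs
```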
Formulas in Abstract
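The code chunk this section discusses isn’t shown in this excerpt. A plausible reconstruction, assuming `b.0` and `b.1` are the constant and identity functions used as the basis for the line model:

```r
# Basis functions for the line model: b.0 spans the constants, b.1 the slopes.
b.0 <- function(x) { rep(1, length(x)) }
b.1 <- function(x) { x }

fitted.model <- lm(y ~ b.0(x) + b.1(x), data = unisam)
```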
- You can use any R functions you like in a formula.
  - Above, I’ve implemented my own functions `b.0` and `b.1` to use as my basis.
  - You can also use R’s built-in functions. Try replacing `b.1` in the formula with `sin`.
- You can use as many basis functions as you like.
  - Try adding `b.2(x)` to the formula `y ~ b.0(x) + b.1(x)` above, as sketched below.
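A sketch of that experiment, assuming `b.2` squares its argument (consistent with the `I(x^2)` discussion below):

```r
# Add a quadratic basis function to the model.
b.2 <- function(x) { x^2 }
fitted.model <- lm(y ~ b.0(x) + b.1(x) + b.2(x), data = unisam)
```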
- You don’t really have to define all these functions.
  - \(1\) and \(x\) will do in place of `b.0(x)` and `b.1(x)`.
- But `x^2` won’t do in place of `b.2(x)`.
  - That’s because exponentiation means something special in formulas: `(a+b)^2` expands into main effects and interactions, not squares.
  - But you can work around this by writing `I(x^2)` instead.
  - In short, wrapping an expression in `I` says ‘interpret this the normal way.’
- Try writing out a formula that does the same thing as `y ~ b.0(x) + b.1(x) + b.2(x)` …
  - … but without using `b.0`, `b.1`, or `b.2`. One solution is sketched below.
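One solution, using `I` as described above:

```r
# Same model as y ~ b.0(x) + b.1(x) + b.2(x), written with built-in syntax.
fitted.model <- lm(y ~ 1 + x + I(x^2), data = unisam)
```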
- R implicitly adds the constant function \(b_0(x)=1\) to every basis you give it.
  - Go ahead and get rid of `b.0(x)` in the formula `y ~ b.0(x) + b.1(x)` above.
  - Does anything change in the plot?
- If you want to stop it from doing this, you can write `y ~ 0 + b.1(x)` instead.
  - What changes in the plot? Why? A comparison is sketched below.
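The comparison code is an illustration, not the original chunk:

```r
# With the implicit intercept (the default) vs. with it suppressed via 0 + ...
with.intercept    <- lm(y ~ b.1(x), data = unisam)
without.intercept <- lm(y ~ 0 + b.1(x), data = unisam)
coef(with.intercept)     # intercept and slope
coef(without.intercept)  # slope only: the fitted line must pass through the origin
```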
The All-Functions Model
Piecewise-Constant Models
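The body of this section isn’t shown in this excerpt. Based on the coarsened model \(\mathcal{M}= \{ \text{all functions } m(c(x)) \}\) listed above, a piecewise-constant fit might look like this; the cut points are assumptions:

```r
# Coarsen x into bins; the model is all functions of the bin label,
# so least squares returns the mean of y within each bin.
c.x <- function(x) { cut(x, breaks = c(0, 5, 10, 15, 20)) }
fitted.model <- lm(y ~ c.x(x), data = unisam)
```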
Least Squares with Multivariate Data
[^1]: ‘lm’ stands for ‘linear model’