24 Least Squares in R
Setup
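The setup chunk isn’t shown in this excerpt. Here is a minimal stand-in, assuming only that `unisam` is a data frame with columns `x` and `y`; the simulated values are placeholders, not the course’s actual data.

```r
# Placeholder setup: simulate a small univariate sample (x, y).
# The real course data is not shown in this excerpt.
set.seed(1)
n <- 100
unisam <- data.frame(x = runif(n, min = 0, max = 20))
unisam$y <- 2 + 0.5 * unisam$x + rnorm(n)
```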
Least Squares in Linear Models
- The estimation targets we’ve talked about are all summaries of column means in the population.
\[ \begin{aligned} \mu(w,x) &= \frac{1}{m_{w,x}}\sum_{j:w_j=w,x_j=x} y_j \qfor m_{w,x} = \sum_{j: w_j=w, x_j=x} 1 \\ &= \mathop{\mathrm{E}}[Y_i \mid W_i=w, X_i=x] \qfor (W_i,X_i,Y_i) \qqtext{sampled uniformly-at-random} \end{aligned} \]
- Least Squares Regression is a method for choosing a function to estimate these column means.
  - When we choose from the set of all functions of \((w,x)\) …
  - … we get the mean in the corresponding column of the sample.
\[ \begin{aligned} \hat\mu(w,x) &= \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(W_i,X_i)}^2 \\ &= \frac{1}{N_{w,x}}\sum_{i:W_i=w,X_i=x} Y_i \qqtext{ when } \mathcal{M}= \{ \text{all functions of } (w,x) \} \end{aligned} \]
- The set of functions we choose from is our model.
  - Or regression model if we want to be verbose.
- And there are, of course, many other models.
- When we don’t have a lot of data in some columns in the sample …
  - Column means are really sensitive to who we happened to sample.
  - And we’re probably better off using a smaller model, i.e., one that includes fewer functions.
- The models we’ll focus on in this class are linear models.
  - They’re closed under addition, subtraction, and scalar multiplication.
  - That means when you add, subtract, or scale any functions in the model …
  - … you get another function in the model. For example, if \(m_1(x)=x\) and \(m_2(x)=1\) are in the model, so is \(3m_1(x) - 2m_2(x) = 3x - 2\).
Least Squares with Univariate Data
Models
\[ \begin{aligned} \mathcal{M}&= \{ \text{all functions } \ m(x) \} && \text{all functions} \\ \mathcal{M}&= \{ \text{all functions } \ m(c(x)) \} && \text{all functions of $c(x)$, a coarsened version of $x$} \\ \mathcal{M}&= \{ m(x) = a + bx \ : \ a,b \in \mathbb{R} \} && \text{all lines} \end{aligned} \]
- We use R’s built-in function `lm`[^1] to do least squares with linear models.
- To tell it which model to use, we use what R calls a ‘formula’.
- What a formula does is describe a basis for the model.
- i.e. a set of functions that we can take linear combinations of to get any function in the model. \[ \begin{aligned} \mathcal{M}&= \{ m(x) = a_0 b_0(x) + a_1 b_1(x) + \ldots + a_k b_k(x) \ \ : \ a_0 \ldots a_k \in \mathbb{R} \} \\ & \qif b_0(x),b_1(x),\ldots,b_k(x) \ \text{ is a basis for } \ \mathcal{M}. \end{aligned} \]
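The code chunk the next bullet refers to isn’t shown in this excerpt; a minimal reconstruction of the call it describes:

```r
# Least squares over the set of lines: basis functions 1 and x.
fitted.model <- lm(y ~ 1 + x, data = unisam)
```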
- Where it says `lm(y~1+x, data=unisam)` above, `y~1+x` is the formula and `unisam` is the data.
  - It’s saying that we want to choose a function \(m\) from our model for which \(m(X_i) \approx Y_i\) …
  - … where \((X_i,Y_i)\) are `unisam$x[i]` and `unisam$y[i]` for \(i=1 \ldots n\).
  - And our model is the set of all linear combinations of the functions \(b_0(x) = 1\) and \(b_1(x) = x\). \[ \hat \mu = \mathop{\mathrm{argmin}}_{m \in \mathcal{M}} \sum_{i=1}^n \qty{Y_i - m(X_i)}^2 \qfor \mathcal{M}= \{m(x) = a_0 \ 1 + a_1 \ x \ \ : \ \ a_0,a_1 \in \mathbb{R} \} \]
- What `lm` gives you is not really the function \(\hat\mu\).
  - It’s something else. But whatever it is, you can calculate \(\hat\mu(x)\) by calling `predict`.
  - The code below uses `predict` to calculate \(\hat\mu(8)\), \(\hat\mu(10)\), and \(\hat\mu(12)\).
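That code chunk isn’t shown in this excerpt; a sketch of what it does, using the `fitted.model` object from above:

```r
# predict takes a data frame whose columns match the variables in the formula
# and returns the fitted function's values at those points.
predict(fitted.model, newdata = data.frame(x = c(8, 10, 12)))
```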
- I like to write a function `muhat` that calls `predict` on its argument, as in the sketch below.
  - This makes the code shorter and more like the mathematical notation.
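A minimal version of that wrapper; handling the argument as a one-column data frame is one plausible way to write it.

```r
# muhat(x) evaluates the least-squares fit at x, so code reads like math.
muhat <- function(x) {
  predict(fitted.model, newdata = data.frame(x = x))
}

muhat(c(8, 10, 12))  # same values as the predict call above
```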
- If you’re familiar with object-oriented programming …
  - `lm` returns an object we call `fitted.model`. `predict` is one of its methods.
  - Method calls in R look a little different than they do in Python, C++, or Java.
  - But they work pretty much the same way. Almost.
    - R, like Common Lisp and Julia, has an object system based on multimethods.
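A quick way to see the dispatch in action; this check is an illustration, not part of the original notes.

```r
# fitted.model carries a class tag; predict(fitted.model, ...) dispatches
# on that tag to the method predict.lm.
class(fitted.model)           # "lm"
getS3method("predict", "lm")  # the function that actually runs
```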
Formulas in Abstract
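The code chunk this section discusses isn’t shown in this excerpt. A plausible reconstruction, assuming `b.0` and `b.1` are the constant and identity functions used as the basis for the line model:

```r
# Basis functions for the line model: b.0 spans the constants, b.1 the slopes.
b.0 <- function(x) { rep(1, length(x)) }
b.1 <- function(x) { x }

fitted.model <- lm(y ~ b.0(x) + b.1(x), data = unisam)
```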
- You can use any R functions you like in a formula.
  - Above, I’ve implemented my own functions `b.0` and `b.1` to use as my basis.
  - You can also use R’s built-in functions. Try replacing `b.1` in the formula with `sin`.
- You can use as many basis functions as you like.
  - Try adding `b.2(x)` to the formula `y ~ b.0(x) + b.1(x)` above, as sketched below.
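A sketch of that experiment, assuming `b.2` squares its argument (consistent with the `I(x^2)` discussion below):

```r
# Add a quadratic basis function to the model.
b.2 <- function(x) { x^2 }
fitted.model <- lm(y ~ b.0(x) + b.1(x) + b.2(x), data = unisam)
```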
- You don’t really have to define all these functions.
  - \(1\) and \(x\) will do in place of `b.0(x)` and `b.1(x)`.
- But `x^2` won’t do in place of `b.2(x)`.
  - That’s because exponentiation means something special in formulas: `(a+b)^2` expands into main effects and interactions, not squares.
  - But you can work around this by writing `I(x^2)` instead.
  - In short, wrapping an expression in `I` says ‘interpret this the normal way.’
- Try writing out a formula that does the same thing as `y ~ b.0(x) + b.1(x) + b.2(x)` …
  - … but without using `b.0`, `b.1`, or `b.2`. One solution is sketched below.
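One solution, using `I` as described above:

```r
# Same model as y ~ b.0(x) + b.1(x) + b.2(x), written with built-in syntax.
fitted.model <- lm(y ~ 1 + x + I(x^2), data = unisam)
```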
- R implicitly adds the constant function \(b_0(x)=1\) to every basis you give it.
  - Go ahead and get rid of `b.0(x)` in the formula `y ~ b.0(x) + b.1(x)` above.
  - Does anything change in the plot?
- If you want to stop it from doing this, you can write `y ~ 0 + b.1(x)` instead.
  - What changes in the plot? Why? A comparison is sketched below.
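The comparison code is an illustration, not the original chunk:

```r
# With the implicit intercept (the default) vs. with it suppressed via 0 + ...
with.intercept    <- lm(y ~ b.1(x), data = unisam)
without.intercept <- lm(y ~ 0 + b.1(x), data = unisam)
coef(with.intercept)     # intercept and slope
coef(without.intercept)  # slope only: the fitted line must pass through the origin
```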
The All-Functions Model
Piecewise-Constant Models
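The body of this section isn’t shown in this excerpt. Based on the coarsened model \(\mathcal{M}= \{ \text{all functions } m(c(x)) \}\) listed above, a piecewise-constant fit might look like this; the cut points are assumptions:

```r
# Coarsen x into bins; the model is all functions of the bin label,
# so least squares returns the mean of y within each bin.
c.x <- function(x) { cut(x, breaks = c(0, 5, 10, 15, 20)) }
fitted.model <- lm(y ~ c.x(x), data = unisam)
```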
Least Squares with Multivariate Data
[^1]: ‘lm’ stands for ‘linear model’