You start with a target vector \(y \in \mathbb{R}^n\): the outcomes you observed. Then you’ve got a design matrix \(X \in \mathbb{R}^{n \times p}\). Each column of \(X\) is a predictor, and the span of those columns is a \(p\)-dimensional subspace of \(\mathbb{R}^n\).
Picture the case \(n = 3\), \(p = 2\): \(y\) is a point floating in 3D space, and the columns of \(X\) span a tilted 2D plane slicing through the origin. Regression doesn’t invent a new direction; it asks where \(y\) lands if you drop it straight down onto that plane. The landing spot is \(\hat{y} = X\hat{\beta}\), the orthogonal projection of \(y\) onto the column space of \(X\). The residual \(r = y - \hat{y}\) is the vertical drop, and it’s perpendicular to the plane. In higher dimensions, same story, only now the plane is a \(p\)-dimensional slab inside \(\mathbb{R}^n\).
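Here’s a small numerical sketch of that picture, assuming NumPy; the specific numbers are made up for illustration:

```python
import numpy as np

# Two columns spanning a tilted 2D plane through the origin in R^3
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
y = np.array([2.0, 1.0, 3.0])   # the point floating in space

# Drop y straight down onto the plane: solve the least-squares problem
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat            # the landing spot on the plane
r = y - y_hat                   # the vertical drop

# The residual is perpendicular to both spanning columns
print(X.T @ r)                  # ~ [0, 0] up to floating-point error
```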
So the question linear regression asks is brutally geometric:
Among all possible linear combinations of the columns of \(X\), which one is nearest to \(y\)?
Formally: find \(\hat{y} = X\hat{\beta}\) such that
$$
\|y - \hat{y}\|_2
$$
is minimized. That’s just saying: project \(y\) orthogonally onto the subspace \(\text{Col}(X)\).
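You can sanity-check the “nearest point” claim numerically. In this sketch (random data, illustrative names), nudging the least-squares coefficients in any direction can only move the fit farther from \(y\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # 50 observations, 3 predictors
y = rng.normal(size=50)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
best = np.linalg.norm(y - X @ beta_hat)

# Any perturbation of beta_hat gives a point in Col(X) that is no closer to y
for _ in range(5):
    nudged = beta_hat + 0.1 * rng.normal(size=3)
    assert np.linalg.norm(y - X @ nudged) >= best
```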
Now, what does it mean to be a projection?
If \(\hat{y}\) is the projection, then the residual \(r = y - \hat{y}\) must be orthogonal to the subspace.
Orthogonal to the subspace means orthogonal to each column of \(X\).
That gives the normal equations:
$$
X^\top (y - X\hat{\beta}) = 0.
$$
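The orthogonality condition is easy to confirm numerically. A minimal sketch with random data: the fitted residual really is orthogonal to every column of \(X\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta_hat

# X^T (y - X beta_hat) should be the zero vector, up to rounding
print(X.T @ residual)   # entries on the order of 1e-14
```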
That condition pins down \(\hat{\beta}\). Solving it, assuming \(X^\top X\) is invertible (equivalently, the columns of \(X\) are linearly independent), yields the familiar formula:
$$
\hat{\beta} = (X^\top X)^{-1} X^\top y.
$$
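The closed form can be computed directly, though forming the explicit inverse is for illustration only; in practice a solve or a least-squares routine is numerically preferable. A sketch, assuming \(X\) has full column rank:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)

# Textbook formula (fine for illustration, not for numerical work)
beta_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable equivalents
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_closed, beta_solve), np.allclose(beta_solve, beta_lstsq))
```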
So, the coefficients \(\hat{\beta}\) are nothing mystical — they’re just the coordinates of the projection of \(y\) onto the span of the predictors.
The whole operation can be packaged into a single matrix:
$$
P = X (X^\top X)^{-1} X^\top.
$$
This is the projection matrix. It’s symmetric, idempotent (\(P^2 = P\)), and it maps any \(y\) to its fitted values:
$$
\hat{y} = Py.
$$
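The projection matrix itself can be built and checked in a few lines; this is again a sketch with illustrative data, not a recipe for production use:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

P = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(P, P.T))       # symmetric
print(np.allclose(P @ P, P))     # idempotent: projecting twice changes nothing
print(np.allclose(P @ y, X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # P y equals the fitted values
```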
That’s linear regression boiled down: every time you “fit a line,” what you’re actually doing is multiplying \(y\) by \(P\) and dropping it onto a lower-dimensional plane carved out by the predictors.