Deep Learning Math  ·  Chapter 05

Matrices

how to move all of space at once

A grid of numbers looks like the most boring thing in math. It isn't. A matrix is a machine that grabs space and moves it — rotating, stretching, shearing every point at once. A neural network is mostly this, done over and over.

The problem · moving everything together

How do you rotate a whole shape — every point of it — with one rule?

In Chapter 4, a vector was a single arrow. But a picture, a 3D model, or a layer of a neural network is millions of arrows. You don't want to move them one by one — you want a single instruction that moves all of space consistently: straight lines stay straight, the grid stays evenly spaced, the origin stays put. That kind of motion is called a linear transformation.

A matrix is just the compact recipe for one. And there's a shockingly simple way to capture it: you only have to say where two little arrows go.

The idea · follow two arrows

Tell me where these two land, and I know where everything lands.

Start with two unit arrows: one pointing right, one pointing up. Every other point in space is built from scaled copies of these two: so many steps right, so many steps up. So if you decide where the right-arrow lands and where the up-arrow lands, the fate of every other point is already sealed. It just rides along.

Those two landing spots, written as columns, are the matrix. Grab the sliders and bend space yourself — or hit a preset and watch it snap into a rotation, a stretch, a shear.

[Interactive figure: the 2×2 matrix, with sliders for the x and y landing coordinates of → and ↑, and a readout of the area scale factor.]

blue is where → lands, green is where ↑ lands. the purple patch is the unit square, dragged along — its area is the determinant.
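The rule behind the picture fits in a few lines. Here is a minimal sketch in plain Python; the names `transform`, `right_lands` and `up_lands` are invented for illustration and are not taken from the interactive figure.

```python
def transform(point, right_lands, up_lands):
    """Move a point, given where the two unit arrows land.

    point       -- (x, y): x steps along the right-arrow, y steps along the up-arrow
    right_lands -- where the right-arrow (1, 0) ends up
    up_lands    -- where the up-arrow (0, 1) ends up
    """
    x, y = point
    # The point rides along: same number of steps, but along the moved arrows.
    return (x * right_lands[0] + y * up_lands[0],
            x * right_lands[1] + y * up_lands[1])


# A quarter-turn counterclockwise: the right-arrow lands on (0, 1),
# the up-arrow lands on (-1, 0).
right_lands = (0, 1)
up_lands = (-1, 0)

print(transform((1, 0), right_lands, up_lands))  # (0, 1)
print(transform((0, 1), right_lands, up_lands))  # (-1, 0)
print(transform((2, 3), right_lands, up_lands))  # (-3, 2)
```

Nothing about the point (2, 3) was specified in advance. Its landing spot follows entirely from the two arrows.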

Two things the picture tells you

The grid hides two superpowers.

First, that area number, the determinant, tells you how much the transformation stretches or squishes space. If it hits zero, space has been crushed flat onto a line (or, in the extreme case, a single point), and the move can't be undone. If it goes negative, space has been flipped inside out, like a mirror.
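For a 2×2 matrix with top row (a, b) and bottom row (c, d), the determinant is ad − bc. A small sketch, with a hypothetical helper named `determinant`, shows the three cases described above:

```python
def determinant(a, b, c, d):
    """Signed area scale factor of the matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


print(determinant(2, 0, 0, 3))  # 6: every area grows sixfold
print(determinant(1, 2, 2, 4))  # 0: both arrows land on the same line
print(determinant(0, 1, 1, 0))  # -1: area unchanged, but space is flipped
```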

Second, you can chain transformations. Do one, then another, and the combined move is itself a matrix — found by “multiplying” the two. That's all matrix multiplication really is: do this transformation, then that one. A deep network is a long stack of these, one per layer.
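To see that chaining really is multiplication, here is a sketch in plain Python. The helpers `matmul` and `apply` and the two example matrices are illustrative choices, not part of the original figure.

```python
def matmul(B, A):
    """2x2 product B·A: the single matrix for 'do A first, then B'."""
    return [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(M, v):
    """Move the point v with the 2x2 matrix M."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])


rotate = [[0, -1], [1, 0]]  # quarter-turn counterclockwise
shear = [[1, 1], [0, 1]]    # push points right in proportion to their height

point = (2, 3)
combined = matmul(shear, rotate)  # rotate first, then shear

print(apply(shear, apply(rotate, point)))  # (-1, 2): two moves, one after the other
print(apply(combined, point))              # (-1, 2): one move with the combined matrix
print(combined)                            # [[1, -1], [1, 0]]
print(matmul(rotate, shear))               # [[0, -1], [1, 1]]
```

The last two lines differ: shearing after rotating is not the same move as rotating after shearing, so the order of multiplication matters.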

Who built this

Solving for the unknowns — for two thousand years.

[Image: manuscript of the Nine Chapters, ~200 BCE]

Long before the word existed, Chinese mathematicians in the Nine Chapters on the Mathematical Art were laying numbers in grids and shuffling rows to solve systems of equations — essentially Gaussian elimination, two millennia early.

[Image: portrait of Arthur Cayley, ~1858]

The name matrix was coined by James Sylvester in 1850; his friend Arthur Cayley then worked out the algebra — how to add and, crucially, multiply them — turning a bookkeeping grid into the language of transformation that runs every GPU today.

Where you'll meet it again

A neural network is a tower of matrices.

Strip away the buzzwords and most of deep learning is matrix multiplication, at colossal scale.

Every layer

A network layer takes the vector of activations and multiplies it by a matrix of learned weights — transforming the data into a new space where the next question is easier to answer. Training is the search for the right matrices.
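As a rough sketch of that multiplication, assuming made-up weights and a hypothetical function named `layer` (real layers usually also add a bias and a nonlinearity, both left out here):

```python
def layer(weights, activations):
    """Multiply a weight matrix by an activation vector, one row at a time."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


# Hypothetical numbers: real weights are found by training, not typed in by hand.
weights = [[1, 0, -1],
           [2, 1, 0]]
activations = [3, 4, 5]

print(layer(weights, activations))  # [-2, 10]
```

Three numbers go in and two come out: the matrix has carried the data into a new, smaller space.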

Why GPUs matter

GPUs were built for graphics, which is itself matrix work, and that makes them ideal for multiplying huge matrices fast. Much of the AI hardware boom is, underneath, a race to do this one operation more quickly.

In the real world

3D games and film (every camera move is a matrix), image filters, Google's original PageRank, economics, and the data-squeezing of PCA all run on them.

Now the symbols can't scare you

A grid, and what it does to an arrow.


\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}
\]

Look closely at the answer: each entry is the dot product of one row of the matrix with the vector, the very thing from Chapter 4. A matrix times a vector is just a stack of dot products. Every idea in this book is quietly holding hands with the next.
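The "stack of dot products" reading translates almost word for word into code. A minimal sketch, with `dot` and `matvec` as illustrative names and arbitrary numbers standing in for a, b, c, d, x and y:

```python
def dot(u, v):
    """Dot product: multiply matching entries, then add them up."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(matrix, vector):
    """Each entry of the result is one row of the matrix dotted with the vector."""
    return [dot(row, vector) for row in matrix]


a, b, c, d = 1, 2, 3, 4
x, y = 5, 6

print(matvec([[a, b], [c, d]], [x, y]))  # [17, 39]
print([a * x + b * y, c * x + d * y])    # [17, 39], the formula written out
```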