how to be exact about what you don't know
A model never says “this is a cat.” It says “eighty-five percent cat.” Intelligence isn't certainty; it's handling uncertainty well. Probability is the math of reasoning when you can't be sure, and it's how a language model decides what to say next.
Will it rain? Is this email spam? What's the next word in this sentence? You almost never know for sure — but “I don't know” isn't good enough to act on. You need to say how unsure: a little, or a lot.
Probability is the device for exactly that: a number from 0 to 1 measuring how strongly to believe something. Spread your belief across all the possibilities and it must add up to a whole — 100% of your confidence, divided up. That spread is called a distribution, and it's the object a neural network actually outputs.
After “The cat sat on the…”, a language model assigns every possible next word a raw score. But scores aren't beliefs. A step called softmax squashes them into probabilities that add to 100%: each word's share is proportional to the exponential of its score, so a modest lead in score becomes a much larger lead in belief.
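As a rough illustration of that squashing step, here is a minimal softmax in plain Python. The four candidate words and their raw scores are made-up numbers for the sake of the sketch, not the output of any real model.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the largest score leaves the result unchanged
    # but keeps exp() from overflowing on big inputs.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next words.
words = ["mat", "sofa", "roof", "moon"]
scores = [3.0, 2.0, 1.0, -1.0]

probs = softmax(scores)
for word, p in zip(words, probs):
    print(f"{word:>5}: {p:.3f}")
print("total:", round(sum(probs), 6))
```

A score that is one point higher ends up with about 2.7 times the probability (a factor of e), which is the sense in which belief grows exponentially with score.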
One dial controls how boldly it commits: temperature. Turn it down and the model fixates on its favorite; turn it up and belief spreads out, and it gets surprising. Then press sample to let it actually pick.
the bars always add to 100%. low temperature = confident and repetitive; high temperature = creative and risky. sampling is how the model actually talks.
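The temperature dial and the sample button can be sketched the same way. This assumes the common convention that temperature divides the scores before softmax. The words, scores, and function names are illustrative inventions, and the fixed random seed is only there to make runs repeatable.

```python
import math
import random

def softmax_with_temperature(scores, temperature=1.0):
    """Divide the scores by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(words, probs, rng):
    """Pick one word at random, weighted by its probability."""
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical candidates and raw scores (made-up numbers).
words = ["mat", "sofa", "roof", "moon"]
scores = [3.0, 2.0, 1.0, -1.0]
rng = random.Random(0)  # fixed seed so the picks are repeatable

for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(scores, t)
    picks = [sample(words, probs, rng) for _ in range(10)]
    print(f"T={t}: favorite gets {max(probs):.2f}, picks = {picks}")
```

At a low temperature nearly all the belief lands on the favorite, so the picks repeat. At a high temperature the probabilities flatten and the less likely words start to appear.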
You start with a hunch, called a prior. New evidence arrives. You don't throw the hunch away; you update it, landing on a revised belief, the posterior. That rule is Bayes' theorem, and it's arguably the most important sentence in all of statistics.
It catches us out constantly. A test for a rare disease is “99% accurate” and you test positive, yet you're probably fine. Suppose 1 person in 10,000 has the disease and the test is right 99% of the time for sick and healthy people alike: of 10,000 people tested, about 1 gets a true positive and about 100 get false alarms, so a positive result means roughly a 1% chance of being sick. The prior matters. Bayes is how a machine, and a careful mind, weighs new news against old odds.
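The rare-disease arithmetic can be checked in a few lines. The numbers here are assumptions chosen for illustration: a prevalence of 1 in 10,000, and “99% accurate” read as 99% for both sick people (sensitivity) and healthy people (specificity). A real test would have its own figures.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Bayes' theorem for a positive test result.

    prior       -- chance of having the disease before testing
    sensitivity -- chance a sick person tests positive
    specificity -- chance a healthy person tests negative
    """
    true_positives = sensitivity * prior
    false_positives = (1.0 - specificity) * (1.0 - prior)
    return true_positives / (true_positives + false_positives)

# Assumed figures: a rare disease and a test that is 99% right both ways.
rare = posterior_given_positive(prior=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"rare disease:   P(sick | positive) = {rare:.2%}")

# Same test, but a disease that half the population has.
common = posterior_given_positive(prior=0.5, sensitivity=0.99, specificity=0.99)
print(f"common disease: P(sick | positive) = {common:.2%}")
```

The test is identical in both calls. Only the prior changes, and it moves the answer from about 1% to 99%.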
In 1654 a gambler asked Blaise Pascal how to fairly split the pot of an interrupted dice game. Pascal and Pierre de Fermat traded letters working it out — and in those letters, the mathematics of chance was born. Probability started at the gambling table.
A century on, a Presbyterian minister, Thomas Bayes, wrote down how to update belief with evidence — published only after his death. Laplace turned it into a science, and in the 1930s Kolmogorov set probability on rigorous foundations. Bayes' quiet rule now underwrites modern AI.
From the last layer to the loss function, deep learning is soaked in probability.
Every word a language model writes is drawn from a probability distribution like the one above. Temperature is a real parameter you can set in OpenAI's API. You just played with the engine of generation.
Models train by minimizing “cross-entropy” — a measure of how surprised the model was by the right answer. Less surprise, better model. That's information theory, built on probability.
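For a single prediction, that surprise is the negative log of the probability the model gave to the right answer. The sketch below uses three made-up distributions over three choices to show how the number behaves. It is a toy, not a training loop.

```python
import math

def cross_entropy(probs, correct_index):
    """Surprise at the right answer: -log of the probability assigned to it."""
    return -math.log(probs[correct_index])

# Hypothetical predictions; in every case the right answer is index 0.
confident_right = [0.90, 0.05, 0.05]
unsure = [0.34, 0.33, 0.33]
confident_wrong = [0.05, 0.90, 0.05]

for name, probs in [
    ("confident and right", confident_right),
    ("unsure", unsure),
    ("confident and wrong", confident_wrong),
]:
    print(f"{name:>20}: loss = {cross_entropy(probs, 0):.3f}")
```

The loss is near zero when the model was confident and right, and it climbs steeply when the model was confident and wrong. Training nudges the weights to lower this number on average.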
Insurance, the “30% chance of rain,” medical test results, spam filters, A/B tests, and every poll you've ever read: all are probability distributions, and often Bayes, at work.
Bayes' theorem reads like plain advice: your updated belief is your old belief, reweighted by how well it explains what you just saw compared with the alternatives. Hold your priors, but let the evidence move them. It is the whole art of thinking clearly, and of building a machine that does too.
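Written out in symbols, the rule looks like this. The letters are a notation chosen here for illustration: H stands for a hypothesis, E for the evidence, and the H_i run over all the competing hypotheses.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)},
\qquad
P(E) = \sum_i P(E \mid H_i)\, P(H_i)
```

P(H) is the prior, P(E | H) measures how well the hypothesis explains the evidence, and P(H | E) is the posterior. Dividing by P(E) rescales everything so the updated beliefs still add up to 100%.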