The Illusion of Certainty
Why Improbable Events Happen More Than You Think and What That Means for Statistics
*This is Part 2 in a series on How Not To Be Wrong by Jordan Ellenberg*
Read Part 1 here:
Suppose you receive a letter in the mail from a stock picker who claims a certain stock will rise this week. You think nothing of it and toss the letter, only to find that the stock does, in fact, go up. The next week, you get another letter predicting a different stock will go down. It does. Week after week, the letters keep coming, each one correctly calling a stock's movement. After 10 straight weeks of correct predictions, you get a letter asking you to invest. After all, this person has been right 10 weeks in a row! Statistically, that's incredibly unlikely. Surely they must be an expert, right?
Not so fast.
What you didn’t see is that this person sent out over 10,000 letters in Week 1—half predicting a stock will go up, the other half saying it will go down. After the results, they discard the incorrect batch and repeat the process with the remaining recipients. By Week 10, a small group has received 10 “correct” predictions in a row. Purely by design, not by insight.
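A quick back-of-the-envelope check makes the scheme obvious: halving 10,000 recipients ten times leaves only about ten people who ever saw a perfect record. Here is a minimal Python sketch of the idea; the mailing-list size is illustrative, and note that the market's actual moves never matter, only the halving does.

```python
recipients = 10_000  # Week 1 mailing list (size is illustrative)

for week in range(1, 11):
    # Half the letters say "up", half say "down"; whichever way the
    # market actually moves, exactly half the recipients were "right".
    recipients //= 2
    print(f"Week {week:2d}: {recipients:5d} people have seen only correct calls")

# 10,000 halved ten times leaves roughly 9 or 10 people, each of whom
# has watched the sender go 10-for-10 purely by construction.
```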
The lesson? Improbable events happen frequently when the sample size is large enough. What seems statistically miraculous is often just a numbers game.
The Mutual Fund Mirage
This kind of statistical trickery isn’t limited to stock pickers; it shows up in real-world financial markets, especially in mutual funds. A company may launch a large number of funds and publicize only the ones that performed well after a few years. This creates the illusion of consistent success when, in reality, a few funds were simply lucky. People often assume past performance implies future performance, but that’s a risky assumption rooted in a misunderstanding of probability and sample size.
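To see how easy it is to manufacture a "star fund", here is a sketch in the same spirit; every number in it (fund count, track-record length, return distribution) is invented for illustration. The funds' returns are pure noise, yet the best one always looks impressive.

```python
import random

random.seed(1)

N_FUNDS = 200   # hypothetical number of incubated funds
YEARS = 5       # hypothetical track record before the fund is "launched"

def random_track_record():
    """Yearly returns drawn from noise: mean 0%, stdev 15% (assumed)."""
    return [random.gauss(0.0, 0.15) for _ in range(YEARS)]

def cumulative_return(yearly):
    total = 1.0
    for r in yearly:
        total *= (1 + r)
    return total - 1

funds = [cumulative_return(random_track_record()) for _ in range(N_FUNDS)]

print(f"Average fund return over {YEARS} years: {sum(funds)/len(funds):+.1%}")
print(f"Best fund return (the one that gets advertised): {max(funds):+.1%}")
```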
The Problem with Significance
So how do we tell whether a result is actually meaningful? That’s where p-values come in: a p-value measures how likely you would be to see results at least as extreme as the ones observed if chance alone were at work. For example, in pharmaceutical trials, researchers test whether a drug has a real effect. The null hypothesis assumes it does nothing, and statistical tests attempt to reject that assumption. If the p-value falls below 0.05, the result is considered statistically significant.
But here’s the catch: a statistically significant result doesn't mean the effect is important. If a drug reduces a disease rate from 2 in 100,000 to 1 in 100,000, the p-value might be low, but the real-world impact is essentially zero. Half of a tiny number is still a tiny number.
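As an illustration, here is a quick two-proportion z-test in Python. The trial sizes and case counts are invented, but they show how a drop from 2 in 100,000 to 1 in 100,000 can produce an impressively small p-value in a large enough trial, even though the absolute risk reduction is about 0.001%.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical trial, 5,000,000 people per arm (numbers invented for illustration):
# control arm sees 100 cases (2 in 100,000), treated arm sees 50 cases (1 in 100,000).
z, p = two_proportion_z_test(100, 5_000_000, 50, 5_000_000)

print(f"z = {z:.2f}, p-value = {p:.2e}")  # comfortably "significant"
print(f"absolute risk reduction = {100/5_000_000 - 50/5_000_000:.6%}")  # about 0.001%
```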
Replication > Revelation
A low p-value doesn’t necessarily mean the effect is real. Statistically, even if there’s no true effect, 5% of studies will still report a "significant" result just by chance. That’s why replication—repeating the study and getting the same result—is so critical in science. Unfortunately, replication studies are often seen as boring and unpublishable. The scientific community tends to reward novelty over reliability, which can lead to shaky findings getting outsized attention.
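You can watch this happen with a small simulation; the setup below is entirely made up. Run many studies in which the "treatment" does nothing at all, test each one, and count how many still clear p < 0.05.

```python
import math
import random

random.seed(0)

def z_test_two_means(a, b):
    """Approximate two-sided test that two samples share the same mean."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value (normal approx.)

N_STUDIES = 10_000
false_positives = 0

for _ in range(N_STUDIES):
    # Both groups come from the same distribution: the true effect is exactly zero.
    treatment = [random.gauss(0, 1) for _ in range(100)]
    control = [random.gauss(0, 1) for _ in range(100)]
    if z_test_two_means(treatment, control) < 0.05:
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives / N_STUDIES:.1%}")
# Expect roughly 5%, about one study in twenty, purely by chance.
```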
To address some of the shortcomings of relying purely on p-values, researchers often turn to confidence intervals—a more informative and nuanced statistical tool.
A confidence interval gives us a range of values within which we expect the true effect or measurement to fall, with a given level of confidence (usually 95%). Instead of saying, “This drug has a statistically significant effect,” a confidence interval might tell you, “We’re 95% confident that the drug reduces recovery time by somewhere between 1 and 4 days.”
This paints a more complete picture. Not only do we get a sense of whether an effect exists, but we also understand how large or meaningful that effect might be. A very narrow confidence interval indicates a high level of precision, while a wide interval signals more uncertainty. This range-based thinking helps counteract the false sense of certainty that a single number like a p-value can give.
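Here is a minimal sketch of how such a statement gets computed, using invented recovery-time data and a normal approximation: estimate the difference in mean recovery time between a treated and an untreated group, then report a 95% confidence interval around it.

```python
import math
import random

random.seed(42)

# Hypothetical recovery times in days (all data invented for illustration).
control = [random.gauss(12.0, 3.0) for _ in range(200)]
treated = [random.gauss(9.5, 3.0) for _ in range(200)]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Difference in mean recovery time (control minus treated = days saved).
diff = mean(control) - mean(treated)
se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))

# 95% confidence interval using the normal approximation (z = 1.96).
low, high = diff - 1.96 * se, diff + 1.96 * se
print(f"Estimated reduction in recovery time: {diff:.1f} days")
print(f"95% CI: {low:.1f} to {high:.1f} days")
```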
Confidence intervals also force us to think probabilistically, which aligns more closely with how the real world operates. We rarely know anything with 100% certainty, and confidence intervals acknowledge that truth rather than hide it. They don’t eliminate uncertainty—they quantify it. And in decision-making, that's far more powerful than a simple “yes/no” from a p-value. Whether you're evaluating a drug, a stock prediction, or a policy decision, knowing the range of possible outcomes can help you plan more realistically and make smarter choices.
Our Biases in Play
We often interpret data through the lens of what we want or expect to be true, letting prior beliefs guide our conclusions. And even when a result is statistically significant, that doesn’t mean our favored theory explains it; multiple theories can produce the same result. That’s why repeatability is the gold standard of scientific validity.
In the end, statistics doesn’t promise certainty—it helps guide us toward better decision-making. The goal isn’t to find the magic number, but to ask better questions and avoid being fooled by randomness. Let the data inform your beliefs—but don’t let your beliefs reshape the data.