Why Big Numbers Reveal Big Truths: The Law of Large Numbers

How a Few Thousand Voices Can Speak for Millions

Sep 21, 2025

This is The Curious Mind, by Álvaro Muñiz: a newsletter where you will learn about technical topics in an easy way, from decision-making to personal finance.

Imagine this: it's election night, and pollsters declare a winner based on a poll of just 2,000 voters out of several million. Sounds impossible, right? Yet these predictions come remarkably close to the final results.

The secret isn't dark magic or crystal balls—it's one of statistics' most elegant principles at work. The Law of Large Numbers explains why small, carefully chosen samples can reveal profound truths about massive populations. It's the mathematical foundation that makes opinion polls, medical trials, and quality control possible.

Today, we'll explore this statistical wonder and discover why "going big" with numbers is our most reliable path to truth.

Poll-based estimate of the Spanish election results from July 17th (left) vs. the actual results on July 23rd (right). The predicted numbers are strikingly similar. Sources: left, right.

The Mystery: How Do Pollsters Do It?

Let’s start with a concrete example.

Suppose you want to know what percentage of Spanish voters will vote for PSOE in the next election. That percentage exists as a fixed number somewhere out there: if you could magically ask every single Spanish voter, you could calculate it perfectly. The problem? You can’t feasibly ask 37 million people.

Instead, you do something that seems almost too good to be true: you ask a few thousand people, calculate their percentage, and declare that this represents the entire country.

This extrapolation from sample to population turns out not to be wild at all: as long as the sample is chosen properly, it's precisely what the Law of Large Numbers says will work.

The Law of Large Numbers: Your Statistical Superpower

Here's the formal definition; then we'll break it down:

The Law of Large Numbers
The average of a sample of independent, identically distributed random variables converges to the population average as the sample size increases.
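For readers who like symbols, here is one standard way to write the same statement. Assume the observations X_1, X_2, ... are independent and identically distributed with a finite mean μ. In the polling example, X_i is 1 if the i-th person polled supports PSOE and 0 otherwise, so μ is simply the true share of PSOE voters:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mu = \mathbb{E}[X_1] \qquad \text{as } n \to \infty
```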

Sounds intimidating? It's actually describing something really simple: poll enough people the right way, and you'll get remarkably close to the truth.

Let's unpack each part:

Random Variables: When Certainty Meets Chance

A random variable is just the name mathematicians use for "a quantity that varies randomly."

Here's the key insight: whether María specifically votes for PSOE isn't random—she knows her preference. But if I pick someone at random from Spain's population, their preference is random from my perspective. I genuinely don't know what I'll discover until I ask.
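To make this concrete, here is a minimal Python sketch of a single poll answer as a random variable. It is an illustration under made-up assumptions, not how real pollsters work: a hypothetical population of 1,000,000 voters in which exactly 32% support PSOE (the figure used in the simulation later in this post), and a hypothetical helper `ask_random_voter`. Every person's answer is fixed; only the choice of person is random.

```python
import random

# Hypothetical population of 1,000,000 voters in which exactly 32% support PSOE.
# Each person's preference is fixed: 1 = votes PSOE, 0 = does not.
POPULATION_SIZE = 1_000_000
population = [1] * 320_000 + [0] * (POPULATION_SIZE - 320_000)

def ask_random_voter(rng: random.Random) -> int:
    """Pick one voter uniformly at random and return their (fixed) answer.

    The answer is a random variable only because we don't know in
    advance *who* will be picked.
    """
    return rng.choice(population)

rng = random.Random(2023)
# Ten 0/1 answers; which ones you get depends on the seed.
print([ask_random_voter(rng) for _ in range(10)])
```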

Independence: Why Your Friends Can Ruin Everything

Here's where things get tricky, and it's one of the ways polls go wrong.

Imagine you're estimating support for PSOE, and you poll someone along with their partner. Problem: couples often share political views. If María votes PSOE, Carlos probably does too.[1]

This creates "dependent" responses—they carry overlapping information rather than independent evidence. It's like asking the same person twice and pretending you got two opinions.

With strong dependence, the Law of Large Numbers can break down entirely. Positive correlations make your sample mean jump around more, requiring much larger samples to stabilize.
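A toy simulation can show the effect. This is a minimal sketch under made-up assumptions: a true support of 32%, and a hypothetical parameter `AGREE = 0.8`, the probability that the second partner simply copies the first partner's vote (otherwise they decide independently). It compares how much the estimated share varies across 2,000 repeated polls of 1,000 independently chosen people versus 500 couples.

```python
import random
import statistics

P = 0.32          # assumed true PSOE share (illustrative)
AGREE = 0.8       # hypothetical: probability a partner simply shares the first person's vote
N_PEOPLE = 1_000  # people per poll
N_POLLS = 2_000   # repeated polls, to measure how much the estimate jumps around

def independent_poll(rng: random.Random) -> float:
    """Share of PSOE voters among N_PEOPLE independently chosen people."""
    return sum(rng.random() < P for _ in range(N_PEOPLE)) / N_PEOPLE

def couples_poll(rng: random.Random) -> float:
    """Share of PSOE voters when we poll N_PEOPLE // 2 couples.

    The first partner votes PSOE with probability P. With probability AGREE
    the second partner copies that vote; otherwise they decide independently
    (also with probability P). Each individual still votes PSOE with
    probability P, but partners' answers are positively correlated.
    """
    votes = 0
    for _ in range(N_PEOPLE // 2):
        first = rng.random() < P
        second = first if rng.random() < AGREE else (rng.random() < P)
        votes += first + second  # booleans add up as 0/1
    return votes / N_PEOPLE

rng = random.Random(1)
indep = [independent_poll(rng) for _ in range(N_POLLS)]
pairs = [couples_poll(rng) for _ in range(N_POLLS)]

print(f"independent: mean {statistics.mean(indep):.3f}, spread {statistics.stdev(indep):.4f}")
print(f"couples:     mean {statistics.mean(pairs):.3f}, spread {statistics.stdev(pairs):.4f}")
```

In this toy model both polls are centred on 32%, so the Law of Large Numbers still holds here. But the correlation between partners inflates the variance of the couples' estimate by a factor of 1 + AGREE, so its spread is roughly √1.8 ≈ 1.34 times larger, as if only about 1,000 / 1.8 ≈ 556 independent people had been polled.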

Identically Distributed: Comparing Apples to Apples

This is just a statistician’s way of saying that you should "make fair comparisons."

Want to know the average height of Spanish men? Sample Spanish men and measure their height—not Greek men, not Spanish women, and don’t measure their weight instead.

Most importantly, your sample must be representative: measure only people in Galicia, and you're measuring Galicia’s average height, not all of Spain's.
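The same trap shows up with votes. Here is a minimal sketch with an entirely made-up country of two regions (the numbers are hypothetical and are not real Spanish data): region A holds 20% of voters, 20% of whom back PSOE, and the rest of the country holds 80% of voters, 35% of whom back PSOE, so national support is 0.2 × 0.20 + 0.8 × 0.35 = 0.32. The helper `poll` is hypothetical.

```python
import random
from typing import Optional

# Hypothetical country (all numbers made up for illustration):
# name -> (share of the population, share of PSOE supporters in that region).
# National support is 0.2 * 0.20 + 0.8 * 0.35 = 0.32.
REGIONS = {"A": (0.20, 0.20), "rest": (0.80, 0.35)}

def poll(rng: random.Random, n: int, only_region: Optional[str] = None) -> float:
    """Estimate PSOE support from n respondents.

    If only_region is given, every respondent comes from that region;
    otherwise respondents are drawn from the whole country in proportion
    to each region's population.
    """
    names = list(REGIONS)
    weights = [REGIONS[r][0] for r in names]
    yes = 0
    for _ in range(n):
        region = only_region or rng.choices(names, weights=weights)[0]
        yes += rng.random() < REGIONS[region][1]
    return yes / n

rng = random.Random(7)
print(f"whole country: {poll(rng, 50_000):.3f}")       # settles near 0.32
print(f"region A only: {poll(rng, 50_000, 'A'):.3f}")  # settles near 0.20, not 0.32
```

Polling more people from region A only sharpens the estimate of region A's support; no sample size turns it into an estimate for the whole country.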

Convergence: Getting Arbitrarily Close to Truth

Remember when we learned that 0.9999… = 1? The sequence 0.9, 0.99, 0.999… gets as close to 1 as you want.

The same happens with polling averages. "Convergence" means:

Poll enough people, and you can be almost certain that your estimate is as close to the true answer as you want.
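Because the sample average is itself random, "as close as you want" has to be stated in terms of probability. One standard version (the weak law) says that for any tolerance ε > 0, the chance of missing the truth by more than ε goes to zero as the sample grows. Writing X̄_n for the share of PSOE supporters among n independently polled people and p for the true share, Chebyshev's inequality gives an explicit, if loose, bound:

```latex
\Pr\left(\left|\bar{X}_n - p\right| \ge \varepsilon\right) \;\le\; \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2} \;=\; \frac{p(1-p)}{n\,\varepsilon^2} \;\xrightarrow[n\to\infty]{}\; 0
```

For example, with p = 0.32, n = 2,000 and ε = 0.02 (two percentage points), the bound is 0.2176 / 0.8 = 0.272. That is far from tight, but it shrinks to zero as n grows, which is exactly what convergence means here.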

The LLN in Action: A Real Simulation

Let me show you the Law of Large Numbers in action with a real simulation.

Imagine we somehow know that exactly 32% of Spain supports PSOE (roughly their 2023 election result). Now let's watch what happens as we poll 1, 2, 3… up to 8,000 random people:

As you can see in the figure, at the beginning our estimate is very poor.

After polling just one person (who happened not to support PSOE), our estimate is 0%—obviously terrible.

As we poll more (but still few) people, our estimate of the support for PSOE goes up and down erratically, reaching almost 40% in early stages and dropping below 30% with around 1,000 people polled.

The crucial point is that, as the number of people polled increases, our estimate of PSOE's support stabilises around the true value. This means two things:

It doesn’t vary much.
It stays close to the true value.

After about 2,000 people polled, the sample average stays within roughly 1 percentage point of the true value of 32%.
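For a sense of scale, here is a quick calculation, assuming independent respondents and a true share p = 0.32. It gives the standard error, the typical size of the gap between the sample share and the truth, after n = 2,000 people:

```latex
\operatorname{SE}(\bar{X}_n) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.32 \times 0.68}{2000}} = \sqrt{0.0001088} \approx 0.0104
```

That is about one percentage point. Because of the square root, quadrupling the sample size only halves it.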

If you run another poll with 8,000 new people, you will get a different graph, yet the phenomenon will be the same: as we poll more and more people, the sample average stabilises and approaches the true mean.

*In this particular run the first person polled **was** a PSOE voter, so we start with an estimate of 100%. As the number of people in our poll increases, we get closer and closer to the true value.*
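A simulation of this kind can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the figures. It assumes a true support of 32% and 8,000 independent yes/no answers, and it records the running average after each respondent. The function name `running_support` and the seed are arbitrary choices.

```python
import random

def running_support(p: float, n: int, seed: int) -> list[float]:
    """Running share of PSOE supporters after each of n independent respondents.

    Each respondent supports PSOE with probability p (an assumed true value).
    Element k of the result is the estimate after polling k + 1 people.
    """
    rng = random.Random(seed)
    estimates = []
    supporters = 0
    for k in range(1, n + 1):
        supporters += rng.random() < p   # adds 1 if this respondent supports PSOE, else 0
        estimates.append(supporters / k)
    return estimates

est = running_support(p=0.32, n=8_000, seed=0)
for k in (1, 10, 100, 1_000, 2_000, 8_000):
    print(f"after {k:>5} people: {est[k - 1]:.1%}")
```

Changing `seed` gives a different path, like the second run described in the caption above, but each path eventually settles near 32%.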

Why This Changes Everything

The Law of Large Numbers isn't just mathematical theory—it's the invisible foundation of our data-driven world.

It explains why pharmaceutical companies can test drugs on thousands of people and confidently predict effects on millions. Why manufacturers can inspect a few hundred products and guarantee quality across entire production runs. Why Netflix can recommend movies based on user patterns, and why your bank can assess credit risk.

The next time you ponder the power of a poll, remember that mathematics guarantees that asking enough people, chosen independently and from the right population, reveals the collective truth.

In Case You Missed It

When Time Slows Down: Einstein's Special Relativity

Álvaro Muñiz Brea

Apr 20

The Smartest Way to Bet: Understanding Kelly Criterion

Álvaro Muñiz Brea

Mar 23

The Impossible Race: How a Tortoise Challenged Mathematics for 2,500 Years

Álvaro Muñiz Brea

Mar 9

The Spanish Christmas Lottery: Dreams, Regrets, and the Reality of Your Odds

Álvaro Muñiz Brea

December 29, 2024

[1] Remember: dependence doesn't mean couples always vote identically. It means that knowing María's choice changes the probability of Carlos's choice: María's vote gives us information about Carlos's likely preferences.

Laura

Oct 6

It's amazing how many fields this rule can be used in!

Emuvi

Sep 21

To clean up polls, and to keep problems such as the "dependence" discussed in the post from ruining them, there is what is colloquially called "cooking the data": additional questions used to correct the data.

If, for example, we ask "Which party did you vote for in the previous election?" and 40% answer PSOE, it's clear that a correction is needed, since PSOE's actual share of the vote was lower than that 40%.
