Bayes Theory Mathematics


I have been reading a book called “The Theory That Would Not Die” by Sharon Bertsch McGrayne.

It is an extraordinary, detailed look at Bayes’ Theorem, and I have learned a lot more about the thinking behind the mathematics we applied in the automated CV processing and coding systems I designed for Hudson Highland and the FiveTen Group.

Link to blog posts [1] [2] [Smoogle]

When working with the data cubes, Bayesian algorithms, and Monte Carlo and Markov chain analysis for these projects, I did not know that Bayes had not written the mathematical formulation of his rule; that was done separately by the mathematician Pierre-Simon Laplace. So I think Laplace deserves a lot of the credit for this theory, and Bayes gets the naming rights only because he was first. You learn something every day!

Anyway, I digress…

Imagine this: You wake up one morning and you feel sick. No particular symptoms, just not 100%. So you go to the doctor and she also doesn’t know what’s going on with you, so she suggests running a battery of tests. After a week goes by, the results come back. Turns out you tested positive for a very rare disease that affects about 0.1% of the population. And it’s a nasty disease, horrible consequences. You don’t want it. So you ask the doctor: how certain is it that I have this disease?

The doctor says the test correctly identifies 99% of people who have the disease and incorrectly identifies only one percent of people who don’t have the disease.

That sounds pretty bad. What are the chances that you actually have this disease? Most people would say 99% because that’s the accuracy of the test, but that is not actually correct. You need Bayes’ theorem to get some perspective on the question.

Bayes’ theorem gives you the probability that some hypothesis is true (say, that you actually have the disease) given an event (say, that you tested positive for the disease).

To calculate this, take the prior probability that the hypothesis is true (how likely you thought it was that you had this disease before you got the test results) and multiply it by the probability of the event given that the hypothesis is true (the probability that you would test positive if you had the disease). Then divide that by the total probability of the event occurring, that is, of testing positive.

This total is the probability of having the disease and correctly testing positive, plus the probability of not having the disease and being falsely identified. The prior probability that a hypothesis is true is often the hardest part of this equation to figure out, and sometimes it’s no better than a guess. But in this case, a reasonable starting point is the frequency of the disease in the population, let’s say 0.1%. When you plug in the rest of the numbers, you find that you have about a nine percent chance of actually having the disease after testing positive, which is surprisingly low.
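The arithmetic can be checked with a few lines of Python. This is a minimal sketch using the numbers from the example; the function and parameter names are my own labels, not standard terminology.

```python
def posterior(prior, p_event_given_h, p_event_given_not_h):
    """Return P(H | E) using Bayes' theorem."""
    # Total probability of the event: true positives plus false positives.
    p_event = p_event_given_h * prior + p_event_given_not_h * (1 - prior)
    return p_event_given_h * prior / p_event

# Numbers from the example: 0.1% prevalence, 99% detection rate,
# 1% false-positive rate.
p = posterior(prior=0.001, p_event_given_h=0.99, p_event_given_not_h=0.01)
print(f"{p:.1%}")  # prints 9.0%
```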

Bayes’ theorem states that the probability of a hypothesis being true, given that an event has occurred, is equal to the probability of the event occurring given the hypothesis is true, multiplied by the prior probability of the hypothesis being true, and divided by the total probability of the event occurring.

The prior probability of the hypothesis being true is the probability of the hypothesis being correct before any evidence is considered. The total probability of the event occurring combines the probability of the event occurring given the hypothesis is true and the probability of the event occurring given the hypothesis is false, each weighted by how likely that case was to begin with.
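Written as a formula, with H for the hypothesis (you have the disease) and E for the event (you tested positive), the rule looks like this. The symbols are my own shorthand, and the second line plugs in the numbers from the disease example.

```latex
\[
P(H \mid E) = \frac{P(E \mid H)\,P(H)}
                   {P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
\]
\[
P(H \mid E) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}
            = \frac{0.00099}{0.01098} \approx 0.09
\]
```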

This isn’t some sort of crazy magic. It’s actually common sense applied to mathematics.

Just think about a sample of a thousand people. One person out of that thousand is likely to actually have the disease, and the test would likely identify them correctly as having the disease. Of the other 999 people, the test would falsely identify one percent, or about ten people, as having the disease.

So if you’re one of the people with a positive test result, and everyone was selected at random, you’re actually part of a group of eleven where only one person has the disease. Your chances of actually having it are one in eleven, or about nine percent.
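The same head count can be done in code. This sketch assumes a group of 100,000 people rather than a thousand, purely so the expected counts come out as whole numbers; the rates are the ones from the example.

```python
population = 100_000          # hypothetical group size, chosen to give whole numbers
prevalence = 0.001            # 0.1% of people have the disease
sensitivity = 0.99            # share of sick people the test catches
false_positive_rate = 0.01    # share of healthy people wrongly flagged

sick = population * prevalence                      # about 100 people
healthy = population - sick                         # about 99,900 people
true_positives = sick * sensitivity                 # about 99 people
false_positives = healthy * false_positive_rate     # about 999 people

all_positives = true_positives + false_positives
share = true_positives / all_positives
print(f"{true_positives:.0f} of {all_positives:.0f} positives are real: {share:.1%}")
# prints: 99 of 1098 positives are real: 9.0%
```

The exact expected counts give 99 in 1,098, which is the same nine percent as the rounded one in eleven.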

When Bayes first came up with this theorem, he didn’t actually think it was revolutionary. He didn’t even think it was worthy of publication. He didn’t submit it to the Royal Society, of which he was a member. In fact, it was found among his papers after he died, and by then he had abandoned it for more than a decade. His family asked his friend Richard Price to dig through his papers and see if there was anything worth publishing. And that’s where Price discovered the origins of Bayes’ theorem.

Originally, Bayes considered a thought experiment where he was sitting with his back to a perfectly flat, perfectly square table. He would ask an assistant to throw a ball onto the table, and the ball could land anywhere on the table. The idea was to figure out where it landed, so he would ask his assistant to throw on another ball and then tell him whether it landed to the left or to the right of the first ball, and in front of or behind it. He would note that down and then ask for more and more balls to be thrown on the table. What he realized was that, through this method, he could keep updating his idea of where the first ball was.

He would never be completely certain, but with each new piece of evidence, he would get more and more accurate. This is how Bayes saw the world—not that he thought the world was not determined or that reality didn’t quite exist, but it was that we couldn’t know it perfectly. All we could hope to do was update our understanding as more and more evidence became available.
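The thought experiment is easy to simulate. The sketch below is my own simplification, not Bayes' original working: it tracks only the left/right direction, treats the table edge as running from 0 to 1, and approximates the belief about the first ball with a grid of 200 candidate positions.

```python
import random

random.seed(0)                      # fixed seed so the run is repeatable
true_position = random.random()     # where the first ball landed (hidden)

# Candidate positions along one edge of the table, 0 = far left, 1 = far right.
grid = [(i + 0.5) / 200 for i in range(200)]
belief = [1 / len(grid)] * len(grid)    # flat prior: every position equally likely

for _ in range(500):
    # The assistant throws another ball and reports whether it landed
    # to the left of the first one.
    landed_left = random.random() < true_position
    # Likelihood of that report if the first ball were at position x.
    likelihood = [x if landed_left else 1 - x for x in grid]
    belief = [b * lk for b, lk in zip(belief, likelihood)]
    total = sum(belief)
    belief = [b / total for b in belief]    # renormalize so beliefs sum to 1

estimate = sum(x * b for x, b in zip(grid, belief))
print(f"true position {true_position:.3f}, estimate {estimate:.3f}")
```

Each report nudges the belief a little, and the estimate is never exact, but it tightens around the true position as the throws accumulate.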

When Richard Price introduced Bayes’ theorem, he made an analogy to a man coming out of a cave:

Maybe he’d lived his whole life in there and he saw the Sun rise for the first time and kind of thought to himself, is this a one-off? Is this a quirk or does the Sun always do this? And then every day after that as the Sun rose again, he could get a little more confident that, well, that was the way the world works.

So Bayes’ theorem wasn’t really a formula intended to be used just once. It was meant to be used repeatedly, each time taking in additional evidence and updating your probability that something is true.

If we go back to the first example, when you tested positive for a disease, what would happen if you went to another doctor, got a second opinion, and had that test run again, this time by a different lab, just to be sure that the tests are independent? And let’s say that the second test also comes back positive. What is the new probability that you actually have the disease?

You can use Bayes’ formula again, except this time the prior probability that you have the disease is the posterior probability from before, which was 9%.

So, if you crunch the numbers, the new probability that you have the disease, based on two positive tests, is 91%.
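Here is a short sketch of that repeated update, assuming the two tests are independent and have the same 99% and 1% rates as before. The function name is my own.

```python
def update(prior, sensitivity=0.99, false_positive_rate=0.01):
    """One Bayesian update for a positive test result."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

belief = 0.001                      # prevalence, used as the initial prior
history = []
for test_number in (1, 2):
    belief = update(belief)         # the previous posterior becomes the new prior
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.0%}")
# prints 9% after the first test and 91% after the second
```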

This makes sense because two positive results from different labs are unlikely to be just chance. But you’ll notice that the probability is still not as high as the reported accuracy of the test.

Bayes’ theorem has found several practical applications, including spam filters. Traditional spam filters actually don’t do a wonderful job because there are too many false positives—too much of your email ends up in spam. But using a Bayesian filter, you can look at the various words that appear in emails and use Bayes’ theorem to give a probability that the email is spam, given that those words appear.
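A toy version of such a filter fits in a few lines. Every number below is made up for illustration, and treating words as independent pieces of evidence is the usual "naive Bayes" simplification rather than a description of any particular product.

```python
import math

# Hypothetical figures for illustration only: how often each word appears
# in spam and in legitimate mail, plus an assumed share of mail that is spam.
P_WORD_GIVEN_SPAM = {"free": 0.30, "winner": 0.10, "meeting": 0.01}
P_WORD_GIVEN_HAM = {"free": 0.03, "winner": 0.001, "meeting": 0.10}
P_SPAM = 0.5

def spam_probability(words):
    """Return P(spam | words), treating known words as independent evidence."""
    # Work in log space so long emails do not underflow.
    log_spam = math.log(P_SPAM)
    log_ham = math.log(1 - P_SPAM)
    for word in words:
        if word in P_WORD_GIVEN_SPAM:       # ignore words we have no data for
            log_spam += math.log(P_WORD_GIVEN_SPAM[word])
            log_ham += math.log(P_WORD_GIVEN_HAM[word])
    # Convert the two log scores back into a probability.
    return 1 / (1 + math.exp(log_ham - log_spam))

print(spam_probability(["free", "winner"]))     # roughly 0.999
print(spam_probability(["meeting"]))            # roughly 0.09
```

A real filter would learn these word frequencies from mail that users have already marked as spam or not spam.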

Bayes’ theorem tells us how to update our beliefs in light of additional evidence, but it can’t tell us how to set our prior beliefs. So it’s possible for some people to hold that certain things are true with a hundred percent certainty and other people to hold that those same things are true with zero percent certainty.

What Bayes’ theorem shows us is that in those cases, there is absolutely no evidence that anyone could use to change their minds.
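You can see this by running the update with priors of exactly 0 and 1. In this sketch the evidence is assumed to be 99 times likelier if the claim is true than if it is false, and the same evidence is shown ten times over.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the posterior after seeing one piece of evidence."""
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

results = {}
for prior in (0.0, 0.5, 1.0):
    belief = prior
    for _ in range(10):                     # ten pieces of supporting evidence
        belief = update(belief, 0.99, 0.01)
    results[prior] = belief
    print(f"prior {prior:.0%} -> after 10 updates {belief:.0%}")
```

The open-minded 50% prior is pushed to near certainty, while the 0% and 100% priors come out exactly where they started.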

So, as Nate Silver points out in his book “The Signal and the Noise”, we should probably not have debates between people with a hundred percent prior certainty and zero percent prior certainty, because they’ll never convince each other of anything.

Mostly when people talk about Bayes’ theorem, they discuss how counterintuitive it is and how we don’t really have an inbuilt sense of it.

But maybe we’re too good at internalizing the thinking behind Bayes’ theorem. And the reason I’m worried about that is that I think in life we can get used to particular circumstances. People get used to results like getting rejected or failing at something, or getting paid a low wage. We internalize that as though we are that man emerging from the cave who sees the sun rise every day. We keep updating our beliefs to a point of near certainty that we think that this is basically how nature is. It’s the way the world is and there’s nothing we can do to change it.

Nelson Mandela is often quoted as saying, “It always seems impossible until it’s done,” and I think that is a very Bayesian viewpoint on the world.

The thing we forget in Bayes’ theorem is that our actions play a role in determining outcomes and determining how true things actually are. But if we internalize that something is true and maybe we’re a hundred percent sure that it’s true and there’s nothing we can do to change it, well then we’re going to keep on doing the same thing and we’re going to keep on getting the same result. It’s a self-fulfilling prophecy.

A good understanding of Bayes’ theorem implies that experimentation is essential. If you’ve been doing the same thing for a long time and getting the same result, one you’re not necessarily happy with, maybe it’s time to change.