One of the most famous theorems in statistics and probability is Bayes’ Theorem, which first appeared around 250 years ago. It allows us to calculate reverse probabilities and to use new evidence to update our beliefs. For example, the probability of a hypothesis given some evidence can be found from the probability of that evidence given the hypothesis.
To understand Bayes’ Theorem it is important to have a basic understanding of conditional probability. This is the probability of something happening given that some other event has already happened. Some examples of conditional probabilities are given below:
- Given that Watford scored a goal, what was the probability that Odion Ighalo scored?
- Given that it rained yesterday, what is the probability that it will rain tomorrow?
- Given a sports centre has a swimming pool, what is the probability it also has a gym?
Bayes’ famous theorem relates the conditional probabilities of two events A and B:
P(A|B) = P(B|A) × P(A) / P(B)
where
- P(A|B) is the probability of event A happening given that event B has happened
- P(B|A) is the probability of event B happening given that event A has happened
- P(A) is the probability of event A happening
- P(B) is the probability of event B happening
Whilst this might appear to be a relatively simple idea, conditional probabilities are often misunderstood. The probability that I have an umbrella when it is raining is not the same as the probability that it is raining when I have an umbrella.
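The umbrella point can be checked with a small sketch. The day counts below are made up purely for illustration (they do not come from any real data), with A as "I have an umbrella" and B as "it is raining". The sketch reads both conditional probabilities off the counts and then confirms that Bayes’ Theorem converts one into the other.

```python
# Hypothetical counts over 100 days (made-up numbers, for illustration only).
days_total = 100
days_umbrella = 30             # event A: I have an umbrella
days_rain = 20                 # event B: it is raining
days_rain_and_umbrella = 18    # both A and B on the same day

p_a = days_umbrella / days_total
p_b = days_rain / days_total

# Conditional probabilities read directly from the counts.
p_a_given_b = days_rain_and_umbrella / days_rain       # umbrella given rain
p_b_given_a = days_rain_and_umbrella / days_umbrella   # rain given umbrella

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(f"P(umbrella | rain) = {p_a_given_b:.2f}")
print(f"P(rain | umbrella) = {p_b_given_a:.2f}")
print(f"P(umbrella | rain) via Bayes = {p_a_given_b_bayes:.2f}")
```

With these assumed counts the two conditional probabilities differ (0.90 against 0.60), and the Bayes calculation agrees with the direct count.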
A Medical Example
To illustrate this I will describe the classic example of testing people for a disease. You might find the results quite surprising! Let event A be the event that you have the disease and event B be the event that you test positive for the disease.
- P(A) = probability you have the disease
- P(not A) = probability that you don’t have the disease = 1 – P(A)
- P(B) = probability that you test positive
- P(A|B) = probability that you have the disease, given you test positive
- P(B|A) = probability that you test positive, given you have the disease
- P(B| not A) = probability that you test positive, given you don’t have the disease
Suppose a medical test finds the disease 95% of the time when you have the disease, so P(B|A) = 0.95. Then if we assume that 5% of the time people who are healthy also get a positive result, we have P(B| not A) = 0.05. Finally, if the probability of having the disease in the first place is 1 in 1000, we get that P(A) = 0.001 and P(not A) = 0.999.
First we need to find the probability of a positive test by conditioning on whether we do or do not have the disease:
P(B) = P(B|A) × P(A) + P(B| not A) × P(not A) = 0.95 × 0.001 + 0.05 × 0.999 = 0.0509
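As a quick check of this step, here is a minimal sketch that computes the probability of a positive test from the figures stated above. The variable names are my own choice for illustration.

```python
# Figures taken from the example in the text.
p_a = 0.001               # P(A): probability of having the disease
p_not_a = 1 - p_a         # P(not A)
p_b_given_a = 0.95        # P(B|A): positive test given disease
p_b_given_not_a = 0.05    # P(B|not A): positive test given no disease

# Condition on whether or not we have the disease and add the two cases.
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

print(f"P(B) = {p_b:.4f}")
```

Almost all of this probability comes from the second term, the healthy people who test positive, because they vastly outnumber the people who actually have the disease.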
We can then use the fact that the probability of a positive test is about 5.1% in the denominator of Bayes’ Theorem:
P(A|B) = P(B|A) × P(A) / P(B) = (0.95 × 0.001) / 0.0509 ≈ 0.0187
This result of 0.0187 is about 1.87%, so we see that the probability of having the disease given that you’ve tested positive is actually under 2%, much lower than most people’s original guess of 95%!
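One way to see why the answer is so small is to count people rather than multiply probabilities. The sketch below applies the same figures to a hypothetical group of 100,000 people; the group size is an assumption chosen only to give round numbers.

```python
population = 100_000          # hypothetical group size, chosen for round numbers
p_disease = 0.001             # P(A)
p_pos_given_disease = 0.95    # P(B|A)
p_pos_given_healthy = 0.05    # P(B|not A)

# Expected number of people in each group.
sick = population * p_disease
healthy = population - sick
true_positives = sick * p_pos_given_disease
false_positives = healthy * p_pos_given_healthy

# Among everyone who tests positive, the fraction who really have the disease.
p_disease_given_pos = true_positives / (true_positives + false_positives)

print(f"People with the disease: {sick:.0f}")
print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(A|B) = {p_disease_given_pos:.4f}")
```

Under these assumptions about 95 of the positive tests belong to people with the disease and about 4,995 belong to healthy people, so a positive result is far more likely to be a false alarm. The ratio matches the 0.0187 found with Bayes’ Theorem.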