There are two main strands of statistics, Classical statistics and Bayesian statistics. These aren’t necessarily conflicting ideologies, though many statisticians throughout history would beg to differ, but are simply two different ways to tackle a problem. Hopefully this post will give you some brief insight into the uses and differences of the two approaches.
Classical statistics is the first type of statistics that most people come across, and it deals with what we expect to happen in a repeatable experiment. For example, if we flip a fair coin an infinite number of times, the proportion of heads we obtain tends to one half. Hence we get the well-known probability of a head as 1/2. This is why classical statistics is often known as frequentist statistics; it covers ideas such as confidence intervals and p-values.
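To make the frequentist idea concrete, here is a minimal sketch that simulates coin flips and reports the proportion of heads. The function name `proportion_of_heads` and the fixed seed are illustrative assumptions, not part of any particular library.

```python
import random


def proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction that are heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The proportion should drift towards 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```

With only a handful of flips the proportion can be far from a half; it is the long-run behaviour that the frequentist definition of probability refers to.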
- P(A), P(B) are known as prior probabilities, because we know them before we learn any more information.
- P(A|C), P(B|C) are known as posterior probabilities, because they are found after we have learnt some additional information.
You can think of Bayesian statistics as an evolution of Bayes’ Theorem. Instead of dealing with point probabilities we now deal with probability distributions. As a result we now have prior and posterior distributions to consider.
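Writing $\theta$ for the unknown parameters and $x$ for the observed data, Bayes’ Theorem for distributions reads

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$

As the term $p(x)$ is just a normalising constant we can drop it to get the commonly seen Bayes’ Rule.

$$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$$

Here $p(\theta \mid x)$ is the posterior distribution, $p(x \mid \theta)$ is the likelihood, which accounts for the statistical model, and $p(\theta)$ is the prior, which represents the expert beliefs before seeing the data. The key point is that the Bayesian approach can quantify theories and hypotheses, something that can be desirable.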
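As a rough illustration of "posterior is proportional to likelihood times prior", the sketch below approximates the posterior for a coin's unknown probability of heads (called `theta` here) on a grid of candidate values. The data (7 heads in 10 flips), the flat prior and the helper name `grid_posterior` are all assumptions chosen for the example.

```python
def grid_posterior(heads, flips, prior, grid):
    """Approximate the posterior over a grid of candidate theta values."""
    # Unnormalised posterior: likelihood * prior at each grid point.
    # The binomial coefficient is omitted because it does not depend on theta.
    unnormalised = [
        (t ** heads) * ((1 - t) ** (flips - heads)) * prior(t) for t in grid
    ]
    # Dividing by the total plays the role of the normalising constant.
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


grid = [i / 100 for i in range(101)]  # theta = 0.00, 0.01, ..., 1.00


def flat_prior(theta):
    """Assumed prior: every value of theta is equally plausible beforehand."""
    return 1.0


posterior = grid_posterior(heads=7, flips=10, prior=flat_prior, grid=grid)
most_probable = grid[posterior.index(max(posterior))]
print(most_probable)
```

Swapping `flat_prior` for a function that puts more weight near 0.5 would pull the posterior towards a fair coin, which is how prior beliefs enter the analysis.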
Frequentist vs Bayesian
The comic above features a detector that measures neutrinos to tell whether the Sun has gone supernova. Neutrinos are sub-atomic particles which can travel great distances and pass through matter almost unaffected. Each time it is asked, the detector rolls two dice: if both come up six it lies, and otherwise it tells the truth, regardless of whether neutrinos coming from a supernova were detected.
The frequentist statistician is making a mistake in his analysis. Most studies follow the convention that a result is statistically significant if, assuming nothing interesting is going on (the null hypothesis), there is a less than 5% chance of seeing a result at least that extreme. This is better known as rejecting the null hypothesis.
As the chance of the two dice both coming up as sixes is 1/36 (about 0.028), which is less than the 5% threshold, the frequentist decides that the detector must be telling the truth. The Bayesian gives a different answer: that there is not a supernova. This will turn out to be a more appropriate answer. Because a supernova isn’t really a repeatable event, we should use our beliefs about them to help find an answer!
Most models of the lifetime of the Sun say that it can be expected to live for around 10 billion years. If the experiment is run once an hour, every day (24 times a day, or 8,760 times a year), we get the probability of the Sun exploding in any given hour as 1 in 87,600 billion or 1 in 8.76×10¹³. Then using the information in the cartoon there are two possibilities when the detector reports an explosion.
- The Sun has exploded (one in 8.76×10¹³) and the detector is telling the truth (35 in 36). This event has a total probability of about 1/(8.76×10¹³) × 35/36 or about 1/(9.01×10¹³).
- The Sun hasn’t exploded (8.76×10¹³ − 1 in 8.76×10¹³) and the detector is not telling the truth (1 in 36). This event has a total probability of about (8.76×10¹³ − 1)/(8.76×10¹³) × 1/36 or about 1/36.
The second possibility is roughly 2.5 trillion times more likely than the first, which makes it pretty clear that the Sun isn’t likely to have exploded.
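The two possibilities above can be combined with Bayes’ Theorem to get the probability that the Sun really has exploded, given that the detector says it has. The sketch below is one way to check the arithmetic, assuming the post’s figures (a 10 billion year lifetime, one reading per hour, a 1 in 36 chance of a lie); the variable names are illustrative.

```python
from fractions import Fraction

# Number of hourly readings in an assumed 10 billion year lifetime: 8.76e13.
HOURS_IN_SUN_LIFETIME = 10_000_000_000 * 365 * 24

p_explode = Fraction(1, HOURS_IN_SUN_LIFETIME)  # prior: Sun explodes this hour
p_lie = Fraction(1, 36)                         # both dice come up six

# Joint probabilities of the two ways the detector can report an explosion.
yes_and_exploded = p_explode * (1 - p_lie)      # truthful report
yes_and_not_exploded = (1 - p_explode) * p_lie  # lying report

# Posterior probability of an explosion, given that the detector reports one.
posterior_exploded = yes_and_exploded / (yes_and_exploded + yes_and_not_exploded)

print(float(posterior_exploded))
print(float(yes_and_not_exploded / yes_and_exploded))  # odds against an explosion
```

Using `Fraction` keeps the calculation exact, which avoids rounding problems when such a tiny prior is multiplied by ordinary-sized probabilities. Under these assumptions the posterior works out to roughly 4 in 10 trillion.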
Things to Note
Clearly, if the Sun had gone supernova, the Earth would be wiped out pretty quickly. This would make the money lost in the bet worth absolutely nothing, highlighting the need to check your working.
Finally the physicist inside me feels it necessary to mention that the Sun can’t go supernova: it is not nearly massive enough. So perhaps the true meaning of the comic is that physics is more important …