# A Brief Look at Bayesian Statistics

There are two main strands of statistics: classical statistics and Bayesian statistics. These aren’t necessarily conflicting ideologies, though many statisticians throughout history would beg to differ; they are simply two different ways to tackle a problem. Hopefully this post will give you some brief insight into the uses and differences of the two approaches.

Classical statistics is the first type of statistics that most people come across and is concerned with what we expect to happen in a repeatable experiment. The idea is that if we flip a fair coin an infinite number of times, the proportion of heads we obtain tends to a half; hence we get the well-known probability of a head of 1/2. This long-run-frequency view is why classical statistics is often known as frequentist statistics, and it covers ideas such as confidence intervals and p-values.

Bayesian statistics evolved out of Bayes’ Theorem which I talked about in a previous post.

$P(A|B) = \cfrac{P(B|A)P(A)}{P(B)}$

• P(A), P(B) are known as prior probabilities, because we know them before we learn any more information.
• P(A|B), P(B|A) are known as posterior probabilities, because they are found after we have learnt some additional information.

You can think of Bayesian statistics as an evolution of Bayes’ Theorem. Instead of dealing with point probabilities we now deal with probability distributions. As a result we now have prior and posterior distributions to consider.

$f(\theta|x) = \cfrac{f(x|\theta)f(\theta)}{f(x)}$

As the term $f(x)$ is just a normalising constant we can drop it to get the commonly seen Bayes’ Rule.

$f(\theta|x) \propto f(x|\theta)f(\theta)$

Here $f(\theta|x)$ is the posterior distribution, $f(x|\theta)$ is the likelihood, which encodes the statistical model, and $f(\theta)$ is the prior, which represents beliefs held before seeing the data. The key point is that the Bayesian approach can attach probabilities to theories and hypotheses, something that is often desirable.
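To make the update concrete, here is a minimal sketch using the standard coin-flipping setup: a Beta prior on the probability of heads is conjugate to the binomial likelihood, so the posterior is again a Beta distribution. The prior parameters and data below are illustrative choices, not taken from the post.

```python
# Prior: Beta(a, b) over theta, the probability of heads.
# a = b = 2 is an illustrative mild belief that the coin is roughly fair.
a, b = 2.0, 2.0

# Observed data: 7 heads in 10 flips (invented for illustration).
heads, flips = 7, 10

# Because the Beta prior is conjugate to the binomial likelihood,
# f(theta|x) ∝ f(x|theta) f(theta) is again a Beta distribution:
# the counts are simply added to the prior parameters.
a_post = a + heads
b_post = b + (flips - heads)

posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 9.0 5.0 ~0.643
```

Note how the normalising constant $f(x)$ never needs to be computed: conjugacy lets us read the posterior off from the shape of the numerator alone.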

# Coupon Collecting and Football Stickers

Although Euro 2016 is still months away from officially starting, another footballing tradition has just started: collecting football stickers. For children and adults alike the quest to complete the entire sticker album is something that takes time and money, but the excitement and joy make it worth it.

As the number of stickers in each album has dramatically increased since the first albums were produced around 40 years ago, the task of collecting all of the stickers has become much harder. Mathematically this happens to be a classic problem in probability: the coupon-collector problem. There is a finite number of different coupons, any of which can be given to a person one at a time. In a statistical sense this is the problem of sampling with replacement. The question is then: how many times do you need to sample to obtain a copy of every single sticker?
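The expected number of draws has a well-known closed form, $n H_n$, where $H_n = 1 + 1/2 + \dots + 1/n$ is the $n$-th harmonic number. The sketch below computes it and checks it against a simulation; the album size of 640 stickers is an illustrative figure, not a number from the post.

```python
import random

def expected_draws(n):
    """Analytic expectation for the coupon collector: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))

def simulate(n, rng=random.Random(0)):
    """Draw stickers uniformly with replacement until all n are collected."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 640  # illustrative album size
print(round(expected_draws(n)))  # ≈ 4505 draws to complete the album
```

Since $n H_n \approx n \ln n$, doubling the album size more than doubles the expected number of stickers you must buy, which is why modern, larger albums are so much harder to finish.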

# Why Lawyers Need Statistics

I previously talked about Bayes’ theorem and its often misunderstood applications. Normally these mistakes aren’t particularly costly or harmful in the world of statistics, but if they are used to make decisions that impact the real world then getting things wrong can be extremely costly.

One place where statistics can be called upon to influence important matters is the court. Throughout the last 50 years there has been an increase in the use of statistics in court matters and it is important that everybody involved understands them and their use. If any or all of the prosecution, defence or jury misinterpret the information given to them then the chances of a miscarriage of justice will greatly increase.

$\text{Prob of matching a description} \neq \text{Prob of matching a description and being guilty}$

The classic mistake made in the past by many prosecution teams is the ‘prosecutor’s fallacy’. This is when the prosecution or defence present the jury with a statistic, such as a probability, that has been calculated or interpreted incorrectly, typically by confusing the probability of the evidence given innocence with the probability of innocence given the evidence, yet manage to convince the jury to accept it as true.
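A small calculation shows how large the gap between the two probabilities can be. The numbers below are invented purely for illustration, not taken from any real case:

```python
# Illustrative assumptions (not from any real case):
population = 10_000   # people who could plausibly have committed the crime
p_match = 1 / 1_000   # probability a random innocent person matches the description

# On average, how many people in the population match the description?
expected_matches = population * p_match  # about 10 people

# Assuming exactly one of the matchers is guilty:
p_guilty_given_match = 1 / expected_matches

# The fallacy: reporting P(match | innocent) = 0.1% as if it were
# P(innocent | match), which here is actually about 90%.
print(p_guilty_given_match)  # ≈ 0.1
```

Matching the description is weak evidence on its own: a one-in-a-thousand match rate still leaves roughly ten plausible suspects, so the probability of guilt given a match is nowhere near 99.9%.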


# Expected Goals in Football

Previously I talked about one way to measure the success of a football team over a year through ‘Pythagorean Expectation’. Whilst this is a pretty good metric for predicting success, it can only be applied over a certain number of games and can’t tell us anything about a particular match. Since being able to determine how well a team performed in a particular match is the ultimate goal of analysing games, many ideas have been developed to try to do this with increasing accuracy.

#### A Short List of (Increasingly Better) Football Metrics

1. Goals Scored & Goals Conceded
2. Shots
3. Shots on Target (SoT)
4. Total Shots Ratio (TSR)
5. Expected Goals (xG)
6. Expected Goals with tracking data

At the most basic level, what you see in the overall league table is the goals scored and conceded for each team. Teams that tend to score lots of goals and concede few goals win more matches and hence finish higher up the table at the end of the season.
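Two of the metrics in the list above are straightforward to compute. Total Shots Ratio is conventionally the share of all shots in a team's matches that the team itself took, and a basic xG total simply sums each shot's estimated scoring probability. The numbers below are invented for illustration:

```python
def total_shots_ratio(shots_for, shots_against):
    """TSR: the share of all shots in a team's matches taken by that team."""
    return shots_for / (shots_for + shots_against)

def expected_goals(shot_probabilities):
    """A toy xG total: sum each shot's estimated probability of scoring."""
    return sum(shot_probabilities)

# Hypothetical season: 300 shots taken, 200 conceded.
print(total_shots_ratio(300, 200))  # 0.6

# Hypothetical match: a penalty plus two speculative long shots.
print(expected_goals([0.76, 0.07, 0.03]))  # ≈ 0.86
```

The per-shot probabilities are where the real modelling effort lies; everything else in the xG pipeline is just this sum.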

# What is Bayes’ Theorem?

One of the most famous theorems in statistics and probability is that of Bayes’ Theorem, which first appeared around 250 years ago. It allows us to calculate reverse probabilities and use new evidence to update our beliefs. For example the probability of a hypothesis given a set of evidence can be found from the probability of that evidence given a hypothesis.

To understand Bayes’ Theorem it is important to have a basic understanding of conditional probability. This is the probability of something happening given that some event has already happened. Some examples of conditional probabilities are given below,

• Given that Watford scored a goal, what was the probability that Odion Ighalo scored?
• Given that it rained yesterday, what is the probability that it will rain tomorrow?
• Given a sports centre has a swimming pool, what is the probability it also has a gym?
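The sports-centre example can be turned into a small calculation using the definition $P(A|B) = P(A \cap B)/P(B)$. The survey numbers below are invented for illustration:

```python
# Hypothetical survey of 200 sports centres (numbers invented):
total = 200
has_pool = 80          # centres with a swimming pool
has_pool_and_gym = 60  # centres with both a pool and a gym

# P(gym | pool) = P(pool and gym) / P(pool)
# The division by `total` cancels, leaving a ratio of counts: 60 / 80.
p_gym_given_pool = has_pool_and_gym / has_pool
print(p_gym_given_pool)  # 0.75
```

Conditioning simply shrinks the sample space: once we know a centre has a pool, only the 80 pool-owning centres matter, and 60 of those also have a gym.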

# Changepoint Detection – Part 2

If you haven’t already, I would strongly recommend glancing over part 1 of this post before continuing. In it we talked about a method for detecting a single changepoint through techniques such as maximum likelihood estimation. The question now is the more difficult one of how we can search for multiple changepoints.

The simplest thing to do would be to initially apply our Single Changepoint Detection (SCD) method to the series we want to analyse. If there is no changepoint found we are done, but if there is a changepoint found we end up with two distinct segments.
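Applied recursively to each new segment, this strategy is usually known as binary segmentation. A minimal sketch, assuming a `detect_single` function in the spirit of part 1 that returns a changepoint index within a segment or `None`; the naive equality-based detector here is just a stand-in for a proper likelihood test:

```python
def binary_segmentation(data, detect_single, start=0):
    """Recursively apply a single-changepoint detector to each segment.

    Returns a sorted list of changepoint indices into the original series.
    """
    cp = detect_single(data)
    if cp is None:
        return []  # no changepoint in this segment: recursion stops here
    left, right = data[:cp], data[cp:]
    return (binary_segmentation(left, detect_single, start)
            + [start + cp]
            + binary_segmentation(right, detect_single, start + cp))

def naive_detector(segment):
    """Toy stand-in detector: split at the first change in value."""
    for i in range(1, len(segment)):
        if segment[i] != segment[i - 1]:
            return i
    return None

print(binary_segmentation([0, 0, 0, 1, 1, 2, 2], naive_detector))  # [3, 5]
```

The `start` offset is the only bookkeeping needed: each recursive call works on a slice, so its local changepoint index must be shifted back into the coordinates of the full series.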

# Changepoint Detection – Part 1

Previously we discussed what a changepoint is, how they work and examples of where you might find them. Today however we shall go one step further and attempt to discuss solving the problem of how to find these changepoints.

Multiple changepoints have been found; they are marked by the red lines.

Let’s begin by trying to figure out a way of detecting if just a single changepoint exists in a set of data. The simplest way to pose this problem is to describe it through a ‘Hypothesis Test’, where we have a null hypothesis ($H_0$) and an alternative hypothesis ($H_1$),

$H_0:$ no changepoint            $H_1:$ 1 changepoint

If you are unfamiliar with hypothesis tests, the idea is to run some kind of test that will give us a value. If that value is above a certain threshold, we reject the null hypothesis and in our case accept the alternative hypothesis that a changepoint exists. If not, we fail to reject the null hypothesis and proceed as if no changepoint is present. In layman’s terms, it’s the classic innocent until proven guilty.
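A minimal sketch of such a test for a single change in mean. The test statistic is the reduction in a Gaussian sum-of-squares cost achieved by the best split; the threshold and data below are illustrative choices, not values from the post:

```python
def sse(x):
    """Sum of squared deviations from the segment mean (Gaussian cost)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)

def single_changepoint(data, threshold):
    """Likelihood-ratio-style test for one change in mean.

    Returns the best split index if the cost reduction exceeds the
    threshold (reject H0), otherwise None (fail to reject H0).
    """
    full_cost = sse(data)
    best_tau, best_gain = None, 0.0
    for tau in range(1, len(data)):
        gain = full_cost - (sse(data[:tau]) + sse(data[tau:]))
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau if best_gain > threshold else None

series = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
print(single_changepoint(series, threshold=10.0))  # 4
```

The threshold plays exactly the role described above: a small cost reduction is consistent with noise, so only a split that improves the fit by more than the threshold counts as evidence of a changepoint.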