As discussed previously, the idea of preflow-push is to start by sending the maximum flow out of the source that the network allows. This is known as ‘flooding the network’. From there we try to send flow towards the sink.
An important concept here is that during the algorithm we often violate the mass-balance constraints. In other words, a node can have more flow entering it than leaving it. The difference between the flow in and the flow out is referred to as the ‘excess’. To find a feasible solution, every node apart from the source and sink must have an excess of zero. We remove this excess by ‘pushing’ flow to adjacent nodes.
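As a rough sketch of these two ideas (the flow representation and function names here are my own illustrative choices, not from any particular implementation):

```python
# Minimal sketch of the excess of a node and a basic push step.
# Flows and capacities are stored as dicts keyed by (u, v) arc tuples;
# this representation is an illustrative assumption.

def excess(node, flow):
    """Flow into `node` minus flow out of it."""
    inflow = sum(f for (u, v), f in flow.items() if v == node)
    outflow = sum(f for (u, v), f in flow.items() if u == node)
    return inflow - outflow

def push(u, v, flow, capacity):
    """Push as much of u's excess along arc (u, v) as its spare capacity allows."""
    spare = capacity[(u, v)] - flow.get((u, v), 0)
    amount = min(excess(u, flow), spare)
    flow[(u, v)] = flow.get((u, v), 0) + amount
    return amount
```

A push either saturates the arc or removes all of the node's excess, which is exactly the case split the full algorithm reasons about.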
Flooding the network
In this post I will briefly describe one type of algorithm that has been used to solve the well-known max-flow problem: how to find the maximum flow that can be sent from a source node to a sink node in a network. Think of a graph of nodes connected by arcs, each arc having a certain capacity, or limit on the flow it can carry. This could be something very literal, such as a network of pipes or internet cables, but stranger uses for network flow problems exist too, such as the baseball elimination problem I talked about before.
The augmenting path algorithm uses an idea from capacitated graphs called the residual capacity: the potential for additional flow to move along a particular arc. It can be summed up as Residual Flow = Capacity − Forward Flow + Backward Flow. See the diagram below for how it works (the first number is the flow, the second the capacity).
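The formula is simple enough to state directly in code; the numbers below are just an illustrative arc, not taken from the diagram:

```python
# Residual flow on an arc: how much more flow it could carry.
# Residual = Capacity - Forward Flow + Backward Flow.

def residual_flow(capacity, forward_flow, backward_flow=0):
    return capacity - forward_flow + backward_flow

# An arc of capacity 10 carrying 6 units forward and 2 units backward:
print(residual_flow(10, 6, 2))  # -> 6
```

The backward term is what lets an augmenting path "undo" earlier flow along an arc, which is why the algorithm can escape a poor early choice of path.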
Recently we had a masterclass from Pitu Mirchandani on the subject of network flows. Network flow problems have been studied since the 1950s; an example is my previous post on Dijkstra’s algorithm. The main thing I took away from the masterclass was that the ideas of shortest paths, maximum flows and min-cost max flows can be used to describe and solve a surprisingly wide variety of problems.
One interesting problem which can be solved using the ideas of network flows is the baseball elimination problem. Baseball in the USA consists of two leagues, each split into three divisions of about five teams. During the regular season each team plays 162 games in total, most of them against teams in its own division (76 vs the same division, 66 vs the same league, 20 vs the other league). The aim is to qualify for the playoffs, which can be done by finishing first in the division.
All this adds up to wanting to work out whether a particular team can still win the division and make it to the playoffs given the current league standings. Solving this problem means looking at tables like the one below.
Who can make the playoffs?
With the Premier League season drawing to a close I thought I’d go back and see what the ‘Pythagorean Expectation’ for the current league table looks like. Using the same values as in my previous post means we can see if the same teams seem to be under- or over-performing.
This is the league table from earlier in the year.
Old League Table
Below is the current league table I’m working from with about 5 or 6 games to go.
Current League Table
- Leicester seem to be unstoppable in real life, and despite my prediction of them hitting a blip, nothing of the sort has happened yet. This is reflected in them having outperformed the average team’s points total for their goals scored and conceded by a whopping 9.29 pts. This gap has almost doubled in the last 10 games (they have lost only 1 league game in that stretch, and that by a single goal)!
- The other prediction I made was that Tottenham seemed to be under-achieving earlier in the season and could challenge for the title. Despite only a modest improvement in their residual, from −5.64 to −4.81, they have maintained their good run of form and look like the most likely title challengers to Leicester.
- Other significant improvements have been made by the likes of Bournemouth and Stoke who have turned their form around to pick up more points than they might have expected to.
- Everton continue to have a shocking season by Pythagorean expectation, which matches the conventional wisdom about them. In a more typical season, given their goals scored and conceded, they would have almost 9 more points.
- Finally, despite my current worries about Watford, the maths suggests that they will still be fine in the league this year.
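For anyone wanting to reproduce this kind of residual, here is a minimal sketch of the basic Pythagorean expectation calculation. I use the classic exponent of 2 below; football-specific variants (including the values from my earlier posts) tune the exponent, and the points conversion here is only a crude per-game share, so treat the numbers as illustrative.

```python
# Basic Pythagorean expectation (classic exponent-2 form; the exponent and
# points conversion are illustrative assumptions, not the post's exact model).

def pythagorean_fraction(goals_for, goals_against, exponent=2):
    gf, ga = goals_for**exponent, goals_against**exponent
    return gf / (gf + ga)

def expected_points(goals_for, goals_against, games, exponent=2):
    # Crude conversion: the winning fraction times 3 points per game.
    return 3 * games * pythagorean_fraction(goals_for, goals_against, exponent)

# A hypothetical team that has scored 60 and conceded 30 after 32 games:
print(round(expected_points(60, 30, 32), 1))  # -> 76.8
```

A team's residual is then simply its actual points minus this expected figure, positive when it is over-performing its goal record.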
There are two main strands of statistics: classical statistics and Bayesian statistics. These aren’t necessarily conflicting ideologies, though many statisticians throughout history would beg to differ, but simply two different ways to tackle a problem. Hopefully this post will give you some brief insight into the uses and differences of the two approaches.
Classical statistics is the first type of statistics that people come across and concerns what we expect to happen in a repeatable experiment. An example is the idea that if we flip a fair coin an infinite number of times, the proportion of heads we obtain is a half; hence we get the well-known probability of a head of 1/2. This is why classical statistics is often known as frequentist, and it covers ideas such as confidence intervals and p-values.
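The coin-flipping idea is easy to see in a quick simulation (a sketch of the frequentist intuition, with my own choice of sample sizes):

```python
import random

# Frequentist intuition: as the number of coin flips grows, the observed
# proportion of heads settles towards the long-run probability of 1/2.
random.seed(0)
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)
```

The proportion wobbles noticeably at 100 flips but is very close to 0.5 by a million, which is exactly the long-run frequency the classical definition appeals to.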
Bayesian statistics evolved out of Bayes’ Theorem which I talked about in a previous post.
- P(A), P(B) are known as prior probabilities, because we know them before we learn any more information.
- P(A|C), P(B|C) are known as posterior probabilities, because they are found after we have learnt some additional information.
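A small numeric sketch of this update (the numbers are my own illustrative choices): two hypotheses A and B with equal priors, and an observation C that is more likely under B.

```python
# Prior probabilities for two hypotheses (illustrative values).
p_A, p_B = 0.5, 0.5
# Likelihood of the observation C under each hypothesis.
p_C_given_A, p_C_given_B = 0.5, 0.8

# Total probability of observing C (the normalising constant).
p_C = p_C_given_A * p_A + p_C_given_B * p_B
# Posterior probabilities after seeing C, via Bayes' Theorem.
p_A_given_C = p_C_given_A * p_A / p_C
p_B_given_C = p_C_given_B * p_B / p_C
print(round(p_A_given_C, 3), round(p_B_given_C, 3))  # -> 0.385 0.615
```

Seeing C shifts belief towards B, the hypothesis under which C was more likely, while the two posteriors still sum to one.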
You can think of Bayesian statistics as an evolution of Bayes’ Theorem. Instead of dealing with point probabilities we now deal with probability distributions. As a result we now have prior and posterior distributions to consider.
As the term f(x), the marginal probability of the data, is just a normalising constant, we can drop it to get the commonly seen proportional form of Bayes’ Rule: π(θ | x) ∝ f(x | θ) π(θ).
Here π(θ | x) is the posterior distribution, f(x | θ) is the likelihood, which accounts for the statistical model, and π(θ) is the prior, which represents the expert’s beliefs before seeing the data. The key point is that the Bayesian approach can quantify beliefs about theories and hypotheses, something that can be desirable.
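To make the prior-to-posterior step concrete, here is the standard conjugate Beta–Binomial example (my choice of illustration, with made-up numbers): a Beta prior on a coin’s probability of heads, updated with observed flips.

```python
# Beta(a, b) prior on theta; observing `heads` heads and `tails` tails gives a
# Beta(a + heads, b + tails) posterior -- the data simply adds to the
# prior's pseudo-counts.
prior_a, prior_b = 2, 2        # mildly favours theta near 0.5
heads, tails = 7, 3            # observed data (made up for illustration)

post_a, post_b = prior_a + heads, prior_b + tails
posterior_mean = post_a / (post_a + post_b)
print(post_a, post_b, round(posterior_mean, 3))  # -> 9 5 0.643
```

The posterior mean of about 0.64 sits between the prior's 0.5 and the data's 0.7, showing how the prior beliefs and the likelihood are blended.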
Although Euro 2016 is still months away from officially starting, another footballing tradition has just begun: collecting football stickers. For children and adults alike, the quest to complete the entire sticker album takes time and money, but the excitement and joy make it worth it.
As the number of stickers in each album has dramatically increased since the first few albums produced around 40 years ago, the task of collecting them all has become much harder. Mathematically this is a classic problem in probability: the coupon collector problem. There is a finite number of different coupon types, and the collector receives one coupon, chosen at random, at a time; in statistical terms this is sampling with replacement. The question is then how many times you need to sample to obtain a copy of every single sticker.
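The classical answer is that for n equally likely stickers the expected number of purchases is n(1 + 1/2 + … + 1/n). A quick simulation (with an arbitrarily chosen album size) agrees:

```python
import random

# Expected purchases to complete an album of n equally likely stickers:
# n times the n-th harmonic number.
def expected_purchases(n):
    return n * sum(1 / k for k in range(1, n + 1))

# Simulate buying one random sticker at a time until the album is full.
def purchases_to_complete(n, rng):
    seen, bought = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        bought += 1
    return bought

rng = random.Random(0)
n = 50
print(round(expected_purchases(n), 1))                        # -> 225.0
print(sum(purchases_to_complete(n, rng) for _ in range(2000)) / 2000)
```

Note how a 50-sticker album already takes about 4.5 times as many purchases as stickers; the last few missing stickers dominate the cost, which is why bigger modern albums feel so much harder to finish.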