# A Basic Guide to Optimisation

Often problems occur where we want to maximise or minimise some quantity. This is pretty easy to do for simple functions or graphs that we can plot. When the information is plotted in a graph we can visually look for the point in the graph which is highest or lowest. Alternatively if we have a function that can be written mathematically we can use differentiation to find its maximum or minimum.

What optimisation is able to cope with is finding the maximum or minimum of a quantity given that it is limited or constrained’ by some other facts. Think of trying to build as many products in a factory as you can, whilst only having a limited amount of raw materials and workers both of which need to be used to build the product.

### Maximisation or Minimisation?

Normally we want to deal with either a maximisation or minimisation problem. It turns out that these two problems are in fact two sides of the same coin: finding the minimum of some function $f(x)$ is the same as finding the maximum of $-f(x)$. As a result we typically define optimisation problems in the form of a minimisation problem. $\min f(x) = \max -f(x)$ Source: ltcconline

# Maintenance and Optimal Stopping

Imagine we have a machine in a factory that makes products which are sold to the general public. Ideally we want to make as many products as possible so we can sell them all to make a nice healthy profit. This means that we really want to be running the machine all through the day with no unnecessary stoppages.

Realistically however we know that machines do in fact breakdown and companies employ teams of people whose job it is to maintain the machines so that they can keep them running or repair them if they do break. Generally these maintenance teams will have a planned schedule of times when they will stop the machine in order to do their necessary checks and servicing. This approach works well except for the fact that there are often conflicting views between the maintenance department and the production department. The production department wants to make sure that products are always being made (as few stops as possible), whereas the maintenance viewpoint is that the aim should be to stop the machine breaking down (needing a lot of stops). This poses the question of how to find the optimum balance between having a reliable machine and a productive machine?

In this final part of my series on Bradley-Terry models I will talk about how the simple concepts behind Bradley-Terry models link with and underpin some more well-known and advanced concepts.

### 1. Logistic Regression

Let’s start by making a substitution in the formula of $\lambda_i = e^{b_i}$. $P(i \text{ beats } j) = \cfrac{\lambda_i}{\lambda_i + \lambda_j}$ $P(i \text{ beats } j) = \cfrac{e^{b_i}}{e^{b_i} + e^{b_j}}$

With a bit of mathematical manipulation we can recast this into a more familiar form. $P(i \text{ beats } j) = \cfrac{e^{b_i-b_j}}{e^{b_i-b_j} + 1} = \cfrac{1}{1 + e^{-(b_i-b_j)}}$

These look very similar to the form of a logistic regression. This is a regression where the dependent variable can only take two values – it is binary. In our case we only have the outcomes of team i winning or team i losing. $\text{ invlogit}(x) = \cfrac{e^{x}}{e^{x} + 1} = \cfrac{1}{1 + e^{-x}}$

Then if we substitute our initial expression into the logistic transformation we obtain the following terms which can be furthered simplified using the fact that $\lambda_i = e^{b_i}$. $\text{ invlogit}(P(i \text{ beats } j)) = \log{\cfrac{P(i \text{ beats } j)}{1 - P(i \text{ beats } j)}} = \log{\cfrac{\lambda_i}{\lambda_j}} = b_i - b_j$

From here it is simple to invert the transformation to get the final result, which is that the probability of team i beating team j is just a logistic regression on $b_i-b_j$. $P(i \text{ beats } j) = \text{ logit}(b_i - b_j)$

I previously talked about the use of the classic Bradley-Terry model and its applicability to a wide variety of situations from ranking in machine learning algorithms through to modelling sports teams. In this post I will briefly outline some of the main modifications to the model over the last 60 years, extending its use into a wider range of situations.

In many sports that are played in front of a partisan crowd there is often a benefit to the team being supported. This is the concept of “Home Advantage“, where the local team tends to perform better than the visiting team. This is not just because of crowd support, it might also include the fact that the home team are more experienced at playing in those conditions – think of the England’s cricketing struggles in the Indian subcontinient!

A mathematical form for this was suggested by Agresti (1990) which is given below, $P(i \text{ beats } j | i \text{ at home }) = \cfrac{\theta\lambda_i}{\theta\lambda_i + \lambda_j}$ $P(i \text{ beats } j | j \text{ at home }) = \cfrac{\lambda_i}{\lambda_i + \theta\lambda_j}$

Here the parameter $\theta > 1$ represents the size of the home-field advantage, the larger the value the more likely the home team wins. Suppose that there is a competition between two people with only two outcomes. Either one of the two players, Alice or Bob, can win and the other must lose. This is a zero-sum game: if Alice wins Bob must lose and if Bob wins Alice must lose. Now let’s further assume that the game that they are playing is skill based and the participants success is determined by their relative skill levels. This means that games that are primarily based on luck or chance such as Snakes and Ladders are excluded, but skill based games like chess are acceptable .

This idea is formally known as a Bradley-Terry model (1952), where the chance of Alice or Bob winning are in proportion to their skill levels. If Alice has a skill level of $\lambda_A$ and Bob has a skill level of $\lambda_B$, then the probability that either one wins is the ratio of their skill level to their combined total skill level. $P(Alice \text{ beats } Bob) = \cfrac{\lambda_A}{\lambda_A + \lambda_B}$ $P(Bob \text{ beats } Alice) = \cfrac{\lambda_B}{\lambda_A + \lambda_B}$

It is clear to see that the law of total probability holds, if we sum up the probabilities of both outcomes we get unity and the basic idea of winning in proportion to the “skill” or “quality” is also obeyed. A game of skill

# Preflow-push Algorithms

As we talked about previously the idea of preflow-push is to start with the maximum flow in the network that is allowed. This is known as flooding the network’. From there we try to send flow towards the sink.

An important concept in this case is the fact that during the algorithm we often violate mass-balance constraints. In other words we can have more flow going into a particular node than leaves it. The difference between the flow in and flow out is referred to as the excess’. In order to find a feasible solution we need all the nodes apart from the source or sink to have a total excess of zero. We remove this excess by pushing’ flow to adjacent nodes. Flooding the network

# Augmenting Path Algorithms

In this post I will briefly describe one type of algorithm that has been used to solve the well-known max flow problem. This problem is essentially how to find the maximum flow that can be sent from a source node to a sink node in a network. Think of a graph connected with nodes and arcs each having a certain capacity or limit of flow that it can hold. This could be a very literal thing such as network of pipes or internet cables, but other stranger uses for network flow problems do exist such as the baseball elimination problem I talked about before.

### Residual Capacities

The augmenting path algorithm utilises ideas in capacitated graphs called the residual capacity and the residual flow. This is the potential for flow to move along particular arcs. It can be summed up as Residual Flow = Capacity – Forwards Flow + Backwards Flow. See the diagram below to see how it works (the first number is the flow, the second the capacity). Residual capacities