In this final part of my series on Bradley-Terry models I will talk about how the simple concepts behind Bradley-Terry models link with and underpin some more well-known and advanced concepts.

### 1. Logistic Regression

Let’s start by substituting $\lambda_i = e^{b_i}$ into the Bradley-Terry formula.

$P(i \text{ beats } j) = \cfrac{\lambda_i}{\lambda_i + \lambda_j}$

$P(i \text{ beats } j) = \cfrac{e^{b_i}}{e^{b_i} + e^{b_j}}$

With a bit of mathematical manipulation we can recast this into a more familiar form.

$P(i \text{ beats } j) = \cfrac{e^{b_i-b_j}}{e^{b_i-b_j} + 1} = \cfrac{1}{1 + e^{-(b_i-b_j)}}$

This looks very similar to the form of a logistic regression – a regression where the dependent variable can only take two values, i.e. it is binary. In our case the only two outcomes are team i winning or team i losing. Recall the definition of the inverse logit (logistic) function:

$\text{ invlogit}(x) = \cfrac{e^{x}}{e^{x} + 1} = \cfrac{1}{1 + e^{-x}}$

Then if we apply the logit (log-odds) transformation to our expression for $P(i \text{ beats } j)$ we obtain the following, which can be further simplified using the fact that $\lambda_i = e^{b_i}$.

$\text{ logit}(P(i \text{ beats } j)) = \log{\cfrac{P(i \text{ beats } j)}{1 - P(i \text{ beats } j)}} = \log{\cfrac{\lambda_i}{\lambda_j}} = b_i - b_j$

From here it is simple to invert the transformation to get the final result: the probability of team i beating team j is just the inverse logit (logistic function) of $b_i - b_j$.

$P(i \text{ beats } j) = \text{ invlogit}(b_i - b_j)$
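To see the equivalence numerically, here is a minimal sketch in Python (the function names `invlogit` and `p_beats` are my own):

```python
import math

def invlogit(x):
    """Inverse logit (logistic) function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def p_beats(b_i, b_j):
    """P(i beats j) in a Bradley-Terry model with log-strengths b_i, b_j."""
    return invlogit(b_i - b_j)

# The logistic form agrees with the original lambda-ratio form.
lam_i, lam_j = math.exp(2.0), math.exp(1.0)
assert abs(p_beats(2.0, 1.0) - lam_i / (lam_i + lam_j)) < 1e-12

print(p_beats(1.0, 1.0))  # equal strengths: 0.5
```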

I previously talked about the use of the classic Bradley-Terry model and its applicability to a wide variety of situations from ranking in machine learning algorithms through to modelling sports teams. In this post I will briefly outline some of the main modifications to the model over the last 60 years, extending its use into a wider range of situations.

In many sports that are played in front of a partisan crowd there is often a benefit to the team being supported. This is the concept of “Home Advantage”, where the local team tends to perform better than the visiting team. This is not just down to crowd support; it may also reflect the home team being more experienced at playing in those conditions – think of England’s cricketing struggles in the Indian subcontinent!

A mathematical form for this was suggested by Agresti (1990) and is given below.

$P(i \text{ beats } j | i \text{ at home }) = \cfrac{\theta\lambda_i}{\theta\lambda_i + \lambda_j}$

$P(i \text{ beats } j | j \text{ at home }) = \cfrac{\lambda_i}{\lambda_i + \theta\lambda_j}$

Here the parameter $\theta > 1$ represents the size of the home-field advantage: the larger the value, the more likely the home team is to win.
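As a small illustrative sketch of Agresti’s extension (the parameter values here are made up):

```python
def p_home_win(lam_home, lam_away, theta):
    """P(home team beats away team) under the home-advantage extension.

    theta > 1 inflates the home team's strength; theta = 1 recovers
    the basic Bradley-Terry model."""
    return (theta * lam_home) / (theta * lam_home + lam_away)

# Evenly matched teams: without home advantage the game is 50/50;
# with theta = 1.5 the home side wins 60% of the time.
print(p_home_win(1.0, 1.0, 1.0))  # 0.5
print(p_home_win(1.0, 1.0, 1.5))  # 0.6
```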

Suppose that there is a competition between two people with only two outcomes: one of the two players, Alice or Bob, wins and the other loses. This is a zero-sum game: if Alice wins Bob must lose, and if Bob wins Alice must lose. Now let’s further assume that the game they are playing is skill based and the participants’ success is determined by their relative skill levels. This means that games primarily based on luck or chance, such as Snakes and Ladders, are excluded, but skill-based games like chess are acceptable.

This idea is formally known as a Bradley-Terry model (1952), where the chances of Alice or Bob winning are in proportion to their skill levels. If Alice has a skill level of $\lambda_A$ and Bob has a skill level of $\lambda_B$, then the probability that either one wins is the ratio of their skill level to their combined total skill level.

$P(Alice \text{ beats } Bob) = \cfrac{\lambda_A}{\lambda_A + \lambda_B}$

$P(Bob \text{ beats } Alice) = \cfrac{\lambda_B}{\lambda_A + \lambda_B}$

It is easy to see that the law of total probability holds: summing the probabilities of both outcomes gives unity, and the basic idea of winning in proportion to “skill” or “quality” is also obeyed.
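A quick simulation shows the model behaving as described — a minimal sketch (the skill values are made up):

```python
import random

def simulate_matches(lam_a, lam_b, n=100_000, seed=0):
    """Simulate n games where Alice wins each with probability
    lam_a / (lam_a + lam_b), and return her observed win rate."""
    rng = random.Random(seed)
    p_alice = lam_a / (lam_a + lam_b)
    wins = sum(rng.random() < p_alice for _ in range(n))
    return wins / n

# Alice is twice as skilled as Bob, so she should win about 2/3 of games.
print(simulate_matches(2.0, 1.0))
```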

A game of skill

# Baseball Elimination Problem

Recently we had a masterclass from Pitu Mirchandani on the subject of network flows. Network flow problems have been studied since the 1950s – an example is my previous post on Dijkstra’s algorithm. The main thing I took away from the masterclass was that the ideas of shortest paths, maximum flows and min-cost max-flows can be used to describe and solve a surprisingly wide variety of problems.

One interesting problem which can be solved using the ideas of network flows is the baseball-elimination problem. Baseball in the USA consists of two leagues, each split into three divisions of about five teams. During the regular season each team plays 162 games in total, most of which are against teams in its own division (76 vs the same division, 66 vs the same league, 20 vs the other league). The aim of all this is to qualify for the playoffs, which a team can do by finishing first in its division.

All this adds up to a natural question: given the current league standings, can a particular team still win its division and make the playoffs? Solving this problem means looking at tables like the one below.

Who can make the playoffs?
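The full elimination problem needs the max-flow construction from the masterclass, but as a first step here is the simple necessary condition only — a sketch with made-up standings:

```python
def trivially_eliminated(wins, remaining, others_wins):
    """A team is certainly eliminated if, even by winning every remaining
    game, it cannot match the current win total of some other team.
    (This is only a sufficient condition for elimination; the full
    problem requires a max-flow argument over the remaining fixtures.)"""
    best_possible = wins + remaining
    return any(best_possible < w for w in others_wins)

# Hypothetical standings: 75 wins with 8 games left, leader on 84 wins.
print(trivially_eliminated(75, 8, [84, 80, 78]))  # True
```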

# Updated Pythagorean Tables

With the Premier League season drawing to a close I thought I’d go back and see what the ‘Pythagorean Expectation’ for the current league table looks like. Using the same values as in my previous post, $\gamma_1 = 1.18$ and $\gamma_2 = 1.23$, we can see whether the same teams seem to be under- or over-performing.
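For reference, one common two-exponent form of the Pythagorean win fraction — I’m assuming this matches the functional form from the earlier post, with the exponents quoted above:

```python
def pythagorean_win_fraction(goals_for, goals_against, g1=1.18, g2=1.23):
    """Expected fraction of available points based on goals scored and
    conceded, with separate exponents for attack and defence.
    (The exact functional form from the earlier post is assumed here.)"""
    return goals_for ** g1 / (goals_for ** g1 + goals_against ** g2)

# A side scoring twice as much as it concedes is expected to pick up
# well over half of the available points.
print(pythagorean_win_fraction(60, 30))
```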

This is the league table from earlier in the year.

Old League Table

Below is the current league table I’m working from with about 5 or 6 games to go.

Current League Table

Observations

• Leicester seem to be unstoppable in real life, and despite my prediction of them hitting a blip nothing of the sort has happened to them yet. This is reflected in them having outperformed the average team’s points total for their goals scored and conceded by a whopping 9.29 pts. This has almost doubled in the last 10 games (they have lost only 1 league game in the last 10, and that was by a single goal)!
• The other prediction I made was that Tottenham seemed to be under-achieving earlier in the season and could challenge for the title. Despite only a modest improvement in their residual, from -5.64 to -4.81, they have maintained their good run of form and look like the most likely title challengers to Leicester.
• Other significant improvements have been made by the likes of Bournemouth and Stoke, who have turned their form around to pick up more points than they might have expected to.
• Everton continue to have a shocking season based on Pythagorean expectation, which matches the conventional wisdom about them. In a more typical season, based on their goals scored and conceded, they would have almost 9 more points.
• Finally, despite my current worries about Watford, the maths suggests that they will still be fine in the league this year.

# Expected Goals in Football

Previously I talked about one way to measure the success of a football team over a year through ‘Pythagorean Expectation’. Whilst this is a pretty good metric for predicting success, it can only be applied over a certain number of games and can’t tell us anything about a particular match. Since being able to determine how well a team performed in a particular match is the ultimate goal of analysing games, many ideas have been developed to try to do this with increasing accuracy.

#### A Short List of (Increasingly Better) Football Metrics

1. Goals Scored & Goals Conceded
2. Shots
3. Shots on Target (SoT)
4. Total Shots Ratio (TSR)
5. Expected Goals (xG)
6. Expected Goals with tracking data

The most basic level, which you see in the overall league table, is the goals scored and conceded by each team. Teams that score lots of goals and concede few win more matches, and hence finish higher up the table at the end of the league.


# Markov Chains and Tennis

Now that the tennis season is well underway and the Australian Open is already a few weeks in the past, I thought I would look at a basic model of a game of tennis for today’s blog post. As you might expect there will be some maths later on, but feel free to skip that and just look at the pictures and the results!

The simplest way to model a tennis match as suggested by people such as O’Malley, is to look at the probability of players winning a point on their own serve. Then we can define the probability of a player winning a point on their serve as p, and the probability of losing a point on their serve as 1-p. These two probabilities can be used to build up a model of the entire match. Whilst this is clearly a simplification it has been shown to be a pretty good predictor of the overall winner of a match.

For example, by checking the ATP website we can see that Roger Federer won 3737 of the 5202 points he played on serve last year, giving him an average probability of p = 0.718 of winning a point on his serve. Gilles Simon only managed to win 3186 of 5128 points on his serve, and consequently had a more typical probability of p = 0.621 of winning a point on his serve. Most top-50 ATP players had values in the range of 0.6 to 0.7 for the 2015 season, which is obviously still pretty good!
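The point-level model lends itself to a simple Markov-chain simulation of a single service game — a sketch that ignores tie-breaks and full sets, using the serve percentages quoted above:

```python
import random

def simulate_game(p, rng):
    """Play out one service game as a Markov chain: the server wins each
    point independently with probability p. Returns True if the server
    holds serve."""
    server, returner = 0, 0
    while True:
        if rng.random() < p:
            server += 1
        else:
            returner += 1
        # A game ends at four or more points with a two-point lead
        # (deuce simply extends the chain).
        if server >= 4 and server - returner >= 2:
            return True
        if returner >= 4 and returner - server >= 2:
            return False

def p_win_game(p, n=100_000, seed=0):
    """Estimate the probability of holding serve by simulation."""
    rng = random.Random(seed)
    return sum(simulate_game(p, rng) for _ in range(n)) / n

# With Federer's p = 0.718 per point, the server holds serve the
# overwhelming majority of the time.
print(p_win_game(0.718))
```

Note how a modest edge per point (0.718 vs 0.621) compounds into a much larger edge per game, which is why service holds dominate the men’s tour.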