In this final part of my series on Bradley-Terry models I will talk about how the simple concepts behind Bradley-Terry models link with and underpin some more well-known and advanced concepts.

### 1. Logistic Regression

Let’s start by making the substitution $\lambda_i = e^{b_i}$ in the Bradley-Terry formula.

$P(i \text{ beats } j) = \cfrac{\lambda_i}{\lambda_i + \lambda_j}$

$P(i \text{ beats } j) = \cfrac{e^{b_i}}{e^{b_i} + e^{b_j}}$

With a bit of mathematical manipulation we can recast this into a more familiar form.

$P(i \text{ beats } j) = \cfrac{e^{b_i-b_j}}{e^{b_i-b_j} + 1} = \cfrac{1}{1 + e^{-(b_i-b_j)}}$
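As a quick sanity check on the algebra, the ratio form and the logistic form should give identical probabilities. A small sketch using arbitrary illustrative strengths:

```python
import math

# Illustrative (made-up) log-strengths for teams i and j
b_i, b_j = 0.4, -0.1
lam_i, lam_j = math.exp(b_i), math.exp(b_j)

# Ratio form of the Bradley-Terry win probability
p_ratio = lam_i / (lam_i + lam_j)

# Logistic form in terms of the strength difference
p_logistic = 1.0 / (1.0 + math.exp(-(b_i - b_j)))

print(round(p_ratio, 6), round(p_logistic, 6))  # both 0.622459
```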

This is exactly the form of a logistic regression – a regression where the dependent variable can only take two values, i.e. it is binary. In our case the two outcomes are team i winning or team i losing.

$\text{ invlogit}(x) = \cfrac{e^{x}}{e^{x} + 1} = \cfrac{1}{1 + e^{-x}}$

Then if we apply the logit (log-odds) transformation to the win probability, we obtain the following terms, which can be further simplified using the fact that $\lambda_i = e^{b_i}$.

$\text{logit}(P(i \text{ beats } j)) = \log{\cfrac{P(i \text{ beats } j)}{1 - P(i \text{ beats } j)}} = \log{\cfrac{\lambda_i}{\lambda_j}} = b_i - b_j$

From here it is simple to invert the transformation to get the final result: the probability of team i beating team j is just the inverse logit of $b_i - b_j$.

$P(i \text{ beats } j) = \text{invlogit}(b_i - b_j)$
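Because of this equivalence, Bradley-Terry strengths can be fitted like a logistic regression, by maximising the likelihood of observed match results. A minimal sketch with simulated data – the team names, "true" strengths, learning rate, and iteration counts are all made up for illustration:

```python
import math
import random

random.seed(0)
true_b = {"A": 1.0, "B": 0.0, "C": -1.0}  # assumed "true" log-strengths

def p_win(bi, bj):
    """Bradley-Terry win probability in logistic form."""
    return 1.0 / (1.0 + math.exp(-(bi - bj)))

# Simulate 2000 matches between randomly chosen pairs of teams
teams = list(true_b)
matches = []
for _ in range(2000):
    i, j = random.sample(teams, 2)
    if random.random() < p_win(true_b[i], true_b[j]):
        matches.append((i, j))  # (winner, loser)
    else:
        matches.append((j, i))

# Gradient ascent on the log-likelihood; centre the b's each step
# since only differences b_i - b_j are identifiable
b = {t: 0.0 for t in teams}
lr = 0.001
for _ in range(500):
    grad = {t: 0.0 for t in teams}
    for w, l in matches:
        p = p_win(b[w], b[l])
        grad[w] += 1 - p
        grad[l] -= 1 - p
    for t in teams:
        b[t] += lr * grad[t]
    mean = sum(b.values()) / len(b)
    for t in teams:
        b[t] -= mean

print({t: round(v, 2) for t, v in sorted(b.items())})
```

The recovered strengths should land close to the simulated values, with A above B above C.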

### 2. Pythagorean Expectation

Let’s consider making the alternative substitution $\lambda_i = e^{\gamma\ln{c_i}} = c_i^\gamma$. This roughly corresponds, in a physical sense, to the idea that a player’s skill is best represented by some other underlying measure $c_i$.

$P(i \text{ beats } j) = \cfrac{\lambda_i}{\lambda_i + \lambda_j}$

$P(i \text{ beats } j) = \cfrac{c_i^\gamma}{c_i^\gamma + c_j^\gamma}$

At this point it is clear that we have come across this formula before. If we let $c_i$ be runs scored by team i and $c_j$ be runs scored by team j (which is the same as runs conceded by team i), we have obtained the classic Pythagorean win probability for a single game. It also fits into the logistic regression framework: with $b_i = \ln{c_i}$ the model is simply $\text{invlogit}(\gamma(b_i - b_j))$!
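A short sketch makes the equivalence concrete – the run totals and the exponent $\gamma = 2$ here are illustrative placeholders, not fitted values:

```python
import math

def pythag_win_prob(runs_for, runs_against, gamma=2.0):
    """Single-game Pythagorean win probability with exponent gamma."""
    return runs_for ** gamma / (runs_for ** gamma + runs_against ** gamma)

# Pythagorean form vs the equivalent invlogit(gamma * (ln c_i - ln c_j))
p_pythag = pythag_win_prob(5.0, 4.0, gamma=2.0)
p_logit = 1.0 / (1.0 + math.exp(-2.0 * (math.log(5.0) - math.log(4.0))))
print(round(p_pythag, 6), round(p_logit, 6))  # both 0.609756
```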

Alternatively, if we take the slightly more general substitution $\lambda_i = e^{\gamma_i\ln{c_i}} = c_i^{\gamma_i}$, we end up with the two-exponent form of Pythagorean expectation. We previously saw this in the form of the Pythagorean exponent I used for Premier League football.

$P(i \text{ beats } j) = \cfrac{\lambda_i}{\lambda_i + \lambda_j}$

$P(i \text{ beats } j) = \cfrac{c_i^{\gamma_i}}{c_i^{\gamma_i} + c_j^{\gamma_j}}$

Clearly there is more to the idea of Pythagorean wins than just a magic formula that seems to work! In fact it can be derived not just from a basic Bradley-Terry model but from another idea: that the scoring of runs or points is described by a Weibull distribution.

### 3. Elo Ratings

Elo ratings are now a well-known method of comparing the performance of players and teams with that of their competitors over a period of time. The principle is that each person has their own rating, which is affected by the games they play against other players. The basic idea is that the bigger the difference in rating between two players, the more likely the higher-rated player is to win.

The appeal of this type of system is that it is in some sense self-correcting: if the underdog team wins it gains more points from its win than the pre-match favourite would have gained. This means that even if the model wrongly predicts the winner it will adjust the ratings to favour the actual winner. As a result the model takes into account the fact that skill changes over time – it is dynamic.

The probability of a team winning depends on the difference in ratings of the two teams playing.

$P(i \text{ beats } j) = P_i = \cfrac{1}{1 + 10^{-(R_i - R_j)/400}}$

Once more this is nothing more than a slightly readjusted form of a Bradley-Terry model. In this case we have used the slightly more abstract substitution $\lambda_i = 10^{R_i/400}$, where $R_i$ is the Elo rating of team i.

Once the match is played the model updates the ratings of the two teams that played. This is done by a simple linear formula which awards more points to a winning team if it was less favoured to win. In other words it discounts the ratings gain when the favourites win and increases the ratings gain when the underdogs win. We do this for each team i involved,

$R_{\text{new}} = R_{\text{old}} + K(P - P_i)$

Here the term $P$ is just the result of the match on a scale of 0 to 1. So in a binary game like chess a win gives $P=1$ and a loss $P=0$. The difference between the predicted win probability $P_i$ for team i and the actual outcome is then scaled by the constant $K$ to produce the update to team i’s current rating.

It is this constant $K$ that is the chief parameter of Elo models. A value that is too high will cause massive swings in ratings from only a few games, whereas a value that is too low will not capture a team’s change in performance quickly enough. The way the model is designed means that it is not necessary to have a single value of $K$ for all participants or matches. Often new players have a slightly higher K-factor to help us learn their true rating more quickly, and more important matches such as World Cup finals carry a greater weighting.
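The expected-score and update formulas together fit in a few lines. A minimal sketch, using the common illustrative value $K = 32$ and made-up ratings of 1600 and 1400:

```python
def elo_expected(r_i, r_j):
    """Predicted win probability for the player rated r_i."""
    return 1.0 / (1.0 + 10 ** (-(r_i - r_j) / 400))

def elo_update(r_i, r_j, score_i, k=32):
    """Return updated ratings; score_i is the result for player i (1, 0.5 or 0)."""
    p_i = elo_expected(r_i, r_j)
    r_i_new = r_i + k * (score_i - p_i)
    r_j_new = r_j + k * ((1 - score_i) - (1 - p_i))
    return r_i_new, r_j_new

# Underdog (1400) beats favourite (1600): the favourite loses about
# 24 points and the underdog gains the same amount
print(elo_update(1600, 1400, score_i=0))
```

Note that the two updates are equal and opposite, so with a shared $K$ the total number of rating points in the system is conserved.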

*Figure: Elo rating difference vs. expected win percentage.*

In fact Elo ratings and successor systems such as Glicko have been applied to many other sports in addition to chess. By starting all teams with a particular rating you can then calculate updated ratings and odds of winning matches, not just between two current teams but between any historical teams as well. This provides a framework for examining questions like who is the best team of all time and which teams beat their expectations the most.