# A Bradley-Terry Model

Suppose that two people compete in a game with only two outcomes. Either one of the two players, Alice or Bob, wins and the other must lose. This is a zero-sum game: if Alice wins Bob must lose, and if Bob wins Alice must lose. Now let's further assume that the game they are playing is skill based, so the participants' success is determined by their relative skill levels. This excludes games that are primarily based on luck or chance, such as Snakes and Ladders, but allows skill-based games like chess.

This idea is formally known as the Bradley–Terry model (1952), where the chances of Alice or Bob winning are in proportion to their skill levels. If Alice has a skill level of $\lambda_A$ and Bob has a skill level of $\lambda_B$, then the probability that either one wins is the ratio of their skill level to their combined total skill level.

$P(Alice \text{ beats } Bob) = \cfrac{\lambda_A}{\lambda_A + \lambda_B}$

$P(Bob \text{ beats } Alice) = \cfrac{\lambda_B}{\lambda_A + \lambda_B}$

It is easy to see that the law of total probability holds: summing the probabilities of both outcomes gives unity, and the basic idea of winning in proportion to "skill" or "quality" is also obeyed.

*A game of skill*

We can then calculate the odds of each person winning by taking the ratio of their probabilities of winning,

$\text{odds}(Alice \text{ beats } Bob) = \cfrac{P(Alice \text{ beats } Bob)}{P(Bob \text{ beats } Alice)} = \cfrac{\lambda_A/(\lambda_A + \lambda_B)}{\lambda_B/(\lambda_A + \lambda_B)} = \cfrac{\lambda_A}{\lambda_B}$
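The two formulas above can be sketched as a pair of small functions. This is a minimal illustration; the function names and the example skill values are my own, not anything from the model's definition.

```python
def win_prob(skill_a, skill_b):
    """Bradley-Terry probability that a player with skill_a beats one with skill_b."""
    return skill_a / (skill_a + skill_b)


def odds(skill_a, skill_b):
    """Odds of A beating B: the ratio of the two win probabilities,
    which simplifies to the ratio of the skills."""
    return skill_a / skill_b
```

For example, if Alice's skill is twice Bob's, `win_prob(2.0, 1.0)` gives $2/3$ and `odds(2.0, 1.0)` gives $2$, and the two win probabilities always sum to one.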

### Ranking Players

In many situations it is hard to come up with a ranking of all the current players directly (think of playing a computer game online), but through the Bradley–Terry model it is possible to compare pairs of players through their individual match-ups to generate full rankings.

We can calculate these skill ratings, and hence rank players according to their skill, using the technique of Maximum Likelihood Estimation. This turns out to be simple to do mathematically and yields an answer that makes sense. Let's assume that we have $n$ players and a skill parameter for each of them, $\lambda = (\lambda_1,\ldots,\lambda_n)$. The only data we need is the number of matches in which player $i$ beat player $j$, $w_{ij}$. Then we can write the likelihood as,

$L(\lambda) = \prod_{i \neq j} f(w_{ij}|\lambda_i,\lambda_j)$

where the product runs over all ordered pairs $i,j=1,\ldots,n$ with $i \neq j$. For example, with just Alice and Bob this becomes,

$L(\lambda) = f(w_{AB}|\lambda_A,\lambda_B) \cdot f(w_{BA}|\lambda_A,\lambda_B)$

$L(\lambda) = \left(\frac{\lambda_A}{\lambda_A+\lambda_B}\right)^{w_{AB}} \cdot \left(\frac{\lambda_B}{\lambda_A+\lambda_B}\right)^{w_{BA}}$
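The two-player likelihood can be computed directly from the win probabilities. This is a sketch with hypothetical win counts; the function name and arguments are my own.

```python
def two_player_likelihood(lam_a, lam_b, w_ab, w_ba):
    """Likelihood of observing w_ab wins for A and w_ba wins for B
    under a Bradley-Terry model with skills lam_a and lam_b."""
    p_a = lam_a / (lam_a + lam_b)  # P(A beats B)
    p_b = lam_b / (lam_a + lam_b)  # P(B beats A)
    return p_a ** w_ab * p_b ** w_ba
```

With equal skills, every match is a coin flip, so observing five matches has likelihood $(1/2)^5$ regardless of who won each one.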

The general log-likelihood takes the form,

$\ell(\lambda) = \sum_{i \neq j} \left(w_{ij}\ln{\lambda_i} - w_{ij}\ln{(\lambda_i+\lambda_j)}\right)$

At this point we should also point out that the values of $\lambda$ that maximise this function are not unique. We can see this easily from the log-likelihood above, since $\ell(\lambda)=\ell(a\lambda)$ for any $a>0$. In order to get a unique maximum we impose the normalisation $\sum_i \lambda_i = 1$. This will be necessary as algorithms for finding the MLE, such as the Expectation-Maximisation (EM) algorithm, only converge to a local maximum: if there is only one maximum it is guaranteed to be global.

We can then denote the number of matches that $i$ wins against all other players as $W_i = \sum_{j \neq i} w_{ij}$. Also, the number of comparisons between $i$ and $j$ is $N_{ij}=w_{ij}+w_{ji}$, so we can further simplify the log-likelihood down to,

$\ell(\lambda) = \sum_{i}W_{i}\ln{\lambda_i} -\sum_{i < j} N_{ij}\ln{(\lambda_i+\lambda_j)}$

where the second sum is over $i<j$ so that each unordered pair is counted exactly once (note $N_{ij}=N_{ji}$).
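As a sanity check, the simplified form can be evaluated in code and compared against the original double sum over ordered pairs. This is a minimal sketch; the win matrix `w` (with `w[i][j]` the number of wins of player $i$ over player $j$) is assumed data of my own.

```python
import math


def log_likelihood(lam, w):
    """Simplified Bradley-Terry log-likelihood:
    sum_i W_i ln(lam_i) - sum_{i<j} N_ij ln(lam_i + lam_j)."""
    n = len(lam)
    total = 0.0
    for i in range(n):
        W_i = sum(w[i][j] for j in range(n) if j != i)  # total wins of player i
        total += W_i * math.log(lam[i])
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair counted once
            N_ij = w[i][j] + w[j][i]  # total matches between i and j
            total -= N_ij * math.log(lam[i] + lam[j])
    return total
```

Evaluating this against the ordered-pair form $\sum_{i \neq j} w_{ij}(\ln\lambda_i - \ln(\lambda_i+\lambda_j))$ on any small win matrix gives the same number, which confirms the simplification.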

Then we can take the partial derivative with respect to $\lambda_i$ to yield,

$\cfrac{\partial \ell}{\partial \lambda_i} = \cfrac{W_{i}}{\lambda_i} - \sum_{j \neq i} \cfrac{N_{ij}}{\lambda_i+\lambda_j}$

Finally, we can set $\partial \ell / \partial \lambda_i = 0$ and rearrange for $\lambda_i$ to get the MLE,

$\hat\lambda_i = W_i \left(\sum_{j \neq i} \cfrac{N_{ij}}{\hat\lambda_i + \hat\lambda_j}\right)^{-1}$

Note that $\hat\lambda_i$ appears on both sides, so this is a fixed-point equation that must be solved iteratively.
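The fixed-point equation suggests a simple iterative scheme: start from equal skills, repeatedly apply the update, and renormalise so the skills sum to one (the normalisation discussed above). Below is a minimal sketch of that iteration; the function name, starting point, and iteration count are my own choices, and it assumes every player has at least one win so that no skill collapses to zero.

```python
def bt_mle(w, iters=200):
    """Fixed-point iteration for the Bradley-Terry MLE.
    w[i][j] = number of wins of player i over player j.
    Returns skills normalised so that sum(lam) == 1."""
    n = len(w)
    lam = [1.0 / n] * n  # start from equal skills
    for _ in range(iters):
        new = []
        for i in range(n):
            W_i = sum(w[i][j] for j in range(n) if j != i)  # total wins of i
            denom = sum(
                (w[i][j] + w[j][i]) / (lam[i] + lam[j])  # N_ij / (lam_i + lam_j)
                for j in range(n)
                if j != i
            )
            new.append(W_i / denom)
        s = sum(new)
        lam = [x / s for x in new]  # renormalise: sum_i lam_i = 1
    return lam
```

For two players where Alice beat Bob in 2 of their 3 matches, the iteration settles on $\hat\lambda_A = 2/3$ and $\hat\lambda_B = 1/3$, matching the closed-form odds $\hat\lambda_A/\hat\lambda_B = w_{AB}/w_{BA} = 2$.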

These $\hat\lambda_i$ are the maximum likelihood estimates of our skill parameters and, by the consistency of maximum likelihood estimators, converge to the true values (up to normalisation) as we add more and more data. From then on it is a trivial task to create rankings, predict future win probabilities or even calculate simple odds.

### Alice, Bob and Charlie play a game

I will now work through a simple example of three people playing a triangular tournament of some skill-based game. During the tournament we kept track of the results of all the matches played and then calculated our skill parameters once we had finished playing. It should be said that the more data (previous match-ups) you have, the more accurate the resulting skill parameters should be, so there is still time for Charlie to turn it around if we play again!