Aside

# Markov Chains and Tennis

Now that the tennis season is well underway and the Australian Open is already a few weeks in the past, I thought I would look at a basic model of a game of tennis for today’s blog post. As you might expect there will be some maths later on, but feel free to skip that and just look at the pictures and the results!

The simplest way to model a tennis match as suggested by people such as O’Malley, is to look at the probability of players winning a point on their own serve. Then we can define the probability of a player winning a point on their serve as p, and the probability of losing a point on their serve as 1-p. These two probabilities can be used to build up a model of the entire match. Whilst this is clearly a simplification it has been shown to be a pretty good predictor of the overall winner of a match.

For example by checking the ATP website we can see that Roger Federer won 3737 out of 5202pts he played on serve last year, giving him an average probability of p=0.718 to win a point on his serve. Giles Simon only managed to win 3186 out of 5128pts on his serve and consequently had a more typical value of probability p=0.621 to win a game on his serve. Most top-50 ATP players had values in the range of 0.6 to 0.7 for the 2015 season, which is obviously pretty good!

I’m assuming that most people reading this will be familiar with the game of tennis, but you might not have considered viewing it in a mathematical way. A single game in a set is just of sequence of points and depending on who wins the points the score changes. Because of this we can view the game as a set of states with various probabilities for entering and leaving those states. In mathematical terms this is what’s known as a Markov Chain. This idea is best explained in the diagram below, where Player A is serving.

Markov Chain for a game of Tennis; adapted from source at wolfram.com

To save myself work I have merged the 40-30, 30-30 & 30-40 states together with the Adv A, Deuce & Adv B states respectively. This can be done as they are essentially the same, you need to win 2 points in a row from deuce or 30-30 to win the game. The directed arrows show the possible paths through the various states (scorelines) and the probability of going in that direction (e.g. whoever wins the point).

Clearly we can now start to work out some mathematical formulas. If we are only interested in the probability of Player A winning his service game, then there are 4 scores that he can win from, 40-0, 40-15, 40-30 and Adv-40.

Holding serve to love (0) requires that Player A wins the first four points in a row, so the probability of winning the game in this case is,

$\text{Prob(Hold serve to 0)} = p^4$

Holding serve to 15 means that Player A must win 3 out of the first 4 points and the 5th point as well. This means that the answer is linked in with the binomial theorem and the binomial coefficients. Since there are 4 different ways that Player A can hold serve to 15 (i.e. Player B can win either the 1st, 2nd, 3rd or 4th point), we need to multiply the expression by four.

$\text{Prob(Hold serve to 15)} = {4 \choose 1} p^4(1-p) = 4p^4(1-p)$

In a similar vein we can work out that to hold serve to 30, Player A must win 3 out of the first 5 points and the 6th point. As a result the probability in this case is,

$\text{Prob(Hold serve to 30)} = {5 \choose 2} p^4(1-p)^2 = 10p^4(1-p)^2$

The final case to consider is what happens once we reach deuce. As is illustrated in the diagram this is where it starts to get trickier, as we can potentially enter an infinite loop where the game never ends. To reach deuce for the first time Player A and Player B both have to win 3 points each out of the first 6 played. So the probability to reach the first deuce is,

$\text{Prob(Reach deuce)} = {6 \choose 3} p^3(1-p)^3 = 20p^3(1-p)^3$

Now Player A needs to win 2 points in a row to win the game from the first deuce situation and any future deuce situations that may arise. Future deuce’s arise when both players have won 1 point each, either Player A wins a point and Player B the following point or vice versa. So we can write the probability of winning from a deuce as,

$\text{Prob(Hold serve via deuces)} = 20p^5(1-p)^3\left(1 + 2p(1-p) + (2p(1-p))^2 + \ldots\right)$

This is infact an infinite sequence of terms that we would like to sum together. This can be done as they are in fact a convergent geometric sequence,since $2p(1-p) <1$, given that $0. As a result we can simplify the above expression to,

$\text{Prob(Hold serve via deuces)} = \cfrac{20p^5(1-p)^3}{1-2p(1-p)}$

So the final equation for the probability of Player A winning the game is,

$P = p^4 + 4p^4(1-p) + 10p^4(1-p)^2 + \cfrac{20p^5(1-p)^3}{1-2p(1-p)}$

Let’s finish by calculating the probabilities of Roger Federer and Giles Simon winning their own service games in this model. Substituting p=0.718 gives us that Roger Federer wins his serve 92.1% of time, whilst Giles Simon’s p=0.621 turns into a 77.8% chance of winning his service game.

Finally more complex models might take into account the affect that the current scoreline has on the probability of a player winning a point. If a player is under pressure late on in a set, he might have a lower probability of winning that point than if it is the first point of the match. Other factors such as fatigue might come into play, but this model can’t cope with them as it assumes independence between all points.

Should you be interested in other posts in this area, I would suggest ones such as this post on the Isner-Mahut marathon from Wimbledon 2010 and this post on extending the model to the probability of winning an entire match.