Previously I talked about one way to measure the success of a football team over a year through ‘Pythagorean Expectation’. Whilst this is a pretty good metric for predicting success, it can only be applied over a certain number of games and can’t tell us anything about a particular match. Since being able to determine how well a team performed in a particular match is the ultimate goal of analysing games, many ideas have been developed to try and do this with increasing accuracy.
A Short List of (Increasingly Better) Football Metrics
- Goals Scored & Goals Conceded
- Shots on Target (SoT)
- Total Shots Ratio (TSR)
- Expected Goals (xG)
- Expected Goals with tracking data
At the most basic level, the overall league table shows the goals scored and conceded for each team. Teams that tend to score lots of goals and concede few win more matches and hence finish higher up the table at the end of the season.
The use of shots as a predictor of the better team in a match follows simple logic: if you shoot more, you score more. You might think that this is true, and to a certain extent it is, but the link is very weak. As shown by Chris Anderson and David Sally in The Numbers Game, there is very little correlation between goals and shots. On the other hand, there is a much stronger connection between shots on target and goals scored. This is why all the statistics on TV mention shots on target. In other words, it’s not enough just to shoot; you have to test the keeper.
Of course the issue with shots on target is that it still doesn’t really tell you how a team fared relative to its opponent. This is where the total shots ratio metric (TSR) comes in. It is the number of shots on target for a team out of all the shots on target in a match. So if Watford have 5 shots on target and Luton have 2 shots on target, Watford’s TSR is 0.71 (5/7) whilst Luton’s is a measly 0.29 (2/7). It turns out that TSR is again a slightly better predictor of a team’s success, particularly over a season.
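To make the arithmetic concrete, here is a minimal Python sketch of the ratio as described above. The function name `shot_ratio` is my own invention for illustration; the only numbers used are the Watford and Luton figures from the example.

```python
def shot_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all the shots in a match that belong to one team."""
    total = shots_for + shots_against
    if total == 0:
        # With no shots at all the ratio is undefined, so fail loudly.
        raise ValueError("no shots in the match, ratio is undefined")
    return shots_for / total


# Watford 5 shots on target, Luton 2 (the example from the text).
watford = shot_ratio(5, 2)
luton = shot_ratio(2, 5)
print(f"Watford {watford:.2f}, Luton {luton:.2f}")  # Watford 0.71, Luton 0.29
```

Note that the two ratios in a match always add up to 1, so one team’s figure tells you the other’s.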
Finally there is a relatively new concept called ‘Expected Goals’ which is grounded more in the football match itself. The idea behind it is that every action in a football match can be assigned some probability of scoring a goal. This means that some actions are more likely to result in goals than others, overcoming the limitations of simple shots. For example, a penalty has more chance of leading to a goal than a shot from the half-way line. Initially most work was done by rating the quality of shots to generate expected goals, but you can also convert other events such as passes or dribbles into expected goals as well.
The key idea is that it is not necessarily the quantity of shots that matters in a match, but the quality of the shots.
Expected Goals via Shot Locations
One basic approach, as suggested by Michael Caley, is to group shot locations together into zones. We can then estimate the probability of scoring from each of these zones by using a large training data set, which lets us assign an expected goal value to each shot taken.
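As a rough sketch of how that estimation step might look, the Python below counts goals and shots per zone and divides one by the other. The function name `zone_conversion_rates` and the handful of shot records are made up purely for illustration; they are not Opta data and not Caley’s method in any detail.

```python
from collections import defaultdict


def zone_conversion_rates(shots):
    """Map each zone to goals / shots, given (zone, scored) records."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for zone, scored in shots:
        attempts[zone] += 1
        goals[zone] += int(scored)
    return {zone: goals[zone] / attempts[zone] for zone in attempts}


# Made-up toy records (zone number, was it a goal?). NOT real data.
training_shots = [
    (1, True), (1, True), (1, False), (1, True),
    (3, False), (3, True), (3, False), (3, False),
    (7, False), (7, False), (7, False), (7, False), (7, True),
]

rates = zone_conversion_rates(training_shots)
for zone in sorted(rates):
    print(f"Zone {zone}: {rates[zone]:.2f}")
```

With a real data set you would want thousands of shots per zone before trusting the rates, since a zone with only a few shots gives a very noisy estimate.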
In the above diagram basic footballing intuition has been used to decide where to place these zones. There is a clear separation between shots inside the box (Zones 1-5) and shots outside the box (Zones 6-8). It is also natural to make use of the symmetry of the pitch when deciding these locations. We would expect to find that the most valuable shots are probably in Zone 1, the centre of the 6 yard box, where it is hard to miss. Conversely, a tighter angle to the goal should make the shot harder to score and hence less valuable.
Again this table is lifted directly from Caley who made use of his access to an Opta data set. He calculated the probability of scoring a goal given a shot on target from each of these zones (G/SoT), over 4 Premier League seasons.
Grad College 1st Team Interlude
I will now make the undeniably dubious claim that these percentages can be applied to Lancaster University intra-league football, purely so I can attempt to highlight how an expected goals model might work.
Looking at a recent Graduate College 1st team match against Cartmel, I found that the following shots were taken by each team in each of the zones described earlier. Using just the shots on target and the percentages above, we can estimate the expected goals for either team.
In the first half, as you can tell from the data, Graduate College were the dominant team. They not only had five times as many shots, but they also had a more telling 10 shots on target to Cartmel’s 3. These shots were also in pretty dangerous areas, with the model predicting that the score in expected goals was about 3-1. It seems that the better team was winning, as the half time score was in fact 4-1.
The second half was an altogether more cagey affair, with Graduate College sitting back more and being happy to be on the defensive. This resulted in a more even second half, in which shots and shots on target were about level for both teams. Interestingly the model suggests that Cartmel had the more dangerous chances! The second half ended 0-0.
The final score ended up as 4-1, which was also the half time score. By comparing this result to the expected goals model, we can see that Grad did deserve to win the match but Cartmel were unlucky not to score a 2nd goal!
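The calculation behind an expected goals scoreline like this is just a weighted sum: multiply the number of shots on target in each zone by that zone’s G/SoT value and add everything up. Here is a minimal Python sketch. Every number in it is a placeholder I have invented to show the mechanics, so the zone values are not Caley’s figures and the shot counts are not the ones from the Grad v Cartmel match.

```python
# HYPOTHETICAL G/SoT values per zone: placeholders, not Caley's figures.
ZONE_XG = {1: 0.5, 2: 0.3, 3: 0.2, 6: 0.1}


def expected_goals(sot_by_zone, zone_xg):
    """Sum (shots on target in a zone) x (scoring probability for that zone)."""
    return sum(zone_xg[zone] * count for zone, count in sot_by_zone.items())


# Invented shots-on-target counts per zone for two imaginary teams.
home_sot = {1: 2, 2: 1, 6: 3}
away_sot = {3: 2, 6: 1}

home_xg = expected_goals(home_sot, ZONE_XG)
away_xg = expected_goals(away_sot, ZONE_XG)
print(f"Expected goals: {home_xg:.1f} - {away_xg:.1f}")  # Expected goals: 1.6 - 0.5
```

Comparing that expected scoreline with the real one is what lets you say whether a team was lucky or unlucky on the day.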
Another way of doing this shot location modelling is to look at the probability of scoring from a shot given its radial distance from the goal, which Martin Eastwood has shown to roughly follow an exponential decay pattern. James Grayson has also done lots of work in this field and has some nice plots illustrating the benefit of more advanced metrics in predicting a team’s future performance here. Duncan Alexander also uses expected goals to explain the rise in Leicester City’s fortunes here.
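To give a feel for the exponential decay idea mentioned above, here is a small Python sketch of a distance-based model. The functional form p0 * exp(-decay * distance) and both parameter values are assumptions I have picked for illustration; they are not Eastwood’s fitted numbers.

```python
import math


def xg_from_distance(distance_yards, p0=0.8, decay=0.1):
    """Scoring probability that decays exponentially with distance from goal.

    p0 (probability at zero distance) and decay (per yard) are
    HYPOTHETICAL values, not fitted to any real data.
    """
    if distance_yards < 0:
        raise ValueError("distance cannot be negative")
    return p0 * math.exp(-decay * distance_yards)


for d in (0, 6, 12, 18, 30):
    print(f"{d:>2} yards: {xg_from_distance(d):.3f}")
```

In practice the two parameters would be estimated from a large set of real shots, for instance by fitting a curve to the observed conversion rate at each distance.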
Including Tracking Data
Tracking data means that people can now attempt to quantify the value of a particular shot in a match more accurately. This means that you can say that a shot from 30 yards out is worth, say, 0.3 of a goal, whilst a shot inside the penalty area is worth 0.7 of a goal. Indeed you can even start to construct player specific models. If you can work out Odion Ighalo’s expected goals when he’s outside the box, you can calculate the difference in expected goals if Wayne Rooney were playing instead of him. This means that the impact of individual players in scoring positions can finally be quantified.
The other use of tracking data is that it can take into account factors not picked up by the expected goals model, such as the actions of defenders. Expected goals only uses properties of the actual shot such as the distance or angle to the goal. Anyone who has played the game knows that the presence of defenders makes shots much more difficult, so this is an active area of research.
Tracking data can go even further than this to analyse entire passages of play, as seen in the YouTube video below. Whilst the raw data is hard to find (it costs money!), many websites such as Squawka display it in a nice visual way.