# Baseball Elimination Problem

Recently we had a masterclass from Pitu Mirchandani on the subject of network flows. These network flow problems have been developed since the 1950’s an example of which is my previous post on Dijkstra’s algorithm. The main thing I took away from the masterclass was that the idea of shortest paths, maximum flows and min cost max flows can be used to describe and solve a surprisingly wide variety of problems.

One interesting problem which can be solved using the ideas of network flows is that of the baseball-elimination problem. Baseball in the USA consists of two different leagues which are each split up into three divisions of about 5 teams. During regular season each team plays 162 games in total, most of which are against teams in their division (76 vs same division, 66 vs same league, 20 vs different league). The aim of this is try and qualify for the playoffs, which can be done by finishing first in the division.

All this adds up to a desire to be able to work out whether it is possible to see if a particular team can still potentially win the division and make it to the playoffs given the current league standings. Solving this problem means that we have to look at tables that might look something like the one below.

Who can make the playoffs?

In this simple example it is clear to see that Houston are already eliminated, the maximum number of wins they can get is 77+4=81 which is less than the 82 wins that LA already have. However the cases of the remaining three teams are all intertwined. Take Texas for example, if they win there three games to play the can reach 83 wins and overtake LA. However to win the league they would LA to lose 8 out of their 9 remaining games. In this scenario 1 loss vs Texas, 2 vs Houston still leaves 5 losses needed out of 6 games against Oakland. However if this happens Oakland will themselves reach 83 wins and claim first place! Clearly it is not enough to look at a team’s wins and games left to play. We need to consider who those games are played against.

### Some Notation

Let’s assume that team i has already won $w_i$ games that it has played with $r_i$ games to play and that $r_{ij}$ is the number of games between teams i and j that have yet to be played. We shall also set $w_{max} = w_i + r_i$, so that $w_{max}$ is the maximum number of victories that team $i$ can obtain.

Using this notation we cannot eliminate team $i$ if in some outcome of the remaining games to be played throughout the league, $w_max$ is at least as large as the possible victories of every other team.

• The basic idea is to assume that the team wins all their remaining games ⇒ $w_3 + r_3$ wins
• Use network flows to try to share out the remaining games so all the other teams have $\le w_3 + r_3$ wins

### Network Formulation

The first step in building the network is to create nodes for each of the remaining game combinations called game nodes in the example below. There is an arc connecting the source to each game node separately. The capacities of each of these arcs are the number of games left between the two teams represented by the game node that the arc is connected to. This is shown in the graph as $r_{24}$, the number of games to still be played between teams 2 and team 4.

We also require a set of nodes in the network that represent the teams known as the team nodes. These team nodes are fed into by arcs from each game node that involves that particular team. The capacities of these arcs are usually modelled as infinite, since the number of additional games won is only restricted by the number of games left to play.

Example network to see if Team 3 can be eliminated; source

Finally there is a sink node that completes the graph by being fed into by arcs from all of the team nodes. The capacities of these arcs are the difference between the potential total number of games won by team 3 and the number of games won by team i. In other words these are the number of games that teams can still win whilst remaining with less wins overall than team 3.

The flow in this model between the source node s and the team node i is the total number of additional games that team i wins. In the problem we can eliminate a team if the network contains a feasible flow x satisfying the conditions for all other teams i,

$w_{max} \ge w_i + x_{si}$

We can rewrite this equation in our case to see it matches the capacities in the diagram,

$x_{si} \le w_{max} -w_i = w_3 + r_3 - w_i$

The way to find out whether team 3 is eliminated or not is to find out if all the games can flow through the network. This is a maximum flow problem: if the max flow saturates (equals the capacities of) all the arcs leaving the source the team is not eliminated, but if there is no feasible flow then team 3 is sadly eliminated.

### Useful Asides

A useful link if you’re interested in extensions and other facts about this problem is from Kevin Wayne here and some lecture notes from Cornell here. I also got my first book out of the library in several years, Network Flows: Theory, Algorithms and Applications by Ahuja, Magnati and Orlin. This has a section on the baseball-elimination problem and is more generally known as one of the best textbooks on the subject of network flows.