Optimisation is concerned with finding the maximum or minimum of a quantity that is limited, or ‘constrained’, by other factors. Think of trying to build as many products in a factory as you can whilst having only a limited supply of raw materials and workers, both of which are needed to build the product.

Normally we want to deal with either a maximisation or a minimisation problem. It turns out that these two problems are in fact two sides of the same coin: finding the minimum of some function f(x) is the same as finding the maximum of −f(x). As a result we typically define optimisation problems in the form of a minimisation problem.

If you think about minimising a simple function of a single variable x, then strictly speaking what you are actually doing is minimising this function over a certain domain. This domain can be as small or as big as you require; in the picture above it is a bounded interval of x values.

Mathematical optimisation is essentially a more general form of this simple optimisation, but over a more complicated domain. Instead of being made up of a single variable, our domain is constructed from any number of variables – many real-world problems involve millions of variables.

The minimisation problem is defined by two features: a domain P and a real-valued function f. What we wish to do is to find an x contained inside P such that for every y in P, f(x) ≤ f(y).

In order to find a solution x in P we need to have a definition of P in the first place. In practice P is usually a subset of ℝⁿ (points contained within P are vectors made of n real values) and P is further specified by constraints. These constraints exclude some points from the valid solution space.

An example of these constraints on P is illustrated above for the case of two dimensions. After applying the constraints we see that only the points that lie within the blue feasible region are allowed to be part of P. This feasible region is often thought of as a ‘convex polyhedron’, as that is its shape in higher dimensions – assuming that the feasible region is bounded. It is also possible to have constraints of different types, such as linear (seen above), quadratic and integer constraints.

In order to be able to say that we have solved a problem we need to be able to meet two conditions:

- Find x in P
- For all y in P, f(x) ≤ f(y)

If we meet both 1 and 2 we can be said to have found an ‘optimal solution’. If we meet 1 but not 2, then we have found a ‘feasible solution’. It should be noted that meeting both of these conditions is not always easy. In fact, depending on what P and f are like, ‘proving optimality’ is often much harder.

Modern computers running powerful optimisation solvers can solve most problems to optimality in a short amount of time. There are, however, situations where the problem is so large or complicated that you can’t determine the optimal solution within a useful timeframe. For example, a business trying to solve a scheduling problem for jobs to be undertaken later that day cannot wait until the afternoon for a solution.

This motivates another useful concept in optimisation: being able to find a feasible solution and determine its quality. In the business situation you can probably find a feasible solution x in P, but you do not have enough time to prove the second statement, that all other y in P have f(y) ≥ f(x).

If we can only find a sub-optimal solution to the problem then ideally we need some way of measuring the quality of this solution. In essence this is a measure of how close the objective function value is to the optimal objective function value. The way that this is done is by finding a lower bound L on the objective function, which the objective function can never drop below:

f(y) ≥ L for all y in P

Normally this lower bound is found by solving a relaxed version of the problem (for example, relaxing an integer constraint to a continuous one) or as part of solver methods such as branch and bound. Once this L has been identified we can figure out how close the current solution x is to it. The difference between the objective function value and L is known as the ‘optimality gap’.

absolute gap = f(x) − L

Often this absolute gap is expressed in relative terms so that you can compare the percentage difference between your best solution and the lower bound. With this knowledge it is then possible to specify how close to optimality your solutions are. This means that requests such as ‘find a solution with an optimality gap of less than 5% within 3 hours’ can be dealt with.

relative gap = (f(x) − L) / |f(x)| (the exact scaling convention varies between solvers)

Clearly a gap of zero means that you have found the optimal solution. The smaller the gap, the better your solution, but it is likely to take an increasingly long time to reduce the gap: for example, it will take longer to go from a gap of 5% to 4% than from 10% to 5%.
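As a small illustrative sketch (the numbers are made up), the two gap definitions look like this in Python; note that solvers differ in how they scale the relative gap:

```python
def optimality_gaps(best_objective, lower_bound):
    """Absolute and relative optimality gap for a minimisation problem.

    best_objective: objective value f(x) of the best feasible solution found.
    lower_bound: a proven lower bound L on the optimal objective value.
    """
    absolute = best_objective - lower_bound
    # Conventions vary between solvers; here we scale by the magnitude
    # of the best objective value found so far.
    relative = absolute / abs(best_objective)
    return absolute, relative

abs_gap, rel_gap = optimality_gaps(105.0, 100.0)
print(abs_gap, rel_gap)  # 5.0 and roughly 0.048 (about 4.8%)
```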

Realistically, however, we know that machines do in fact break down, and companies employ teams of people whose job it is to maintain the machines so that they keep running, or to repair them if they do break. Generally these maintenance teams will have a planned schedule of times when they will stop the machine in order to do their necessary checks and servicing.

This approach works well except for the fact that there are often conflicting views between the maintenance department and the production department. The production department wants to make sure that products are always being made (as few stops as possible), whereas the maintenance department’s aim is to stop the machine breaking down (needing a lot of stops). This poses the question: how do we find the optimum balance between having a reliable machine and a productive machine?

There is also a need to take into account the fact that all the components in a machine suffer a general degradation over time. If a component has been used for too long past its lifetime then the chance of it breaking will be very large. Since a breakdown is much more costly to a company than any maintenance activity, we want to perform additional maintenance based on the lifetime of the components rather than someone’s opinion.

This is better known as ‘just-in-time’ maintenance, where we act on the machine just before it breaks down due to wear and tear. One way of doing this that causes minimal disruption to the planned schedule is to turn one of the scheduled maintenance activities into this just-in-time maintenance.

The aim is then to be able to determine the last possible stop during which this just-in-time maintenance can be performed, before the machine will break down due to general wear and tear. This links into the mathematics of optimal stopping theory.

One clever way to solve this problem of optimal stopping is by using Bruss’s odds algorithm. The key point about the odds algorithm is that it has been shown to actually be optimal, and it produces a win probability that is greater than or equal to the well-known lower bound of 1/e ≈ 36.8%.

The algorithm starts by looking at a series of n independent events and associates with them a set of indicator variables I_1, …, I_n. These are binary and represent whether event i is something we would like to stop on (I_i = 1) or not (I_i = 0).

The algorithm starts by summing up the odds r_i = p_i/q_i in reverse order (where p_i is the probability that I_i = 1 and q_i = 1 − p_i),

r_n + r_{n−1} + r_{n−2} + …

until the sum is equal to or greater than 1 for the first time. If this happens at an index s, we save the number s and the sum above as R_s,

R_s = r_s + r_{s+1} + … + r_n

If for some reason the sum of the odds never reaches 1, we take s = 1. In addition we also compute Q_s, the probability that none of the events from s onwards occur,

Q_s = q_s × q_{s+1} × … × q_n

Then the results of this algorithm are s, the stopping threshold, and Q_s R_s, the probability of meeting your aim.

In a more mathematical form the answer to the problem is: stop on the first event with index i ≥ s for which I_i = 1, and the probability of winning with this strategy is

V = Q_s R_s = (q_s q_{s+1} ⋯ q_n)(r_s + r_{s+1} + ⋯ + r_n)

A simple example might be the following. Suppose we are prepared to roll a die up to 25 times, and we want to stop on the last 5 that appears. Then using the odds algorithm we find that, for every roll, p_i = 1/6, q_i = 5/6 and r_i = 1/5.

Then we see that the reverse sum first reaches 1 with r_25 + r_24 + r_23 + r_22 + r_21 = 5 × 1/5 = 1, so we find that s = 21. The probability of actually ending on the last 5 using this strategy is Q_21 R_21 = (5/6)^5 × 1 ≈ 0.40.
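Here is a minimal Python sketch of the algorithm, using exact fractions to avoid floating-point trouble at the ‘sum reaches 1’ boundary; it reproduces the dice example:

```python
from fractions import Fraction

def odds_algorithm(ps):
    """Bruss's odds algorithm.

    ps: success probabilities of the n independent events, in time order.
    Returns the stopping threshold s (1-based) and the win probability
    V = Q_s * R_s of stopping on the last success.
    """
    n = len(ps)
    odds = [p / (1 - p) for p in ps]
    r_sum, s = 0, 1
    for i in range(n - 1, -1, -1):   # sum the odds in reverse order
        r_sum += odds[i]
        if r_sum >= 1:               # first time the sum reaches 1
            s = i + 1
            break
    q_prod = 1
    for p in ps[s - 1:]:             # Q_s = product of failure probabilities
        q_prod *= 1 - p
    return s, q_prod * r_sum

# The dice example: up to 25 rolls, aiming to stop on the final 5.
s, win = odds_algorithm([Fraction(1, 6)] * 25)
print(s, float(win))  # 21 and (5/6)^5 ≈ 0.402
```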

It turns out that we can actually use the odds algorithm to solve the maintenance problem we described earlier. In our case the n independent events are the times when production is stopped, and the variables I_i represent whether the just-in-time maintenance can be performed at that particular stop.

The probability p_i that the just-in-time maintenance can be done might be represented through a reliability term and a maintenance term. The reliability function X of the machine depends on the starting time a_i of the production stop: the later a stop falls in the operating cycle, the less likely it is that the machine has survived that long without breaking down. The maintenance function M will probably depend on the length d_i of the production stop: the longer the stop, the more likely it is that the maintenance can happen during it, so an exponential distribution might be appropriate.

Then substituting into the formula used by the odds algorithm we can find the best stop to do just-in-time maintenance at and the chance that this stop will turn out to have been optimal in hindsight.

Reference article: ‘Odds Algorithm’-based Opportunistic Maintenance Task Execution for Preserving Product Conditions


Let’s start by making the substitution p_i = e^{β_i} in the formula for the probability that team i beats team j, p_i/(p_i + p_j).

With a bit of mathematical manipulation we can recast this into a more familiar form,

P(i beats j) = 1 / (1 + e^{−(β_i − β_j)})

This looks very similar to the form of a logistic regression. This is a regression where the dependent variable can only take two values – it is binary. In our case the only outcomes are team i winning or team i losing.

Then if we substitute our initial expression into the logistic transformation (the log-odds) we obtain terms which can be further simplified using the fact that ln(e^{β_i}/e^{β_j}) = β_i − β_j.

From here it is simple to invert the transformation to get the final result, which is that the probability of team i beating team j is just a logistic function of β_i − β_j.
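A quick numerical check of this equivalence in Python (the β values are arbitrary made-up skills):

```python
import math

# Two hypothetical skill parameters, using the substitution p_i = e^(beta_i).
beta_i, beta_j = 1.3, 0.4
p_i, p_j = math.exp(beta_i), math.exp(beta_j)

bradley_terry = p_i / (p_i + p_j)
logistic = 1 / (1 + math.exp(-(beta_i - beta_j)))

print(abs(bradley_terry - logistic) < 1e-12)  # True: the two forms agree
```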

Let’s consider the case of making an alternative substitution, p_i = c_i². This roughly corresponds in a physical sense to the idea that the skill of a player is best represented by some other underlying measure c_i.

At this point it is clear that we have come across this formula before. If we let c_i be the runs scored by team i and c_j be the runs scored by team j (which is the same as the runs conceded by team i), we have obtained the classic Pythagorean win probability for a single game, c_i²/(c_i² + c_j²). It also fits into the logistic regression model if we change β_i to 2 ln c_i!

Alternatively, if we took a slightly more general substitution of p_i = c_i^γ, we end up with the two-exponent form of Pythagorean expectation. We previously saw this in the form of the Pythagorean exponent I used for Premier League football.
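A small Python sketch of the two-exponent form (the scores and the non-integer exponent are illustrative choices):

```python
def pythagorean(scored, conceded, gamma=2):
    """Pythagorean win probability with a general exponent gamma."""
    return scored**gamma / (scored**gamma + conceded**gamma)

print(pythagorean(3, 4))        # classic gamma = 2 case: 9/25 = 0.36
print(pythagorean(3, 4, 1.83))  # a softer exponent pulls the value towards 0.5
```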

Clearly there is more to the idea of Pythagorean wins than just a magic formula that seems to work! In fact it can be derived not just from a basic Bradley-Terry model but from another idea: that the scoring of runs or points is described by a Weibull distribution.

ELO ratings are now a well-known method of comparing players’ and teams’ performance with their competitors over a period of time. The principle is that each person has their own rating which is affected by the games played by that person against other players. The basic idea is that the bigger the difference between two ranked players, the more likely the higher-ranked player is to win.

The appeal of this type of system is that it is in some sense self-correcting: if the underdog team wins, it gains more points from its win than the pre-match favourites would have gained. This means that even if the model wrongly predicts the winner it will adjust the ratings to favour the actual winner. As a result the model should take into account the fact that skill changes over time – it is dynamic.

The probability of a team winning depends on the difference in ratings of the two teams playing.

Once more this is nothing more than a slightly readjusted form of a Bradley-Terry model. In this case we have used a slightly more abstract substitution of p_i = 10^{R_i/400}, where R_i is the ELO rating of team i, so the probability of team i winning becomes 1/(1 + 10^{(R_j − R_i)/400}).

Once the match is played the model updates the ratings of the two teams that played. This is done by a simple linear formula which awards more points to a winning team if it was less favoured to win. In other words it discounts the ratings gain when the favourites win and increases the ratings gain when the underdogs win. We do this for each team i involved,

R_i ← R_i + K(P_i − E_i)

Here the term P_i is just the result of the match on a scale of 0–1. So if it was a binary game of chess, a win would give P_i = 1 and a loss P_i = 0. The difference between the predicted win probability E_i for team i and the outcome is then scaled by the constant term K to produce the update to the current rating of team i.

It is this constant K that is the chief parameter of ELO models. A value that is too high will cause massive swings in ratings from only a few games, whereas a low value will not capture a team’s change in performance quickly enough. The way that the model is designed means that it is not necessary to have a single value of K for all participants or matches. Often new players might have a slightly higher K-factor to help us learn their true rating more quickly, and more important matches such as World Cup finals will carry a greater weighting.
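As an illustrative sketch, here is the rating update in Python; the 400-point scale and K = 32 are conventional example choices rather than anything fixed by the model:

```python
def elo_update(r_i, r_j, score_i, k=32):
    """One ELO update for player i after a match against player j.

    score_i: the result on a 0-1 scale (1 = win, 0.5 = draw, 0 = loss).
    k: the K-factor controlling how fast ratings move.
    """
    # Predicted win probability from the Bradley-Terry-style formula.
    expected_i = 1 / (1 + 10 ** ((r_j - r_i) / 400))
    return r_i + k * (score_i - expected_i)

# An underdog (1400) beats a favourite (1600): the underdog gains more
# than the 16 points an evenly matched win would be worth.
print(round(elo_update(1400, 1600, 1), 1))  # about +24.3 -> 1424.3
```

Note that the winner's gain and the loser's loss cancel out when both use the same K, so the rating system is zero-sum.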

In fact ELO ratings and their successor ratings such as GLICKO have been applied to many other sports in addition to chess. By starting all teams with a particular rating you can then calculate updated ratings and odds of winning matches between not just two current teams but any other historical teams as well. This provides a framework to start examining questions like who is the best team of all time and which teams beat their expectations the most.

]]>

In many sports that are played in front of a partisan crowd there is often a benefit to the team being supported. This is the concept of ‘home advantage’, where the local team tends to perform better than the visiting team. This is not just because of crowd support; it might also include the fact that the home team are more experienced at playing in those conditions – think of England’s cricketing struggles in the Indian subcontinent!

A mathematical form for this was suggested by Agresti (1990), which is given below,

P(i beats j when i plays at home) = θp_i / (θp_i + p_j)

Here the parameter θ represents the size of the home-field advantage; the larger the value, the more likely the home team is to win.

A key drawback to the use of the basic Bradley-Terry model is that it doesn’t account for the chance of draws. This is traditionally done by adjusting the model with an extra threshold parameter θ ≥ 1 that controls the likelihood of ties. One such model is that of Rao and Kupper (1967),

P(i beats j) = p_i / (p_i + θp_j),    P(i draws with j) = (θ² − 1)p_i p_j / ((p_i + θp_j)(θp_i + p_j))

Another way of including ties in the model was suggested by Davidson (1970),

P(i beats j) = p_i / (p_i + p_j + ν√(p_i p_j)),    P(i draws with j) = ν√(p_i p_j) / (p_i + p_j + ν√(p_i p_j))

In this instance the probability of a draw is proportional to the geometric mean of the skill parameters of the two players. If the value of ν = 0, then there is no chance of draws and we reproduce the basic Bradley-Terry model.
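To make the model concrete, here is a small Python sketch of Davidson's formulas; the skill values and ν below are made-up examples:

```python
import math

def davidson(p_i, p_j, nu):
    """Davidson (1970) model: win, draw and loss probabilities for player i.

    nu scales the draw probability via the geometric mean of the skills;
    nu = 0 recovers the basic Bradley-Terry model with no draws.
    """
    denom = p_i + p_j + nu * math.sqrt(p_i * p_j)
    win = p_i / denom
    draw = nu * math.sqrt(p_i * p_j) / denom
    loss = p_j / denom
    return win, draw, loss

win, draw, loss = davidson(4.0, 1.0, 0.5)
print(win, draw, loss)           # three probabilities summing to 1
print(davidson(4.0, 1.0, 0)[1])  # nu = 0: no chance of a draw
```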

One generalisation of the basic model is to allow for the case of having a single winner in a game that has more than two players: the probability that player i wins becomes p_i / (p_1 + … + p_n). This could be the case of a family playing a board game or finding the best chocolate out of a selection box.

You can even model games made up of different teams of different sizes playing against each other by building the information about each player’s individual skill into the model. This could be used for team sports such as doubles tennis or track cycling.

Obviously this approach does have a fairly large assumption built into it: that the quality of a team is essentially just a combination of its parts. Some sports, such as baseball, might work well under this assumption, since each individual’s contribution to the team is pretty much independent of his team-mates. I would argue that more fluid games such as football are better summarised as: you’re only as good as your weakest link. If you compare a pair of ‘average’ centre midfielders with a pair made up of a ‘star’ and a ‘rubbish’ centre midfielder, then the second pair will probably be worse. Remember that team-work and cohesion play an important part in football.

It is even possible to account for comparisons and rankings between more than two players in a game. This is better known as a Plackett-Luce model. If we have three players then the model becomes,

P(1st, 2nd, 3rd) = p_1/(p_1 + p_2 + p_3) × p_2/(p_2 + p_3)

You can think about this with the analogy of picking balls out of a vase. Imagine there are three colours of balls: red, green and blue. Suppose that the proportions of the colours in this infinite vase are p_r, p_g and p_b, which will basically be our skill parameters. Then suppose that we want the probability of some order of balls being drawn, say green-red-blue. Remembering that we need to resample if we draw a colour that we have already had, we can write the probability of this ordering as,

P(green, red, blue) = p_g/(p_g + p_r + p_b) × p_r/(p_r + p_b) × p_b/p_b

This takes the form of the general Plackett-Luce model for ranking comparisons seen above.
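The vase analogy can be sketched directly in Python; the colour proportions below are made-up example values:

```python
def plackett_luce(order, skills):
    """Probability of observing a full ranking under a Plackett-Luce model.

    order: player (or colour) names from first place to last.
    skills: skill parameter for each player.
    """
    remaining = list(order)
    prob = 1.0
    for player in order:
        # Each pick is Bradley-Terry-style among the players still left.
        prob *= skills[player] / sum(skills[p] for p in remaining)
        remaining.remove(player)
    return prob

# Proportions of green, red and blue balls in the vase.
skills = {"green": 0.5, "red": 0.3, "blue": 0.2}
print(plackett_luce(["green", "red", "blue"], skills))
# = 0.5/1.0 * (0.3/0.5) * (0.2/0.2) = 0.3
```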

This idea is formally known as a Bradley-Terry model (1952), where the chances of Alice or Bob winning are in proportion to their skill levels. If Alice has a skill level of p_A and Bob has a skill level of p_B, then the probability that either one wins is the ratio of their skill level to their combined total skill level,

P(Alice wins) = p_A/(p_A + p_B),    P(Bob wins) = p_B/(p_A + p_B)

It is clear to see that the law of total probability holds, if we sum up the probabilities of both outcomes we get unity and the basic idea of winning in proportion to the “skill” or “quality” is also obeyed.

We can then calculate the odds of each person winning by looking at the ratio of their probabilities to win: the odds of Alice beating Bob are simply p_A/p_B.

In many situations it is hard to come up with a ranking of all the current players directly (think of playing a computer game online), but through the Bradley-Terry model it is possible to compare pairs of players through their individual match-ups to generate full rankings.

We can calculate these skill ratings, and hence rank teams according to their skill, by using the technique of maximum likelihood estimation. This turns out to be simple to do mathematically and yields an answer that makes sense. Let’s assume that we have n players and a skill parameter for each of them, p_1, …, p_n. The only data we need is the number of matches in which player i beat player j, w_ij. Then we can write the likelihood as,

L(p) = ∏_{i≠j} ( p_i/(p_i + p_j) )^{w_ij}

where the product is over all i, j = 1, …, n except for i = j. For example, with just Alice and Bob this would become,

L(p) = ( p_A/(p_A + p_B) )^{w_AB} × ( p_B/(p_A + p_B) )^{w_BA}

The general log-likelihood takes the form,

ℓ(p) = ∑_{i≠j} w_ij [ ln p_i − ln(p_i + p_j) ]

At this point we should also point out that the values of p that maximise this function are not unique. We can see this easily from the log-likelihood above, as ℓ(p) = ℓ(ap) for any a > 0. In order to get a unique maximum likelihood we need a unique maximising set of p_i’s, so we set ∑_i p_i = 1. This will be necessary as algorithms to find the MLE, such as the Expectation-Maximisation (EM) algorithm, only converge to a local maximum: if there is only one maximum it is guaranteed to be global.

We can then denote the number of matches that i wins against all other players as W_i = ∑_j w_ij. Also we see that the number of comparisons between i and j is n_ij = w_ij + w_ji, so we can further simplify the log-likelihood down to,

ℓ(p) = ∑_i W_i ln p_i − ∑_{i<j} n_ij ln(p_i + p_j)

Then we can take the derivative with respect to p_i to yield,

∂ℓ/∂p_i = W_i/p_i − ∑_{j≠i} n_ij/(p_i + p_j)

Finally we can set this to zero and rearrange for p_i to get the MLE,

p_i = W_i / ∑_{j≠i} [ n_ij/(p_i + p_j) ]

which can be solved by iterating to a fixed point, renormalising so the p_i sum to 1 at each step.

These form the maximum likelihood estimates for our skill parameters and by the properties of MLE are guaranteed to eventually converge as we add more and more data. From then on it is a trivial task to create rankings, predict future win probabilities or even calculate simple odds.
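The whole fitting procedure can be sketched in a few lines of Python using the standard fixed-point iteration; the tournament results below are invented for illustration:

```python
def bradley_terry_mle(wins, iters=200):
    """Fit Bradley-Terry skills by the standard fixed-point iteration.

    wins[i][j] is the number of matches in which player i beat player j.
    Returns skills normalised to sum to 1 (fixing the scale invariance).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])                     # total wins for player i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom)
        total = sum(new_p)
        p = [x / total for x in new_p]             # impose sum(p) = 1
    return p

# A hypothetical three-player tournament (Alice, Bob, Charlie),
# with each pair having played 5 matches.
wins = [[0, 3, 4],   # Alice beat Bob 3 times and Charlie 4 times
        [2, 0, 3],   # Bob beat Alice twice and Charlie 3 times
        [1, 2, 0]]   # Charlie beat Alice once and Bob twice
skills = bradley_terry_mle(wins)
print([round(s, 3) for s in skills])  # Alice ranked first, Charlie last
```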

I will now do a simple example based on three people playing in a triangular tournament of some skill based game. During the tournament we kept track of the results of all the matches played and then calculated our skill parameters once we had finished playing.

It should be said that the more data (previous match-ups) that you have the more accurate the resulting skill parameters should be, so there is still time for Charlie to turn it around if we play again!

An important concept in this case is the fact that during the algorithm we often violate mass-balance constraints. In other words we can have more flow going into a particular node than leaves it. The difference between the flow in and the flow out is referred to as the ‘excess’. In order to find a feasible solution we need all the nodes apart from the source and sink to have an excess of zero. We remove this excess by ‘pushing’ flow to adjacent nodes.

There is a useful analogy for the preflow-push algorithm in terms of water trying to flow downhill to a destination.

- We move the source uphill so that water flows from it towards the downhill nodes.
- In general water flows downhill towards the sink.
- Sometimes flow becomes trapped at a node that has no downhill neighbours.
- We move this node higher and water continues to flow downhill to the sink.
- Eventually no more flow can reach the sink.
- As we continue to move nodes higher the remaining flow returns to the source.
- The algorithm terminates when all the flow is at the source or the sink.

Note that in the example above, each listed step corresponds to the action we will take next; the slide shows the network once that action has been completed.

If you were paying attention to the simple example above you might have noticed that we seemingly picked at random which node to select when we had multiple nodes containing an excess. This was exactly the case; the above example just uses the generic preflow-push algorithm which is about as good as the best augmenting path algorithms.

The attraction of the preflow-push algorithms is that they can be significantly improved by incorporating clever rules for deciding which nodes to pick. These tend to be heuristics and are briefly described below.

- Generic preflow-push algorithm

This is the standard algorithm for preflow-push which forms the basis for most subsequent modifications.

- FIFO preflow-push algorithm

This modification of the generic algorithm examines active nodes (nodes with an excess) in the order of first-in-first-out.

- Highest-label preflow-push algorithm

This version is probably the most efficient algorithm in practice; when a choice has to be made it selects the active node with the highest distance label.

The augmenting path algorithm utilises an idea in capacitated graphs called the residual capacity: the potential for additional flow to move along a particular arc. It can be summed up as Residual Capacity = Capacity − Forward Flow + Backward Flow. See the diagram below to see how it works (the first number is the flow, the second the capacity).

The basic idea in the augmenting path algorithm is to search for possible paths from the source node to the sink node in the graph which can carry a positive flow. This is known as an ‘augmenting path’. Once we have found a path we send the maximum flow that we can along the path and then update the capacities on each arc in that path. This is where the need for residual capacities comes in. We then look for another augmenting path in the updated network and repeat the process. When no more augmenting paths can be found in the network the algorithm terminates.

- Labelling algorithm

This is the simplest form of the augmenting path method which is simple to implement, but it is not very efficient in practice.

- Capacity scaling algorithm

This is a special implementation of the algorithm which preferentially sends flow along paths with large residual capacity, only considering smaller-capacity paths later.

- Successive-shortest path algorithm

This version is the best augmenting path algorithm; it sends flow along the shortest directed paths first, using the distance labels as seen in the example above.
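As a concrete sketch of the augmenting path method, here is a minimal max-flow implementation in Python that uses breadth-first search to find shortest augmenting paths (the successive-shortest-path idea, essentially the Edmonds-Karp algorithm); the graph encoding and node names are my own illustrative choices:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Augmenting-path max flow with BFS (shortest augmenting paths).

    capacity: dict-of-dicts, capacity[u][v] = arc capacity from u to v.
    """
    # Build the residual network, adding zero-capacity reverse arcs
    # so flow can later be 'undone' if a better routing exists.
    residual = {u: dict(adj) for u, adj in capacity.items()}
    for u, adj in capacity.items():
        for v in adj:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                          # no augmenting path left
        # Trace the path back, find its bottleneck, and push that flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# A small example network: two arc-disjoint paths of capacities 3 and 2.
capacity = {"s": {"a": 3, "b": 2}, "a": {"t": 3}, "b": {"t": 2}, "t": {}}
print(max_flow(capacity, "s", "t"))  # 5
```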

Augmenting path algorithms tend to be computationally expensive. As they send flow down paths we have to do an operation to move flow in each arc of the path. This means that there are some examples where we have to do an extremely large number of computations.

In the picture above we have a long sequence of high-capacity arcs followed by a set of arcs of unit capacity. As a result there are clearly 5 augmenting paths through the network.

The augmenting path algorithm will then analyse the 5 paths and end up sending 1 unit of flow along each of them. This reaches the correct answer but requires a lot of computation: it needs to consider 5 paths of length 7, so we have a total of 5 × 7 = 35 operations.

It would be far more efficient if we noticed that we only really need to look at the end paths with unit capacities. Whilst we need to analyse the flow along each of these short paths, we could cut out the need to re-analyse the path from the start by sending all the flow we can along it at once. As a result we could end up needing only 5 + 5×2 = 15 operations. This is the approach taken by preflow-push algorithms.

One interesting problem which can be solved using the ideas of network flows is the baseball-elimination problem. Baseball in the USA consists of two different leagues which are each split into three divisions of about 5 teams. During the regular season each team plays 162 games in total, most of which are against teams in their own division (76 vs the same division, 66 vs the same league, 20 vs the other league). The aim is to qualify for the playoffs, which can be done by finishing first in the division.

All this adds up to a desire to work out whether a particular team can still potentially win its division and make the playoffs, given the current league standings. Solving this problem means looking at tables like the one below.

In this simple example it is clear to see that Houston are already eliminated: the maximum number of wins they can reach is 77 + 4 = 81, which is less than the 82 wins that LA already have. However, the fates of the remaining three teams are intertwined. Take Texas, for example: if they win their three remaining games they can reach 83 wins and overtake LA. But to win the league they would need LA to lose 8 of their 9 remaining games. In this scenario, 1 loss to Texas and 2 to Houston still leaves 5 losses needed out of 6 games against Oakland; but if this happens, Oakland will themselves reach 83 wins and claim first place! Clearly it is not enough to look at a team’s wins and games left to play; we need to consider who those games are played against.

Let’s assume that team i has already won w_i of the games that it has played, has g_i games left to play, and that g_ij is the number of games between teams i and j that have yet to be played. We shall also set W_i = w_i + g_i, so that W_i is the maximum number of victories that team i can obtain.

Using this notation, we cannot eliminate team i if, in some outcome of the remaining games to be played throughout the league, W_i is at least as large as the number of victories of every other team.

- The basic idea is to assume that the team in question wins all their remaining games ⇒ W_i wins
- Use network flows to try to share out the remaining games so that all the other teams finish with at most W_i wins

The first step in building the network is to create nodes for each of the remaining game combinations, called game nodes in the example below. There is an arc connecting the source to each game node separately. The capacities of each of these arcs are the number of games left between the two teams represented by the game node that the arc is connected to. This is shown in the graph as g_{24}, the number of games still to be played between team 2 and team 4.

We also require a set of nodes in the network that represent the teams, known as the team nodes. These team nodes are fed by arcs from each game node that involves that particular team. The capacities of these arcs are usually modelled as infinite, since the number of additional games a team wins is restricted only by the number of games left to play.

Finally there is a sink node that completes the graph; it is fed by arcs from all of the team nodes. The capacity of the arc from team node i is W_3 − w_i, the difference between the potential total number of wins of team 3 (the team whose elimination we are testing) and the current number of wins of team i. In other words, this is the number of games team i can still win without overtaking team 3’s potential total.

The flow in this model between the source node s and the team node i is the total number of additional games that team i wins. In the problem we cannot eliminate team 3 if the network contains a feasible flow x saturating the source arcs and satisfying, for all other teams i,

w_i + (additional wins of team i) ≤ W_3

We can rewrite this condition in our case as (flow into team node i) ≤ W_3 − w_i, which matches the capacities in the diagram.

The way to find out whether team 3 is eliminated is to check whether all the remaining games can flow through the network. This is a maximum flow problem: if the max flow saturates (equals the capacities of) all the arcs leaving the source, the team is not eliminated; if no such flow exists, then team 3 is sadly eliminated.

A useful link if you’re interested in extensions and other facts about this problem is from Kevin Wayne here, and some lecture notes from Cornell here. I also got my first book out of the library in several years, Network Flows: Theory, Algorithms and Applications by Ahuja, Magnanti and Orlin. This has a section on the baseball-elimination problem, and the book is more generally known as one of the best textbooks on the subject of network flows.

This is the league table from earlier in the year.

Below is the current league table I’m working from with about 5 or 6 games to go.

Observations

- Leicester seem to be unstoppable in real life, and despite my prediction of them hitting a blip nothing of the sort has happened to them yet. This is reflected in them having outperformed the average team’s points total for their goals scored and conceded by a whopping 9.29 pts. This residual has almost doubled in the last 10 games (they have only lost 1 league game in the last 10, and that was by a single goal)!

- The other prediction I made was that Tottenham seemed to be under-achieving earlier in the season and that they could challenge for the title. Despite only a modest improvement in their residual (from −5.64 to −4.81), they have maintained their good run of form and look like the most likely title challengers to Leicester.

- Other significant improvements have been made by the likes of Bournemouth and Stoke who have turned their form around to pick up more points than they might have expected to.

- Everton continue to have a shocking season based on Pythagorean expectation, which matches the conventional wisdom about them. In a more typical season, based on their goals scored and conceded, they would have almost 9 more points.

- Finally, despite my current worries about Watford, the maths seems to suggest that they will still be fine in the league this year.


There are two main strands of statistics: classical statistics and Bayesian statistics. These aren’t necessarily conflicting ideologies – though many statisticians throughout history would beg to differ – but are simply two different ways to tackle a problem. Hopefully this post will give you some brief insight into the uses and differences of the two approaches.

Classical statistics is the first type of statistics that people come across and deals with what we expect to happen in a repeatable experiment. This might be the idea that if we flip a coin an infinite number of times, the proportion of heads we obtain is a half – hence the well-known probability of a head of 1/2. This is why classical statistics is often known as frequentist statistics, and it covers ideas such as confidence intervals and p-values.

Bayesian statistics evolved out of Bayes’ Theorem which I talked about in a previous post.

- P(A), P(B) are known as prior probabilities, because we know them before we learn any more information.
- P(A|C), P(B|C) are known as posterior probabilities, because they are found after we have learnt some additional information.

You can think of Bayesian statistics as an evolution of Bayes’ Theorem. Instead of dealing with point probabilities we now deal with probability distributions. As a result we now have prior and posterior distributions to consider.

As the term P(data) is just a normalising constant we can drop it to get the commonly seen Bayes’ rule,

P(θ | data) ∝ P(data | θ) × P(θ)

Here P(θ | data) is the posterior distribution, P(data | θ) is the likelihood, which accounts for the statistical model, and P(θ) is the prior, which represents the expert beliefs before seeing the data. The key point is that the Bayesian approach can quantify beliefs about theories and hypotheses, something that can be desirable.

The comic above talks about a machine that can detect neutrinos. Neutrinos are sub-atomic particles which can travel great distances and pass through matter unaffected. The detector measures whether neutrinos coming from a supernova have arrived, then rolls two dice: if both come up 6 it lies about the result, otherwise it tells the truth.

The frequentist statistician is making a mistake in his analysis. Normally most studies work on the assumption that a result can be deemed significant if there is a less than 5% chance that it arose by chance alone. This is better known as rejecting the null hypothesis.

As the chance of the two dice both coming up 6 is 1/36 (about 0.028), which is less than the 5% threshold, the frequentist decides that the detector must be telling the truth. The Bayesian gives a different answer: that there has not been a supernova. This turns out to be the more appropriate answer: because a supernova isn’t really a repeatable event, we should use our prior beliefs to help find an answer!

Most models of the lifetime of the Sun say that it can be expected to live for around 10 billion years. If the detector is run once an hour over that whole lifetime, that is about 8.76 × 10^13 runs, so a naive prior probability of the Sun having exploded in any given hour is about 1 in 8.76 × 10^13. Then, using the information in the cartoon, there are two possibilities when the detector reports an explosion.

- The sun **has** exploded (1 in 8.76×10^13) and the detector **is** telling the truth (35 in 36). This event has a total probability of about 1/(8.76×10^13) × 35/36, or about 1 in 9.01×10^13.
- The sun **hasn’t** exploded ((8.76×10^13 − 1) in 8.76×10^13) and the detector **is not** telling the truth (1 in 36). This event has a total probability of about (8.76×10^13 − 1)/(8.76×10^13) × 1/36, or about 1 in 36.

This makes it pretty clear that the Sun isn’t likely to have exploded.
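The calculation above can be checked with a few lines of Python, using the same assumed figures (the 10-billion-year lifetime and once-an-hour detector runs):

```python
# Probabilities as estimated in the text.
p_nova = 1 / 8.76e13        # chance the Sun exploded in any given hour
p_truth = 35 / 36           # detector tells the truth (dice not both sixes)

# Joint probabilities of the two ways the detector can report an explosion.
p_yes_and_nova = p_nova * p_truth
p_yes_and_no_nova = (1 - p_nova) * (1 - p_truth)

# Posterior probability that the Sun really exploded, given the report.
posterior = p_yes_and_nova / (p_yes_and_nova + p_yes_and_no_nova)
print(posterior)  # about 4e-13: the Sun almost certainly has not exploded
```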

Clearly, if the Sun had happened to go supernova, the Earth would be wiped out pretty quickly. This would make the money at stake in the bet worth absolutely nothing, highlighting the need to check your working.

Finally the physicist inside me feels it necessary to mention that the Sun can’t go supernova: it is not nearly massive enough. So perhaps the true meaning of the comic is that physics is more important …
