# Simulation: Input-Uncertainty

The idea behind simulation models is that we can use them to predict what might happen in the real world. As we often want to use the results of these simulations to inform decision making, it is important for the results to be accurate and precise. The way we do this is by keeping a careful eye on the errors.

Traditionally people have focused on minimising the error that crops up in the simulation itself; the technical name for this is the ‘simulation-estimation error’.

However, there is another important error that is often overlooked and needs to be taken into account as well: the ‘input-uncertainty error’. This comes from the uncertainty in the values you plug into your simulation to start with.

Errors in Simulations; source: emilystori.wordpress.com


# Changepoint Detection – Part 2

If you haven’t already, I would strongly recommend glancing over Part 1 of this post before continuing. In it we talked about a method for detecting a single changepoint through techniques such as maximum log-likelihood estimation. The question now is the more difficult one: how can we search for multiple changepoints?

The simplest thing to do would be to initially apply our Single Changepoint Detection (SCD) method to the series we want to analyse. If there is no changepoint found we are done, but if there is a changepoint found we end up with two distinct segments.

# Changepoint Detection – Part 1

Previously we discussed what a changepoint is, how they work and examples of where you might find them. Today, however, we shall go one step further and look at how to find these changepoints.

Multiple changepoints have been found; they are the red lines.

Let’s begin by trying to figure out a way of detecting if just a single changepoint exists in a set of data. The simplest way to pose this problem is to describe it through a ‘Hypothesis Test’, where we have a null hypothesis ($H_0$) and an alternative hypothesis ($H_1$),

$H_0:$ no changepoint            $H_1:$ 1 changepoint

If you are unfamiliar with hypothesis tests, the idea is to run some kind of test that will give us a value. If that value is above a certain threshold, we can reject the null hypothesis and in our case accept the alternative hypothesis. If not, we can’t reject the null hypothesis, so we carry on assuming it holds. In layman’s terms, it’s the classic innocent until proven guilty. Alternatively check it out yourself!
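To make the test a little more concrete, here is a minimal sketch of one way it could look. It assumes the data are Gaussian with unit variance and that only the mean changes, in which case the log-likelihood ratio statistic reduces to a drop in the sum of squared errors. Those modelling choices, the function names and the threshold value are all assumptions made for illustration, not necessarily the formulation used in the full post.

```python
def sse(xs):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def detect_single_changepoint(data, threshold):
    """Test H0 (no changepoint) against H1 (one change in mean).

    Assumes Gaussian data with unit variance, so the test statistic is
    the reduction in squared error from splitting the data at tau.
    Returns (tau, statistic); tau is None if H0 is not rejected.
    """
    null_cost = sse(data)
    best_tau, best_stat = None, 0.0
    # Try every split point and keep the one that fits best.
    for tau in range(1, len(data)):
        stat = null_cost - (sse(data[:tau]) + sse(data[tau:]))
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    if best_stat > threshold:
        return best_tau, best_stat  # reject H0
    return None, best_stat  # cannot reject H0


# Made-up series whose mean jumps after the fifth observation.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
tau, stat = detect_single_changepoint(series, threshold=10.0)
```

Here `tau` is the index of the first observation after the change. The threshold of 10 is arbitrary; in practice it would be chosen to control how often we wrongly reject the null hypothesis.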


# Markov Chains and Tennis

Now that the tennis season is well underway and the Australian Open is already a few weeks in the past, I thought I would look at a basic model of a game of tennis for today’s blog post. As you might expect there will be some maths later on, but feel free to skip that and just look at the pictures and the results!

The simplest way to model a tennis match as suggested by people such as O’Malley, is to look at the probability of players winning a point on their own serve. Then we can define the probability of a player winning a point on their serve as p, and the probability of losing a point on their serve as 1-p. These two probabilities can be used to build up a model of the entire match. Whilst this is clearly a simplification it has been shown to be a pretty good predictor of the overall winner of a match.

For example, by checking the ATP website we can see that Roger Federer won 3737 out of the 5202 points he played on serve last year, giving him an average probability of p=0.718 to win a point on his serve. Gilles Simon only managed to win 3186 out of 5128 points on his serve and consequently had a more typical probability of p=0.621 to win a point on his serve. Most top-50 ATP players had values in the range of 0.6 to 0.7 for the 2015 season, which is obviously pretty good!
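As a first step in building up from points to a whole match, the sketch below turns p into the probability that the server wins a single game. It assumes every point is won independently with the same probability p, and the function name is my own. The formula is the standard one: win the game before deuce, or reach deuce and then win two points in a row before the opponent does.

```python
def prob_win_game(p: float) -> float:
    """Probability the server wins a game, given point-win probability p.

    Assumes points are independent and identically distributed.
    """
    q = 1.0 - p
    # Win to love, to 15 or to 30: the server takes the last point, and the
    # opponent's 0, 1 or 2 points can be arranged in 1, 4 or 10 ways.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (3 points each): 20 possible orderings.
    reach_deuce = 20 * p**3 * q**3
    # From deuce: win two in a row, or split the next two points and repeat.
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


# Serve statistics quoted above.
federer_game = prob_win_game(3737 / 5202)
simon_game = prob_win_game(3186 / 5128)
```

A small edge on each point is amplified over a game: any p above 0.5 gives a game-winning probability larger than p itself.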

# Emergency Planning with OR

One recent talk that stood out to me was given by Marc Goerigk, who talked about how operational research can be used to improve evacuation planning for emergencies, for example unexploded WWII-era bombs, the Lancaster floods or terrorist incidents.

The ideas I will talk about are to do with using an optimisation approach to give a good lower bound on the evacuation time. Simulation techniques on the other hand are more useful for finding an upper bound on the evacuation time.

A clever thing to do when we are trying to model real-world places is to represent the place as a network. Essentially we can take a map and represent it as a graph using nodes and arcs. Arcs generally represent paths or roads, and nodes the intersections of these paths and roads. This allows us to simplify tricky features such as curved roads into nice straight lines.

Transforming a map of my flat into a network diagram
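As a rough illustration of what such a network looks like in code, the sketch below stores a made-up flat as a dictionary of nodes and arcs, then finds the quickest route to the exit with Dijkstra’s algorithm. The room names, the walking times and the choice of algorithm are all assumptions for the example, not the method from the talk.

```python
import heapq

# Hypothetical flat: nodes are rooms, arcs carry walking times in seconds.
flat = {
    "bedroom": {"hallway": 4},
    "kitchen": {"hallway": 3, "lounge": 6},
    "lounge": {"hallway": 5, "kitchen": 6},
    "hallway": {"bedroom": 4, "kitchen": 3, "lounge": 5, "front_door": 2},
    "front_door": {"hallway": 2},
}


def shortest_time(graph, source, target):
    """Return the shortest travel time from source to target (Dijkstra)."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        time_so_far, node = heapq.heappop(queue)
        if node == target:
            return time_so_far
        if time_so_far > best.get(node, float("inf")):
            continue  # stale queue entry, a quicker route was already found
        for neighbour, arc_time in graph[node].items():
            new_time = time_so_far + arc_time
            if new_time < best.get(neighbour, float("inf")):
                best[neighbour] = new_time
                heapq.heappush(queue, (new_time, neighbour))
    return None  # target cannot be reached


lounge_exit = shortest_time(flat, "lounge", "front_door")
bedroom_exit = shortest_time(flat, "bedroom", "front_door")
```

Once the map is in this form, questions about the real place become questions about the graph, which is what makes the optimisation approach possible.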

# Multi-Armed Bandits

One of the classic problems in Statistics that a lot of people working at STOR-i in particular are involved with is something called ‘Multi-Armed Bandits’. In fact there was a recent conference on the topic and its applications at Lancaster earlier this year. In this post I will try to explain what the problem is and touch on how it is solved.

This problem was first thought about in the casinos of Las Vegas. It concerns how to pick which slot machine out of a group you should play in order to maximise your total winnings. Whilst this problem is no longer just applicable to casinos, the name ‘one-armed bandit’ has stuck and the casino analogy is what I will continue to use.

# Pythagorean Expectation in Football

The use of data and statistics has grown rapidly in most sports in recent years. Nowadays many metrics and formulas exist for trying to measure and predict performances of players and teams. One of my favourite tools because of its simplicity is something called ‘Pythagorean Expectation’.

The most famous early pioneer of baseball statistics, Bill James, came up with the original formula to predict the number of wins for a team over a baseball season, based on the number of runs they scored (RS) and conceded (RC).

### $Win\% = \cfrac{RS^2}{RS^2 +RC^2}$

To work out the number of wins in a season you simply have to multiply the win percentage by the number of games played. James called it Pythagorean because of all the squared terms. Whoever said Pythagoras’ Theorem was boring!
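The calculation is short enough to sketch in a few lines. The function names and the season totals below are made up for illustration; only the formula itself comes from the text above.

```python
def pythagorean_win_pct(runs_scored: float, runs_conceded: float) -> float:
    """Win percentage from the original formula: RS^2 / (RS^2 + RC^2)."""
    rs_sq = runs_scored ** 2
    rc_sq = runs_conceded ** 2
    return rs_sq / (rs_sq + rc_sq)


def expected_wins(runs_scored: float, runs_conceded: float, games_played: int) -> float:
    """Expected wins: the win percentage multiplied by games played."""
    return pythagorean_win_pct(runs_scored, runs_conceded) * games_played


# Hypothetical team: 800 runs scored and 700 conceded over 162 games.
wins = expected_wins(800, 700, 162)
```

For this made-up team the formula gives a win percentage of about 0.566, or roughly 92 wins over the season.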