# New STOR-i Space & Posters

We recently had the grand opening of the new STOR-i building on campus, so we are no longer borrowing space from other departments and finally have our own STOR-i-exclusive space.

In the new MRes baseroom we also displayed the posters we made following the recent simulation masterclass given by Barry Nelson. The one I worked on with Emily, Anna and Rob is below and is titled “Simulation Confidence Intervals for Input Uncertainty”.

My fellow MRes student Harjit, looking at the posters. Harjit’s blog.

# Simulation: Input-Uncertainty

The idea behind simulation models is that we can use them to make predictions of what might happen in the real world. As we often want to use the results of these simulations to inform decision making, it is important for the results to be accurate and precise. We do this by keeping a careful eye on the errors.

Traditionally people have focused on trying to minimise the error that crops up in the actual simulation itself. The technical name for this is the ‘simulation-estimation error’.

However, there is another important error that is often overlooked and needs to be taken into account as well: the ‘input-uncertainty error’. This comes from the uncertainty in the actual values you plug into your simulation to start with.

Errors in Simulations; source: emilystori.wordpress.com

# Input Modelling – Simulation

One of the more interesting topics that we looked at over the last week was input uncertainty in simulations. Generally in simulation we have a set of inputs into our model, which often take the form of probability distributions. For example, when we are trying to simulate queues, the arrival of customers into the system would be one such input.

Often we don’t need to know exactly what distribution should have been used as an input, since what we actually care about are the performance measures of the system. Thinking back to the queuing example, we care about quantities like the expected number of people in the system and the fraction of time that the servers are working (server utilisation). It turns out that as long as we match enough of the key properties of the distribution, we can get the correct performance measures!

This is more easily seen in an example such as the M/G/1 queue, which has been solved analytically (mathematically). The steady-state mean time a customer waits in the M/G/1 queue before being served is given by the Pollaczek–Khinchine formula below.

$E(Y) = \frac{\lambda(\sigma^2+\tau^2)}{2(1-\lambda\tau)}$

As you can see, the exact distribution of the time taken to serve a customer doesn’t matter; only its mean $\tau$ and variance $\sigma^2$ do! Luckily this is often the case. Note that the formula also involves $\lambda$, the arrival rate, which comes from the assumption that customers arrive according to a Poisson process; that assumption is important.
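To make this concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions: I read $E(Y)$ as the steady-state mean time a customer waits before service starts, the function names `pk_mean_wait` and `simulate_mean_wait` are made up for this post, and the parameter values are arbitrary. It evaluates the formula, then simulates the queue with two differently shaped service-time distributions (Erlang-4 and uniform) that share the same mean and variance.

```python
import math
import random


def pk_mean_wait(lam, tau, sigma2):
    """Pollaczek-Khinchine value E(Y) = lam*(sigma2 + tau^2) / (2*(1 - lam*tau))."""
    rho = lam * tau  # server utilisation
    if not 0 <= rho < 1:
        raise ValueError("need lam * tau < 1 for a stable queue")
    return lam * (sigma2 + tau ** 2) / (2 * (1 - rho))


def simulate_mean_wait(lam, draw_service, n_customers, seed):
    """Average wait before service, using the Lindley recursion
    W_next = max(0, W + S - A), starting from an empty queue."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = draw_service(rng)   # this customer's service time S
        gap = rng.expovariate(lam)    # time A until the next Poisson arrival
        wait = max(0.0, wait + service - gap)
    return total / n_customers


# Illustrative values only: arrival rate 0.5, service mean 1, service variance 0.25.
LAM, TAU, SIGMA2 = 0.5, 1.0, 0.25
half_width = math.sqrt(3 * SIGMA2)  # uniform on [TAU - h, TAU + h] has variance h^2 / 3

service_models = {
    # Gamma with shape 4 and scale 0.25: mean 1, variance 0.25
    "Erlang-4": lambda rng: rng.gammavariate(4.0, 0.25),
    # Uniform with the same mean and variance but a very different shape
    "Uniform": lambda rng: rng.uniform(TAU - half_width, TAU + half_width),
}

formula_value = pk_mean_wait(LAM, TAU, SIGMA2)  # 0.5 * 1.25 / 1.0 = 0.625
estimates = {
    name: simulate_mean_wait(LAM, draw, n_customers=200_000, seed=42)
    for name, draw in service_models.items()
}

print("formula:", formula_value)
for name, est in estimates.items():
    print(name, "simulated:", est)
```

Both simulated averages should land close to the formula value, up to simulation noise, even though the two service-time distributions look nothing alike.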

So how do we find the properties of a distribution if we can’t use that distribution itself?

One way to tackle this problem is to use the idea of matching central moments. Here we want to mimic key properties of the distribution, so we can simulate new input data from it. The basic idea is to pick parameters $\theta$ for $F(x;\theta)$ (the input distribution we will simulate from) so that it matches the properties of the real world data $X$.

Mean: $\mu_{X} = E[X]$

Variance: $\sigma^2_{X} = E[(X-\mu_{X})^2]$

Skewness: $\alpha_{3} = E[(X-\mu_{X})^3]/\sigma^3_{X}$

Kurtosis: $\alpha_{4} = E[(X-\mu_{X})^4]/\sigma^4_{X}$

Suppose that we do indeed have the first four central moments of our distribution, $(\mu_{X}, \sigma^2_{X}, \alpha_{3}, \alpha_{4})$, and they all exist. Then there is a fairly standard trick that lets us match the mean and variance easily:

$X' = \mu + \sigma(\frac{X-\mu_{X}}{\sigma_{X}})$

The formula gives us an $X'$ with mean $\mu$ and variance $\sigma^2$, but the same skewness and kurtosis as $X$. As a result it is not the matching of the first two central moments that poses a problem, but the matching of the next two!
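Here is a small Python sketch of the trick, again just an illustration under my own assumptions: the helper names `four_moments` and `match_mean_variance` are hypothetical, the data are simulated exponential values rather than real observations, and the sample moments stand in for $\mu_{X}$ and $\sigma_{X}$.

```python
import math
import random


def four_moments(xs):
    """Plug-in (divide-by-n) estimates of mean, variance, skewness and kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    var = sum(d ** 2 for d in devs) / n
    sd = math.sqrt(var)
    skew = sum(d ** 3 for d in devs) / n / sd ** 3
    kurt = sum(d ** 4 for d in devs) / n / sd ** 4
    return mean, var, skew, kurt


def match_mean_variance(xs, target_mean, target_var):
    """Apply X' = mu + sigma * (X - mu_X) / sigma_X using the sample moments of xs."""
    mean_x, var_x, _, _ = four_moments(xs)
    scale = math.sqrt(target_var / var_x)  # sigma / sigma_X
    return [target_mean + scale * (x - mean_x) for x in xs]


rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(10_000)]  # stand-in for real-world data
matched = match_mean_variance(data, target_mean=5.0, target_var=4.0)

before = four_moments(data)
after = four_moments(matched)
print("before (mean, var, skew, kurt):", before)
print("after  (mean, var, skew, kurt):", after)
```

The transformed sample hits the target mean and variance, while its skewness and kurtosis are the same as those of the original data (up to floating-point rounding), because both are unchanged by shifting and positive rescaling.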