Input Modelling – Simulation

One of the more interesting topics we looked at over the last week was input uncertainty in simulations. Generally, a simulation model has a set of inputs, which often take the form of probability distributions. For example, when simulating a queue, the arrival process of customers into the system would be one such input.

Often we don’t need to know exactly which distribution should have been used as an input, since what we actually care about are the performance measures of the system. Thinking back to the queueing example, we care about quantities like the expected number of people in the system and the fraction of time the servers are busy (server utilisation). It turns out that as long as we match enough of the right properties of the distribution, we can still get the correct performance measures!

This is most easily seen in an example such as the M/G/1 queue, which has been solved analytically (mathematically). The mean time a customer spends waiting in an M/G/1 queue is given by the Pollaczek–Khinchine formula below, where \lambda is the arrival rate and \tau and \sigma^2 are the mean and variance of the service time.

E(Y) = \frac{\lambda(\sigma^2+\tau^2)}{2(1-\lambda\tau)}

As you can see, the service-time distribution itself doesn’t matter; only its mean and variance do! Luckily this is often the case. Note that \lambda appears because arrivals into the system follow a Poisson process, which is important.

So how do we find the properties of a distribution if we can’t use that distribution itself?

One way to tackle this problem is to use the idea of matching central moments. Here we want to mimic key properties of the distribution, so we can simulate new input data from it. The basic idea is to pick parameters \theta for a candidate distribution F(x;\theta) so that it matches the properties of the real-world data X.

Mean: \mu_{X} = E[X]

Variance: \sigma^2_{X} = E[(X-\mu_{X})^2]

Skewness: \alpha_{3} = E[(X-\mu_{X})^3]/\sigma^3_{X}

Kurtosis: \alpha_{4} = E[(X-\mu_{X})^4]/\sigma^4_{X}
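In practice these four quantities are estimated from data. A minimal sketch with NumPy, applying the definitions above directly to a sample (the exponential sample is an illustrative stand-in for real-world data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # stand-in for observed data X

mu = x.mean()                          # sample mean
var = ((x - mu) ** 2).mean()           # sample variance (second central moment)
sigma = np.sqrt(var)
skew = ((x - mu) ** 3).mean() / sigma**3  # standardised third moment
kurt = ((x - mu) ** 4).mean() / sigma**4  # standardised fourth moment

# An Exponential(scale=2) has mean 2, variance 4, skewness 2, kurtosis 9,
# so the estimates should land near those values.
print(mu, var, skew, kurt)
```

Fitting F(x;\theta) then amounts to solving for the \theta whose theoretical moments equal these sample values.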

Suppose that we do indeed have the first four central moments of our distribution, (\mu_{X}, \sigma^2_{X}, \alpha_{3}, \alpha_{4}), and they all exist. Then there is a fairly standard trick that lets us match the mean and variance easily,

X' = \mu + \sigma(\frac{X-\mu_{X}}{\sigma_{X}})

The formula gives us an X’ with mean \mu and variance \sigma^2, but with the same skewness and kurtosis as X. As a result, matching the first two central moments poses no problem; the difficulty lies in matching the next two!
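The location-scale trick above can be demonstrated in a few lines. This sketch (the gamma sample and the target values 10 and 2 are illustrative choices, not from the post) rescales a skewed sample to a chosen mean and standard deviation and confirms the skewness is untouched:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=200_000)  # a skewed sample X

# X' = mu + sigma * (X - mu_X) / sigma_X, with target mu = 10, sigma = 2
target_mu, target_sigma = 10.0, 2.0
x_prime = target_mu + target_sigma * (x - x.mean()) / x.std()

def skewness(a):
    z = (a - a.mean()) / a.std()  # standardise, then take the third moment
    return (z ** 3).mean()

print(x_prime.mean(), x_prime.std())   # hits the targets 10 and 2
print(skewness(x), skewness(x_prime))  # identical: linear maps preserve skewness
```

Because the transform is linear, the standardised third and fourth moments of X' are exactly those of X, which is why only skewness and kurtosis remain to be matched.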