# Dijkstra’s Algorithm

I previously talked about a particular instance of the vehicle routing problem for electric vehicles in a post which you can read here.

In fact that was a more challenging version of a classic problem in combinatorial optimisation – finding an optimal path. So it would be remiss of me if I didn’t go right back to the start: Dijkstra’s algorithm.

# Input Modelling – Simulation

One of the more interesting topics that we looked at over last week was to do with input uncertainty in simulations. Generally in simulation we have a set of inputs into our model, which often take the form of some probability distribution. For example when we are trying to simulate queues, the arrival of customers into the system would be one such input.

Often we don’t need to know exactly what distribution should have been used as an input, since what we actually care about are the performance measures of the system. Thinking back to the queuing example, we care about facts like the expected number of people in the system and the fraction of time that the servers are working (server utilisation). It turns out that as long as we match enough good properties of the distribution, we can get the correct performance measures!

This is more easily seen in an example such as the M/G/1 queue, which has been solved analytically (mathematically). The mean length of the M/G/1 queue is given by the Pollaczek–Khinchine formula below.

$E(Y) = \frac{\lambda(\sigma^2+\tau^2)}{2(1-\lambda\tau)}$

As you can see the distribution for the time taken to serve a customer doesn’t matter itself, only the mean and variance of it does! Luckily this is often the case. Note that we also have lambda which comes from the fact that arrivals into the system follow a Poisson distribution, which is important.

So how do we find the properties of a distribution if we can’t use that distribution itself?

One way to tackle this problem is to use the idea of matching central moments. Here we want to try and mimic key properties of the distribution, so we can simulate new input data from it.  The basic idea is to pick parameters for $F(x;\theta)$ (the true distribution) that match the properties of the real world data $X$.

Mean: $\mu_{X} = E[X]$

Variance: $\sigma^2_{X} = E[(X-\mu_{X})^2]$

Skewness: $\alpha_{3} = E[(X-\mu_{X})^3]/\sigma^3_{X}$

Kurtosis: $\alpha_{4} = E[(X-\mu_{X})^4]/\sigma^4_{X}$

Suppose that we indeed do have the first four central moments of our distribution, $(\mu_{X}$$\sigma^2_{X}$, $\alpha_{3}, \alpha_{4})$ and they all exist. Then there is a fairly standard trick that lets us match the mean and variance easily,

$X' = \mu + \sigma(\frac{X-\mu_{X}}{\sigma_{X}})$

The formula gives us an X’ with mean $\mu$ and variance $\sigma^2$, but the same skewness and kurtosis as X. As a result it is not the matching of the first two central moments that poses a problem, but the matching of the next two!

Aside

# Routing Electric Vehicles

Electric cars seem to be the way of the future, they are cheaper to run and better for the environment. However there has always been one thing holding them back in many people’s eyes: actually finding somewhere to recharge and the length of time needed to recharge.

Many companies continue to try and reduce the time needed to fully recharge a car battery, with a new generation of fast charging stations being able to recharge significantly a battery in around 30 mins. However one approach suggested by some companies is to simply swap batteries depleted with fresh ones at stations.

One interesting talk given at that the recent STOR-i conference was presented by Pitu Mirchandani from Arizona State University in the United States. His work has been directed towards electric cars and the somewhat unique problems that crop up for the battery exchange models in particular.

He defined a new problem as the ‘Electric Vehicle Shortest Walk Problem’. Note that it is shortest walk rather than shortest path, as it takes into account the fact that vehicles will often need to take detours in order to recharge or swap batteries. This mean that the problem he had to deal with couldn’t be solved with traditional shortest path algorithms such as Dijkstra’s algorithm and he had to develop a new approach.

The two main things that need to be minimised are:

1. The total distance detoured from the optimal route (the route with no stops)
2. The total number of stops (to exchange batteries)

He went on to discuss further his ideas of other interesting problems that can be studied such as where to locate the battery exchange stations and how large they should be. This meant that he was able to propose a system that could decide where every car in the network should stop off on its journey to swap batteries. The point of doing this was that he found the overall total journey time of all the cars in the system was minimised. In particular it was substantially below the case when every driver made their own decision about where and when they chose to stop. So whilst drivers would lose their independence, all road users would benefit.

I should probably conclude by saying that nothing is the world of research is guaranteed. In fact it turns out that the man behind Tesla, Elon Musk, has since changed his mind on battery exchange technology following an unsuccessful pilot. He suggested that people just prefer being able to recharge their car! This appears to be for several reasons, not least of which that recharging at their solar powered superchargers is free and using a battery exchange station costs the same as a fresh tank of petrol!

# An Introduction to Changepoints

As you probably know Statistics is often used to try and analyse data. There are all different types of data, but the data we are interested in studying in this post is that of time-series data. This is exactly what it sounds like: information collected about some process over a time period. This time period can range from the small scale of seconds for signal processing all the way to larger scales such as years as seen in financial data.When we look at time-series data we often see sudden changes in the pattern of the data. We use the term ‘changepoints’ to describe the places where this occurs. In a more mathematical sense changepoints tend to happen when there is some change in the parameters of the data, i.e in  the mean or variance of the series. Sometimes changepoints even occur when more than one parameter of the data changes.The time-series data that changepoint analysis is used for crops up in many different disciplines. In finance it is needed to keep track of volatility in the stock market, whilst climatology harnesses it to detect changes in the mean temperature of the planet. Even new fitness technology like activity trackers make use of it. In fact just about anything that has some variation over time could have changepoint analysis applied to it, making it worthwhile and active area of statistical research.

Stéphane Robin from AgroTech Paris recently gave a presentation at the recent STOR-i Conference where he talked about the role of changepoints in genomics. This is a sub-area of genetics to do with measuring data about DNA and RNA which is found along the chromosomes. It turns out that collecting this data along the genome is akin to a time-series, since there is so much of it! Particular experiments often aim to find regions of the genome where a specific event occurs, which makes changepoint analysis an incredibly useful asset to genomics.

This slideshow requires JavaScript.

The slides above illustrate some of the data that changepoint problems have to frequently solve in genomics.