As you probably know, statistics is often used to analyse data. There are many different types of data, but the type we are interested in for this post is time-series data. This is exactly what it sounds like: information collected about some process over a period of time. The time scale can range from seconds, as in signal processing, all the way to years, as in financial data.
When we look at time-series data we often see sudden changes in the pattern of the data. We use the term ‘changepoints’ to describe the places where this occurs. In a more mathematical sense, changepoints occur when there is a change in the parameters of the data, for example in the mean or variance of the series. Sometimes more than one parameter changes at the same changepoint.
The time-series data that changepoint analysis is used for crops up in many different disciplines. In finance it is used to keep track of volatility in the stock market, whilst climatology uses it to detect changes in the mean temperature of the planet. Even new fitness technology such as activity trackers makes use of it. In fact, just about anything that varies over time could have changepoint analysis applied to it, making it a worthwhile and active area of statistical research.
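To make the idea of a change in mean a little more concrete, here is a minimal Python sketch. It is only an illustration, not the method used in any of the applications mentioned: it assumes there is exactly one changepoint, that only the mean changes, and it picks the split that minimises the total squared deviation from the mean on either side. The function names `segment_cost` and `find_mean_changepoint` and the simulated data are made up for this example.

```python
import random


def segment_cost(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_mean_changepoint(series):
    """Return the index k that best splits the series into series[:k] and series[k:].

    'Best' here means the smallest combined segment cost, which is a simple
    way of looking for a single change in mean (an assumption of this sketch).
    """
    best_index, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = segment_cost(series[:k]) + segment_cost(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Simulated data (purely illustrative): 100 points with mean 0,
# followed by 100 points with mean 3, all with standard deviation 1.
rng = random.Random(42)
series = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]

estimated = find_mean_changepoint(series)
print("Estimated changepoint index:", estimated)
```

Note that this sketch always returns a split, even when the series has no change at all. Real changepoint methods also have to decide whether a change is present and how many there are, which is where much of the statistical difficulty lies.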
Stéphane Robin from AgroParisTech recently gave a presentation at the STOR-i Conference, where he talked about the role of changepoints in genomics. This is a sub-area of genetics concerned with measurements on DNA and RNA taken along the chromosomes. It turns out that data collected along the genome is akin to a time series: the measurements are ordered by position along the chromosome rather than by time, and there is a great deal of it! Experiments often aim to find regions of the genome where a specific event occurs, which makes changepoint analysis an incredibly useful tool in genomics.
The slides above illustrate some of the data that changepoint methods frequently have to deal with in genomics.