When analyzing a time series data set, we sometimes want to detect the points in time where a significant, abrupt change occurs.
Aster offers a ChangePointDetection function that does exactly that. The function looks back at the available data points and applies a binary segmentation search method. The algorithm executes these key steps:
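To make the idea of a binary segmentation search concrete, here is a minimal Python sketch. It is a generic illustration, not Aster's implementation: it assumes a simple cost (sum of squared deviations from the segment mean) rather than the regression-based segmentation used later, and the names `sse`, `best_split`, `binary_segmentation`, `min_gain` and `min_size` are made up for this example. The sketch finds the single split that improves the fit the most, keeps it if the improvement is large enough, and then repeats the search on each half.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Cost of a segment: sum of squared deviations from its mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values: List[float], min_size: int) -> Tuple[float, Optional[int]]:
    """Return (gain, index) of the split that reduces the cost the most."""
    total = sse(values)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - (sse(values[:i]) + sse(values[i:]))
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def binary_segmentation(values: List[float], min_gain: float,
                        min_size: int = 2, offset: int = 0) -> List[int]:
    """Recursively split the series while a split improves the fit enough."""
    gain, idx = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return []  # no worthwhile change point in this segment
    left = binary_segmentation(values[:idx], min_gain, min_size, offset)
    right = binary_segmentation(values[idx:], min_gain, min_size, offset + idx)
    return left + [offset + idx] + right


# Hypothetical order quantities with two level shifts.
quantities = [10] * 5 + [50] * 5 + [20] * 5
change_points = binary_segmentation(quantities, min_gain=100)
print(change_points)
```

The returned values are the positions where a new segment starts. The `min_gain` threshold plays the role of a stopping rule: raising it yields fewer, stronger change points.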
Before we can learn more about this function we need a data set to explore. We can download the Online Retail Data Set from the UCI Machine Learning repository (link).
Let's load the CSV data into a new Aster table, "retail_sales_cpd", and review an example.
Our data set includes 541,909 rows. We pick one sample customer and product:
In the output we see that a customer from the Netherlands tends to place very large orders for vintage spaceboy lunch boxes. The price is nearly constant, except for one order.
The quantity varies wildly. We see significant up and down changes (red boxes) throughout the order history.
Of course, with large data sets we do not have time to manually sift through the data and create visual plots. Let's review what the ChangePointDetection function can do for us.
Besides the normal function parameters there are a few additional parameters that we need to study more carefully:
We invoke the ChangePointDetection function and use linear regression to perform the segmentation:
Note that while we can use the ACCUMULATE feature to output additional columns, I prefer to join with the source table to get a full picture.
Let's review our basic line chart again. If we circle change points with a higher quantity in red and change points with a lower quantity in green, we get this result:
The change points do not always correspond to straightforward highs and lows. If they did, we would not need the function to do all the calculations; a simple SQL windowing approach could accomplish the same.
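For contrast, the naive "highs and lows" approach amounts to comparing each point with its neighbours, which is what a LAG/LEAD style window comparison would do in SQL. A minimal Python sketch of that comparison follows; the function name `local_extrema` and the sample quantities are hypothetical.

```python
from typing import List, Optional


def local_extrema(values: List[float]) -> List[Optional[str]]:
    """Label each interior point 'high' or 'low' by comparing it with its neighbours."""
    labels: List[Optional[str]] = [None] * len(values)
    for i in range(1, len(values) - 1):
        prev_v, cur, next_v = values[i - 1], values[i], values[i + 1]
        if cur > prev_v and cur > next_v:
            labels[i] = "high"   # strictly above both neighbours
        elif cur < prev_v and cur < next_v:
            labels[i] = "low"    # strictly below both neighbours
    return labels


# Hypothetical quantities for one customer and product.
labels = local_extrema([1, 5, 2, 2, 0, 3])
print(labels)
```

This flags every local peak and trough, including noise, whereas a change point marks a shift in the underlying level or trend of the series.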
Change detection on retail data can highlight customers who have unique requirements and shopping habits. This group of customers may be at higher risk of churn or lower satisfaction, so it is a good idea to analyze them further using other techniques.
To quickly find those customers of interest and products with a higher number of change points we can aggregate our results.
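As a rough sketch of that aggregation, assume the detected change points are available as one record per change point, keyed by customer and product. The field names `customer_id` and `stock_code` and the sample rows are hypothetical, not taken from the function's actual output.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_change_points(rows: List[Dict[str, str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Count change points per (customer, product), most frequent first."""
    counts = Counter((r["customer_id"], r["stock_code"]) for r in rows)
    return counts.most_common()


# Hypothetical change point records, one per detected change point.
rows = (
    [{"customer_id": "C1", "stock_code": "P1"}] * 3
    + [{"customer_id": "C2", "stock_code": "P1"}]
    + [{"customer_id": "C1", "stock_code": "P2"}] * 2
)
ranking = count_change_points(rows)
print(ranking)
```

The same result could be produced in SQL with a GROUP BY on the customer and product columns, ordered by the count in descending order.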
Since our example uses a retail data set, one question comes to mind: does seasonality impact the results? Yes, change detection algorithms have a harder time with time series that include seasonality. If your results are below expectations, consider removing the seasonal component first.
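One simple way to remove a seasonal component is to subtract the average value of each position in the cycle. The Python sketch below assumes a fixed, known period and an additive seasonal pattern; the function name `remove_seasonality` and the sample values are made up for illustration, and other decomposition methods may suit real data better.

```python
from typing import List


def remove_seasonality(values: List[float], period: int) -> List[float]:
    """Subtract the mean of each position in the cycle from the series."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    # Average value observed at each position of the cycle.
    seasonal = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - seasonal[i % period] for i, v in enumerate(values)]


# Hypothetical series: a repeating 3-step pattern with a level shift of +4.
adjusted = remove_seasonality([10, 20, 30, 14, 24, 34], period=3)
print(adjusted)
```

After the adjustment the repeating pattern is gone and only the level shift remains, which is the kind of signal a change point search is meant to pick up.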
So what is the value then?
In our example we reviewed retail sales using the quantity sold. We can apply the same technique to averages, counts, or standard deviations. This opens the door to use cases such as fraud, intrusion, and anomaly detection, where a tangible corrective action is possible.
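To illustrate, a derived series such as a rolling average or rolling standard deviation can be computed first and then passed to the change detection step. This is a minimal Python sketch; the function name `rolling_stats`, the window size and the sample values are assumptions for the example.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def rolling_stats(values: List[float], window: int) -> List[Tuple[float, float]]:
    """Return (mean, population standard deviation) for each full window."""
    stats = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        stats.append((mean(chunk), pstdev(chunk)))
    return stats


# Hypothetical measurements with a jump halfway through.
stats = rolling_stats([1, 1, 1, 5, 5, 5], window=3)
for avg, spread in stats:
    print(round(avg, 2), round(spread, 2))
```

The rolling mean tracks the shift in level, while the rolling standard deviation spikes in the windows that straddle the jump; either series can serve as input for change detection.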
Another example could be a rise in call center complaints. A change detection analysis can pinpoint the time where one or more events triggered the increase in call volume.
In manufacturing, the strength of a part can be affected by a change in the input materials.
Change point detection can go back through the historical sensor data and highlight the time stamps where changes occurred. Those time stamps can potentially be linked to a supplier switch, a different batch of input materials, or a change in the operating environment.
Online Retail Data Set (link)
Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).
Change Point Detection: a powerful new tool for detecting changes (link)
Change Point Detection with seasonal time series (link)