Automated failure predictions originate in streaming (real-time) analytics models that detect anomalous patterns of sensor values. Crafting an anomaly-detection model begins with determining which sensor values behave in unusual ways before a given type of failure occurs. Often, a sensor’s values must be transformed for a predictive pattern to emerge in the simplest possible form. Thus, building an anomaly-detection model requires answering three questions:

- **Variable selection:** What combination of sensor values best predicts a given failure type?
- **Transformation selection:** What transformations should be applied to those sensor values to make predictive patterns as obvious and simple as possible?
- **Pattern selection:** Which patterns in the (possibly transformed) sensor values predict the given failure type?

Answering these questions effectively in IoT predictive-maintenance contexts is especially challenging, because a single machine can have many sensors, making the search for the right combination of variables, transformations, and patterns very time consuming.

## Example: Predicting bearing failure

To illustrate the decision-making process VAP supports, suppose a sensor measures the operating temperature of a critical bearing in a given class of industrial machine. When the bearing is healthy (operating normally), the temperature fluctuates according to a fixed statistical distribution, having a certain mean and standard deviation. When the bearing becomes unhealthy (begins to fail), the temperature increases, fluctuating around a gradually increasing mean, perhaps with ever larger fluctuations. An effective decision procedure for modeling this failure type would need to accomplish two things: 1) determine that certain transformations of the bearing-temperature variable make the increasing mean and variation salient, and 2) support a formal characterization of these changes. The formal characterization would let the model accurately predict the failure type (bearing failure) and timing.
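This failure signature is easy to sketch in code. The following is a minimal simulation (the temperature parameters and drift rates are hypothetical, not taken from VAP or any real machine) of a bearing whose temperature fluctuates around a fixed healthy mean and then drifts upward, with widening fluctuations, as it begins to fail:

```python
import random

random.seed(0)

HEALTHY_MEAN = 70.0  # hypothetical healthy operating temperature
HEALTHY_STD = 2.0    # hypothetical healthy standard deviation

def simulate_temperature(n_healthy, n_failing, drift_per_step=0.5):
    """Simulate bearing temperatures: a stable healthy phase followed by
    a failing phase whose mean (and spread) grow over time."""
    readings = [random.gauss(HEALTHY_MEAN, HEALTHY_STD) for _ in range(n_healthy)]
    for t in range(n_failing):
        mean = HEALTHY_MEAN + drift_per_step * (t + 1)  # gradually rising mean
        std = HEALTHY_STD * (1 + 0.05 * t)              # ever larger fluctuations
        readings.append(random.gauss(mean, std))
    return readings

temps = simulate_temperature(n_healthy=200, n_failing=50)
```

A modeler's task is to recognize, in data like `temps`, the point where the distribution shifts from the healthy regime to the failing one.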

While the type of distribution of healthy bearing temperatures would typically be the same across machines, the healthy mean and standard deviation might vary significantly from machine to machine. The critical pattern predicting bearing failure might be a persistent increase in operating temperature of some fraction of a healthy standard deviation above a specific machine’s healthy mean, regardless of that machine’s particular values for these parameters. The natural transformations would then be centering (subtracting the healthy mean from each raw value) and scaling (dividing the centered values by the healthy standard deviation); the combination of the two is the very common transformation termed *standardizing*. Inspecting standardized bearing temperatures can make it much easier to see a fixed relationship between persistent temperature increase (and fluctuation) and bearing failure, because standardizing gives bearing temperature a unitless representation that can be compared across machines. The VAP demo will include specific examples of these transformations.
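Standardizing is straightforward to express. In this sketch (the per-machine baselines and readings are hypothetical), two machines with different healthy baselines show the same failure pattern once their temperatures are converted to unitless z-scores:

```python
def standardize(values, healthy_mean, healthy_std):
    """Center on the machine's healthy mean, then scale by its healthy
    standard deviation, yielding unitless z-scores comparable across machines."""
    return [(v - healthy_mean) / healthy_std for v in values]

# Two machines with different healthy baselines but the same failure pattern:
machine_a = [70.1, 69.8, 70.3, 74.2, 76.0]   # healthy mean ~70, std ~2
machine_b = [95.2, 94.9, 95.4, 99.3, 101.1]  # healthy mean ~95, std ~2

z_a = standardize(machine_a, healthy_mean=70.0, healthy_std=2.0)
z_b = standardize(machine_b, healthy_mean=95.0, healthy_std=2.0)
# After standardizing, both machines show excursions of roughly 2-3 healthy
# standard deviations, so a single threshold can apply to both.
```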

## Shortening model-development cycle times

Many variable transformations can be useful in building IoT PdM models. VAP encapsulates about a dozen frequently used transformations. It lets its end user inspect the graphs of these transformations before a specific failure, or inspect the graphs of a single transformation before all failures of a given type. This can shorten the amount of time a data scientist needs to select a good set of predictors for a given failure type, and to identify the most predictive patterns in these predictors. The AACOE is porting VAP to run on several database platforms, and is enhancing VAP to support end-user extensions to the set of transformations VAP supports.
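VAP's exact transformation set is not enumerated here, but two transformations that frequently appear in PdM work, and would fit the bearing example, are a rolling mean (which smooths noise so a gradual drift stands out) and a rolling standard deviation (which highlights growing fluctuations). A minimal sketch of both:

```python
from collections import deque

def rolling_mean(values, window):
    """Rolling mean over a trailing window: smooths noise so a
    gradual drift in the mean stands out."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

def rolling_std(values, window):
    """Rolling (population) standard deviation over a trailing window:
    highlights growing fluctuations."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        m = sum(buf) / len(buf)
        out.append((sum((x - m) ** 2 for x in buf) / len(buf)) ** 0.5)
    return out
```

Graphing such transformed series before each failure of a given type, as VAP does, is what lets a modeler judge which transformation makes the predictive pattern most salient.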

Of course, there are more steps involved in building a PdM model. Once you have chosen variables, transformations, and patterns, you must reduce the patterns to a formal representation that a real-time pattern-recognition algorithm can use to predict a given failure mode and timing. There are many ways to do this. VAP does not yet aid its end user in pattern representation or algorithm selection, but the AACOE hopes soon to enhance VAP to do so.
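As a concrete illustration of what such a formal representation might look like (this is a generic sketch, not VAP's representation), the "persistent increase" pattern from the bearing example could be reduced to a simple rule over standardized readings:

```python
def persistent_exceedance(z_scores, threshold=2.0, run_length=5):
    """A simple formal pattern: raise a failure warning once `run_length`
    consecutive standardized readings exceed `threshold`.
    Returns the index of the first alert, or None if no alert fires."""
    run = 0
    for i, z in enumerate(z_scores):
        run = run + 1 if z > threshold else 0
        if run >= run_length:
            return i
    return None
```

A real-time recognizer would evaluate a rule like this incrementally as readings stream in; choosing `threshold` and `run_length` is part of the modeling work the text describes.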

Anomaly-detection modeling remains one of the most important skills for a data scientist. Tools like VAP can automate some of the data engineering grunt work of PdM modeling, freeing the data scientist to focus on the more arcane mathematical nuances of early and accurate prediction, and shortening the cycle time required to develop and deliver economically valuable PdM models.

Cheryl Wiebe of Teradata hosted a free live webcast "Detecting Anomalies in IoT with Time-Series Analysis" covering challenges in anomaly detection, statistical and machine learning algorithms applied in time-series data, event-based versus pattern-based anomaly detection, and tools to tackle anomaly detection. Hear the full recorded replay.

*This post is a collaboration between O'Reilly and Teradata. *