ASTER: Automated Model Management & Version Control

Teradata Employee

All source code and technical attributes are included below:

What is Automated Model Management and Why is it Important:

Real-time and predictive modeling is the hot topic of the day, but it has its limitations. Simply put, predictive models decay or go stale over time. This problem is especially critical when dealing with real-time or near-real-time models. What is required is an automated means to manage, test, and replace models actively within a system. Add to these challenges reporting, version control, and all the other issues in the model life cycle. There has been no easy way to monitor, retrain, and redeploy models in Aster, until now!

Aster data scientists collect, prepare, and stage the data specific to the use case. They then apply different machine learning techniques to find a best-in-class model, and continually tweak the parameters of the algorithm to refine the outcomes. Automating and operationalizing this process is difficult. This blog describes a solution for automating models in Aster.
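To make the idea of model decay concrete, here is a minimal sketch in Python of an automated staleness check. It is illustrative only: the function name, the use of AUC as the metric, and the 0.05 tolerance are assumptions chosen for this example, and none of them come from Aster or from the framework described in this post.

```python
from statistics import mean


def is_stale(baseline_auc, recent_aucs, tolerance=0.05):
    """Return True when the average recent score has dropped more than
    `tolerance` below the score measured when the model was deployed.

    Hypothetical helper: the name, metric, and default tolerance are
    assumptions made for illustration.
    """
    if not recent_aucs:
        raise ValueError("need at least one recent score")
    return baseline_auc - mean(recent_aucs) > tolerance


# A model deployed at AUC 0.82 whose recent scores have drifted downward.
print(is_stale(0.82, [0.78, 0.75, 0.74]))

# The same model with stable recent scores.
print(is_stale(0.82, [0.81, 0.80, 0.82]))
```

A scheduled job could run a check like this after each scoring batch and trigger retraining when it returns True.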

The Teradata Aster Solution:

One of our clients, KPN Netherlands, working alongside Teradata (Jean-Charles Ravon), has developed an automated model management framework. Most of the work was done by two people from KPN: Maria Vechtomova and Chris Molanus.

The Main Attributes of the Model Management Framework Include:

1. Interface is R-AsterR

2. Model Accuracy/Precision measurement

3. Workflow and Version control

4. Standardized and Unified Approach

5. Reporting and Monitoring
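As an illustration of the accuracy/precision measurement in item 2, the sketch below computes both metrics for a binary classifier from lists of actual and predicted labels. It is written in plain Python for readability and is not the framework's own implementation; the function name and the sample labels are made up for the example.

```python
def accuracy_and_precision(actual, predicted, positive=1):
    """Compute accuracy and precision for binary labels.

    accuracy  = correct predictions / all predictions
    precision = true positives / (true positives + false positives)
    Hypothetical helper, not part of any Aster or modelFactoryR API.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    true_pos = sum(1 for a, p in pairs if p == positive and a == positive)
    false_pos = sum(1 for a, p in pairs if p == positive and a != positive)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing was predicted positive; report 0.0.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Made-up labels: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(accuracy_and_precision(actual, predicted))
```

Storing these two numbers for every model run is what makes the reporting and monitoring in item 5 possible.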

Key Benefits Include:

The Model Development Process in Code:

The Entire Model Factory Framework is Shared on GitHub:

https://github.com/kpn-advanced-analytics/modelFactoryR

The KPN Model Factory at a glance:

 

 - The model factory has shrunk production model development time from 4.5 months to 1.5 months

 - The emphasis has been put on collaborative working through version control, shared model metrics, and a shared development environment

 - It gives Data Scientists more flexibility by allowing them to test any kind of approach or algorithm

 

Starring:

 - Teradata Aster with in-database R integration as the computing engine

 - Teradata Aster and Hadoop as the data platform

 - GitHub as the version controller

 - Jenkins as the orchestrator

 

In depth:

 

Decrease development time:

 - Teradata Aster as the massively parallel platform for building and executing R models

 - Teradata Aster as the data platform, enabling MPP (massively parallel processing) for data management and KPI design

 - Model Factory package + Teradata Aster R to simplify the modeling process

 - Reduced I/O by removing most of the extract/load operations between Aster and R

 

Collaborative working:

 - Everything (data, models, performance reports) is stored and historized in the same place, which improves knowledge sharing

 - Version control with GitHub makes it possible to manage the models and keep track of them

 - A dedicated R package has been built to facilitate version control (who has done what, when and how)
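The "who has done what, when and how" idea can be sketched as a simple run log. The Python example below is a hypothetical illustration, not the API of the KPN R package: the ModelRun fields, the record_run function, and the sample values (including the placeholder commit id) are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ModelRun:
    """One entry in a model run log (hypothetical structure)."""
    model_id: str     # what was run
    author: str       # who ran it
    git_commit: str   # how: the code version that produced the model
    metrics: dict     # how well: the scores for this run
    started_at: str   # when: ISO 8601 timestamp in UTC


def record_run(log, model_id, author, git_commit, metrics, now=None):
    """Append a run record to `log` and return it."""
    now = now or datetime.now(timezone.utc)
    run = ModelRun(model_id, author, git_commit, dict(metrics), now.isoformat())
    log.append(asdict(run))
    return run


run_log = []
# "abc1234" is a placeholder, not a real commit.
record_run(run_log, "churn_model", "data.scientist", "abc1234", {"auc": 0.81})
print(json.dumps(run_log, indent=2))
```

In a setup like the one described here, such a log would more naturally live in a database table next to the data and models than in an in-memory list.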

 

Flexibility:

 - You can leverage the huge library of R analytical functions to test the approaches best suited to solving your problem

 - The model factory package has been designed with that in mind: test many, check the performance, share your insights, and productionize
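The "test many, check the performance" step can be pictured as a champion/challenger comparison. The following Python sketch assumes each candidate model already has a validation score where higher is better; the function name, the min_gain threshold, and the sample scores are invented for illustration and do not come from the model factory package.

```python
def pick_champion(scores, current=None, min_gain=0.01):
    """Choose which model to put in production.

    scores   -- dict mapping model name to validation score (higher is better)
    current  -- name of the model now in production, if any
    min_gain -- smallest improvement that justifies replacing `current`
    Hypothetical helper written for this example.
    """
    if not scores:
        raise ValueError("no candidate models to compare")
    best = max(scores, key=scores.get)
    # Keep the current model unless the best challenger is clearly better.
    if current in scores and scores[best] - scores[current] < min_gain:
        return current
    return best


# Made-up validation scores for three candidate models.
scores = {
    "logistic_regression": 0.78,
    "random_forest": 0.83,
    "gradient_boosting": 0.835,
}
print(pick_champion(scores))
print(pick_champion(scores, current="random_forest"))
```

Requiring a minimum gain avoids redeploying a model for an improvement that may just be noise.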