Qualcomm is using analytical and machine learning techniques enabled by the cloud to identify patterns, anomalies and areas with opportunities for improvement, as well as to understand the 'total network profile'.
Marketers need to visually analyze customer paths. IT professionals should be able to visually analyze server logs. Healthcare professionals want to visually analyze treatment paths.
There is no reason any of these tasks should require advanced coding skills.
Check out these demo videos we recently put together for the Teradata Path Analysis Guided Analytics Interface. You’ll see how easy it is to visually explore paths without writing any code. You can export lists of customers (or servers, or patients) who have completed paths or are on specific paths. And you can investigate text associated with events on these paths. All you need to be able to do is specify a few parameters in the interface and click a few buttons.
In this demo, we use the predictive paths capabilities of the Path Analysis Interface to identify two sets of customers. One set of customers is at risk of churn. The other group is prospects we may be able to push across the line to conversion.
In this video, we look at “cart abandonment” scenarios with an online banking data set and an eCommerce data set. Also, we showcase the “Add Drops” feature that makes it visually apparent where prospects and customers drop off paths within the Path Analysis Interface.
The text analytics capabilities of the Path Analysis Interface are unique and powerful. In this demo, we use text to provide context around complaints within a multi-channel banking data set.
Here, we are looking at healthcare billing data. We want to make it apparent that path analysis use cases are about much more than marketing. Healthcare professionals may also want to look at paths to certain procedures, paths around treatment and recoveries, or paths to specific diagnoses.
If you’re interested in visually exploring paths and patterns, please contact your Teradata account executive or send me a note at firstname.lastname@example.org. We can have you up and running with the Teradata Path Analysis Guided Analytics Interface on Teradata, Aster, or the Teradata Analytics Platform in no time!
XGBoost has gotten a lot of attention recently as the algorithm has been very successful in machine learning competitions. We in Aster engineering have been getting a lot of requests to provide this function to our customers. In AA 7.0, we’ve released an XGBoost/Gradient Boosting function.
The boosting techniques behind XGBoost can be used to improve the performance of any classifier. Most often, they are used with decision trees, which is how we’ve built it in Aster.
Decision trees are a supervised learning technique that tries to develop rules (“decisions”) to predict the outcome associated with an observation. Each rule is a binary choice based on the value of a single predictor; which rule is applied next depends on the outcome of that choice, and so on, until a prediction can be made. The rules can be easily summarized and visualized as a tree, as shown below.
In this tree, the outcome is 0, 1, 2, 3, or 4, where 0 indicates no heart disease, and 1 through 4 represent increasing severity of heart disease. The first “rule” is based on the value of the “Thal” column. If it is anything other than 6 or 7, the predicted outcome is 0. If the value in the Thal column is 6 or 7, the next step is to look at the value in the STDep column. If it is less than 0.7, the next step is to look at the value in the Ca column; if it is greater than or equal to 0.7, the next step depends on the value in the ChestPain column. To make a prediction for an observation, follow the rules down the tree until you reach a leaf node. The number at the leaf node is the predicted result for that observation.
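The rule-following logic described above can be sketched in a few lines of Python. The tiny tree below is a hand-written stand-in loosely based on the heart-disease example; the Thal and STDep thresholds are illustrative, not the actual fitted values from the figure:

```python
# A minimal sketch of following decision-tree rules down to a leaf.
# Internal nodes hold a test plus "yes"/"no" branches; leaves are ints.

def predict(node, row):
    """Walk binary rules down the tree until a leaf (an int) is reached."""
    while not isinstance(node, int):
        branch = "yes" if node["test"](row) else "no"
        node = node[branch]
    return node

tree = {
    "test": lambda r: r["Thal"] in (6, 7),
    "no": 0,                                  # Thal not in {6, 7} -> no disease
    "yes": {
        "test": lambda r: r["STDep"] >= 0.7,  # illustrative threshold
        "no": 1,
        "yes": 2,
    },
}

print(predict(tree, {"Thal": 3, "STDep": 0.2}))  # 0
print(predict(tree, {"Thal": 6, "STDep": 1.4}))  # 2
```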
A couple of techniques that can significantly improve the performance of decision trees are bagging and boosting. Bagging stands for “bootstrap aggregation”. Bootstrapping is a statistical technique where multiple datasets are created from a single dataset by taking repeated random samples, with replacement, from the original dataset. In this way you create a large number of slightly different datasets. Bagging starts by bootstrapping a large number of datasets and creating a decision tree for each one. Then, combine the trees by either majority vote (for classification problems) or averaging (for regression problems).
Random forest is a very popular variant of bagging. With random forests, you use bootstrapping to create new datasets as you do with bagging, but at each split, you only consider a subset of the predictors. This forces the algorithm to consider a wider range of predictors, creating a more diverse set of trees and a more robust model.
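These two ideas can be sketched quickly in Python; the toy data and predictor names below are invented for illustration:

```python
import random
from collections import Counter

random.seed(0)
data = list(range(10))  # toy stand-in for a dataset of 10 observations

def bootstrap_sample(rows):
    """Sample len(rows) rows WITH replacement: a slightly different dataset."""
    return [random.choice(rows) for _ in rows]

# Bagging: bootstrap many datasets (in practice, one tree is fit per sample).
samples = [bootstrap_sample(data) for _ in range(5)]
print(len(samples), len(samples[0]))  # 5 10

# Random forest twist: at each split, only a random subset of the
# predictors is considered (the predictor names here are invented).
predictors = ["age", "income", "tenure", "region"]
print(random.sample(predictors, 2))  # a random 2-predictor subset

# Combining predictions: majority vote for classification
# (for regression, average instead).
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote([1, 0, 1, 1, 0]))  # 1
```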
Boosting is a different approach. With boosting, you build trees sequentially. Each tree focuses specifically on the errors made by the previous tree. The idea is to gradually build a better model by improving the performance of the model at each step. This is different from bagging and random forest because at each stage you try to improve the model, by specifically looking at the points that the previous model didn’t predict correctly, instead of just creating a bunch of models and averaging them all together.
There are several approaches to boosting. XGBoost is based on gradient boosting.
The gradient boosting process starts by creating a decision tree to fit the data. Then, you use this tree to make a prediction for each observation and calculate the error for each prediction. Even though you’re predicting the same data that you used to build the tree, the tree is not a perfect model, so there will be some error. In the next iteration, this set of prediction errors becomes the new dataset. That is, each data point in the data set is replaced by the delta between the actual result and the predicted result. At each iteration, you replace the dataset with the errors made by the previous iteration. Then, you build a tree that tries to fit this new dataset of the deltas, make new predictions, and so on. When you add these trees together, the result should be closer to the original actual value that you were trying to fit, because you’re adding a model of the error. This process is repeated for a specified number of iterations.
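Here is a minimal Python sketch of that loop for squared error. The weak learner is a single-split "stump" and the 1-D dataset is made up; both are stand-ins for the full decision trees the text describes:

```python
# Gradient boosting for squared error: at each step, fit a weak learner
# to the current residuals (the deltas) and add its predictions in.

def fit_stump(x, y):
    """Fit the best single-split stump to (x, y) by squared error."""
    best = None
    for t in x:
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lv if xi < t else rv)) ** 2 for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi < t else rv

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 4.0, 5.2]

pred = [0.0] * len(x)
for step in range(20):
    residuals = [yi - pi for yi, pi in zip(y, pred)]    # errors become the new targets
    stump = fit_stump(x, residuals)                     # fit a "tree" to the deltas
    pred = [pi + stump(xi) for pi, xi in zip(pred, x)]  # add the new tree in

sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
print(round(sse, 6))  # far below the initial squared error of ~57.7
```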
Gradient boosting and XGBoost use a number of other optimizations to further improve performance.
Regularization is a common technique in machine learning. It refers to penalizing the number or the magnitude of the model parameters. It’s a way to prevent overfitting, that is, building a model that fits the training data so closely that it becomes inflexible and doesn’t perform well on new data.
When working with decision trees, regularization can be used to control the complexity of the tree, either by reducing the number of leaf nodes or the values assigned to each leaf node.
Typically in gradient boosting, when you add the trees together, each tree is multiplied by a number less than 1 to slow the learning process down (boosting is often described as a way to “learn slowly”). The idea is that moving gradually toward an optimal solution is better than taking large steps, which might lead you to overshoot the optimal result.
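A toy sketch of the shrinkage idea, with the "tree" reduced to a single fitted correction and an illustrative eta of 0.3:

```python
# Shrinkage: each fitted correction is scaled by a factor eta < 1 before
# being added, so the ensemble "learns slowly". The eta value and the
# single numeric target below are purely illustrative.

eta = 0.3
target = 10.0
pred = 0.0
for step in range(30):
    residual = target - pred  # the error the next "tree" would model
    pred += eta * residual    # take only a small step toward fixing it
print(round(pred, 3))         # creeps toward 10.0 instead of jumping there
```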
Subsampling is also a common technique in machine learning. It refers to building trees using only a subset of the rows or columns. The idea is to force the process to consider a more diverse set of observations (rows) or predictors (columns), so that it builds a more robust model.
The Aster XGBoost function also boosts trees in parallel. This is a form of row subsampling, where each vworker gets assigned a subset of the rows, and creates a set of boosted trees based on that data.
Stopping criteria are another important factor when building decision trees. In the Aster XGBoost function, you specify the exact number of boosting steps. The function also has stopping criteria that control the size of each tree; these arguments are analogous to those used in the other Aster decision tree functions Single_Tree_Drive, Forest_Drive, and AdaBoost_Drive.
Here’s the syntax of XGBoost_Drive. Refer to the Aster Analytics Foundation User Guide (Release 7.00.02, September 2017) for more information about the function arguments.
Here’s an example. The dataset is available from the UCI Machine Learning Repository. It’s a set of fetal monitoring observations classified into 3 categories. There are 2126 observations and 21 numeric attributes. The first few rows are shown below.
As usual when training a model, we divide the dataset into training and test sets, and use the training set to build the model. Here’s a sample function call:
The function displays a message when it finishes:
We can use the XGBoost_Predict function to try out the model on the test dataset:
Here are the first few rows of the output:
select id, nsp, prediction from ctg_predict;
To conclude, we’re very excited to make this algorithm available to our customers. Try it out!
The blogosphere is full of sound bites, anecdotes, and clichés about Machine Learning/AI, so it's important to discern their place in the larger picture of Advanced Analytics/Data Science, especially for those in business and IT who are removed from the 'religious experience' of using the different tools available.
Machine Learning and AI methods like Deep Learning fall into a larger 'analytic' library of things you can do with your data, alongside access methods such as SQL and languages like Python and R. The best analogy is a bar or kitchen stocked with the most exotic ingredients. As a bartender or chef, you get to make the best cocktails or entrées, drawing from an assemblage of options. You may have your own favorite, but it's also important to fine-tune things to an actual need someone might have. Here's the thing - you may have the most expensive single malt or ingredient in your bar or kitchen, and that doesn't mean everyone will want it! So variety is the key to delivering precisely what the business/end user wants!
Some Examples of Advanced Analytics Choices:
I need to find the top 10 key phrases in product or show reviews that tell me about characters and powerful emotions. There are several options:
Start with NGRAMs of 1, 2, and 3 terms at a time. Weight the keywords with TF/IDF. Show the top 10 1-, 2-, and 3-grams with the highest TF/IDF values. We can do this with some clever SQL :)
Run a CRF model trained on a big corpus over POS (parts of speech) output, weight it, and sort it. You get the benefit of interesting verb and noun phrases (intelligent)
Run a Word2Vec (Deep Learning/GPU) pass on the data and try to construct a neural embedding model to discover the phrases.
<Add your own recipe/cocktail>
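The first recipe above (n-grams weighted by TF/IDF) can be sketched in pure Python. The blog does this in SQL; the tiny review corpus here is made up for illustration:

```python
import math
from collections import Counter

reviews = [
    "great show great characters",
    "the characters felt flat",
    "great plot twist and strong emotions",
]

def ngrams(tokens, n):
    """All contiguous n-term phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Each document becomes its bag of 1-, 2-, and 3-grams.
docs = [sum((ngrams(r.split(), n) for n in (1, 2, 3)), []) for r in reviews]

# Inverse document frequency: rarer phrases score higher.
N = len(docs)
df = Counter(term for d in docs for term in set(d))
idf = {t: math.log(N / c) for t, c in df.items()}

# Score = term frequency * IDF, aggregated over the corpus.
scores = Counter()
for d in docs:
    for t, f in Counter(d).items():
        scores[t] += f * idf[t]

top10 = [t for t, _ in scores.most_common(10)]
print(top10)
```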
I want to group all my call center text into clusters, so I can see what they are talking about:
Apply term weighting and a distance metric such as Euclidean or cosine, run Graph Modularity to carve out clusters, and run phrase detection on each of those. Use a percentile technique to decide the # of significant clusters.
Run an LSI (Latent Semantic Indexing) dimension reduction and then run K-Means. Decide the # of clusters after finding the "elbow of significance".
Run an LDA model and specify the # of clusters. Iterate on the # of topic clusters until it makes sense.
<add your own ingredients/mixology>
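The K-Means + "elbow" recipe above can be sketched in a few lines of Python. The 2-D points below stand in for documents after LSI dimension reduction; the data is synthetic and the cluster count is obvious by construction:

```python
import random

# Two well-separated synthetic blobs standing in for document vectors.
random.seed(1)
points = ([(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(20)] +
          [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(20)])

def kmeans_sse(pts, k, iters=20):
    """Run Lloyd's algorithm, return the within-cluster sum of squares."""
    centers = random.sample(pts, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        centers = [(sum(x for x, _ in cl) / len(cl),
                    sum(y for _, y in cl) / len(cl))
                   if cl else centers[i] for i, cl in enumerate(clusters)]
    return sum(min((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for cx, cy in centers)
               for p in pts)

# The big drop in SSE from k=1 to k=2 is the "elbow" on this data.
for k in (1, 2, 3, 4):
    print(k, round(kmeans_sse(points, k), 2))
```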
With some cleverness, the iterative and grouping techniques above avoid mainstream Machine Learning completely and get to 80% of the answer with simple sophistication. Of course, advanced ML techniques will increasingly get us to 95% of the answer - especially for Fraud and other mission-critical "fine-grained" use cases. Let's put that aside for a moment.
For a lot of simple use cases or basic hypothesis testing, an 80% answer may be directionally "good enough" for the business, and that's ok! The key is to have many options, all the way from simple to complex, with well-understood tradeoffs such as performance and tuning complexity.
How to get everything in one place? See Sri's blog:
Come up with creative solutions that use the best of breed. Use SQL for what it's good at, and run R in-database to do scoring or modeling using, say, ARIMA.
Or run LSTM deep learning for time-series forecasting on TensorFlow while using SQL to curate and organize the data in the same transaction.
Run large-scale PCA using SQL on Aster and do a logistic regression in a Spark cluster packaged in the platform. Put the results on a table for Tableau users to see the significant variables in a dashboard.
Run Aster's XGBoost on a churn analytic data set to create a model and score users on their propensity to churn. Run Aster's Hidden Markov Model (a Graph implementation) on the data set to find the latent transition matrix and emission probabilities.
Hope you enjoyed this blog post on the postmodern definition and sample usage of Advanced Analytics ;). More to come in future blog posts on the science and art of applying advanced analytics to business problems.
Using a popular analytic technique to understand behaviors and patterns, data scientists reveal a subtle but critical network of influence and competition, giving this gaming company the ability to attract and retain gamers in this $109B industry.
This is insight that can only be found when you combine multiple sources of data with analytics. With Teradata Aster® Analytics, users apply cFilter, a function tailor-made for understanding behaviors and opinions.
Looking into the data, amazing patterns emerge.
Understanding the relationships that drive user behavior can help developers create better games to attract users, prevent churn, and determine how gamers influence each other.
Do you know which customers are likely to churn? Which prospects are likely to convert?
Historical path analysis is a critical factor in such predictions. The problem is that path analysis is hard. And even when companies have such capabilities, they often reside in the hands of a few specialists – or vendor consultants.
The business analysts, marketers and customer support professionals who could ultimately act on these predictive insights to improve customers’ and prospects’ journeys are effectively left out in the cold. Even the specialists are ultimately confined to the limits of their tools.
Ask anyone who has used a traditional business intelligence tool to understand customer paths. It requires significant time and patience to shoehorn this type of analysis into a tool that was not designed for it. To begin with, just manipulating the data to build an event table for a BI tool is a significantly high hurdle. And even at the end of such a project, organizations end up with a static, inflexible report on historical data that does little to help businesses prevent future churn or accelerate future conversions. (This is hardly a criticism of BI tools, as their benefits and value are well documented. I’m only pointing out that path analysis historically is not one of their strong suits.)
Other advanced approaches leverage statistical tools like R and programming languages like Python. They may incorporate sophisticated analysis techniques like Naïve Bayes text classification and Support Vector Machine (SVM) modeling. But, at the end of the day, these are not tools or techniques for businesspeople.
And ultimately, what matters is providing your business teams the opportunity to influence the customer experience in a manner that is positive for your business.
The solution is to bring path analysis – including predictive path analysis – to the business. For such a solution to succeed, it must be:
Visual. For marketers and business professionals, the ability to visually explore analytics results is critical. Tree diagrams are instantly understandable, as opposed to results tables that require the user to read through thousands of rows.
Intuitive. Most analysts and marketers are comfortable using business intelligence tools to understand their data. We use point-and-click interfaces to interact with information every day. But marketers are not comfortable directly manipulating data with SQL or applying advanced statistical models to that data for predictive results. Even predictive results must be returned with a few clicks.
Code-free. Your marketers are expert marketers. They shouldn’t need to be expert programmers to understand which customers are on negative paths and which prospects they can help push over the edge to convert.
Using the interface, marketers and analysts use a simple form to specify an event of interest – a churn event or conversion event, for example – and whether they want to see paths to or from that event. The interface returns results in the forms of several visualizations, including tree, sigma, Sankey and sunburst diagrams, as well as a traditional bar chart.
Within the tree diagram, users can select partial paths to their event of interest and create a list of users who have completed that partial path but not yet completed the final event. For example, if you are looking at an online banking data set and see that a path of “fee complaint, to fee reversal, to funds transfer” precedes a large number of churn events, in three clicks you can generate a list of customers who have completed the path “fee complaint, to fee reversal, to funds transfer” but not yet churned. Thus, you have just used Predictive Paths to identify potential churners without writing a line of code.
This video demo shows how marketers and business analysts can predict next steps for customers with the Path Analysis Guided Analytics Interface.
Watch this short video to see how Predictive Paths works within the Path Analysis interface. If you’re interested in bringing these capabilities to your business teams, please contact your Teradata account executive today.
I spent last week at the Anaheim Convention Center, helping out with the Aster demo on the Expo floor, participating in the Advanced Analytics session, co-presenting a business session on Sunday, and presenting my data science session on Tuesday. Below, I have assembled all the resources available online about these events.
Customer Segmentation Based on Mobile Phone Usage Data Use Case (demo and github)
Churn Use Case Using Survival Analysis and Cox PH (demo and github)
Sunday Afternoon Session
John Carlile prepared an excellent session on the text analytics use cases I helped him with on one of our POCs - Rumpelstiltskin Analytics: turning text documents into insight gold - with me as a co-presenter. Please contact John at email@example.com for more details (pdf attached). I would add that the session covered analysis of user reviews of a major hotel operator across many chains, "fake" review detection, and unsupervised and supervised techniques such as LDA and logistic regression.
My presentation Building Big Data Analytic Pipelines with Teradata Aster and R (morning block) contained two parts:
overview: a value proposition of data science pipelines, their enterprise application focus, main principles and components, and requirements;
a primer on building blocks and core technology with examples on Aster R platform: covering topics on environment, grammar of data manipulation, joins, exploratory analysis, PCA and predictive models (pdf attached, R presentation slides and github).
General session catalog for PARTNERS is available here.
Harnessing an analytical technique known as text clustering, companies in multiple industries can analyze customer call center data to find key word trends and phrases that may quickly alert them to potential customer service problems, manufacturing defects or negative sentiment.
Safety Cloud is a transformation of multiple types of text data through analytics, and a visualization leading to significant innovation. Applying natural language processing to these analytical techniques allows for sentiment analysis, giving businesses insight without having to read every document the dots represent.
The explosion of interest in Artificial Intelligence (AI) is triggering widespread curiosity about its importance as a driver of business value. Likewise, Deep Learning, a subset of AI, is bringing new possibilities to light. Can these technologies significantly reduce costs and drive revenue? How can enterprises use AI to enhance customer experiences, create more insight across the supply chain, and refine predictive analytics?
PARTNERS 2017 offers the curious visionary and the creative executive plenty of education on the pragmatic business value of AI. Here are a few of the sessions on the topic:
Autonomous Decision-Making, ML, & AI: What It Means & Why Should You Care
We’ve entered a new era of analytics, with machine learning and artificial intelligence algorithms beginning to deliver on the long-promised advancement into self-learning systems. Their appetite for vast amounts of data and their ability to derive intelligence from diverse, noisy data allow us to go far beyond the previous capabilities of what used to be called advanced analytics. To succeed, we need to understand both capabilities and limitations – and develop new skills to harness the power of deep learning to create enterprise value. This session focuses on the future of AI, emerging capabilities today, relevant techniques, and the ‘Think Deep’ framework for automating the generation and deployment of machine learning and deep learning models. Wednesday, October 25, 2:00 PM-3:00 PM
Fighting Financial Fraud at Danske Bank with Artificial Intelligence
Fraud in banking is an arms race, with criminals using machine learning to improve the effectiveness of their attacks. Danske Bank is fighting back with deep learning – and innovating with AI – to curb fraud in banking. The session spans topics such as model effectiveness, real-time integration, TensorFlow vs. boosted decision tree predictive models, operational considerations in training and deploying models, and lessons learned. Monday, October 23, 11:30 AM-12:15 PM.
Artificial Intelligence: What’s Possible For Enterprises Today
The sci-fi notion of AI – pure AI – is still a long way off. However, pragmatic AI technology is here today, and enterprises are using AI building-block technologies such as machine learning to achieve amazing business results. In this session, Forrester Research VP & Principal Analyst Mike Gualtieri will demystify AI and explain which enterprise use cases are possible today and how to get started. Tuesday, October 24, 9:00 AM-9:45 AM. Presenter: Mike Gualtieri, Principal Analyst, Forrester Research.
Artificial Intelligence and the Teradata Unified Data Architecture (UDA)
Artificial Intelligence has entered a renaissance. Underlying this progress is Deep Learning – driven by significant improvements in Graphics Processing Units and by computational models inspired by the human brain that excel at capturing structures hidden in massive datasets. Learn how AI is impacting enterprise analytics today in applications like fraud detection, mobile personalization and predicting failures for IoT. Focus on ways to leverage and extend the Teradata Unified Data Architecture today – and a new AI reference architecture – to produce business benefits. Monday, October 23, 2:00 PM-3:00 PM.
Employing Deep Neural Nets for Recognition of Handwritten Check Payee Text
The handwritten check is a primary linchpin of the customer relationship at Wells Fargo. It represents an enormous personnel cost when the bank attempts to resolve the payee field and other transaction information in handwritten form. Currently, Automated Teller Machines (ATMs) operated by Wells Fargo can recognize monetary amounts (numerical digits) on checks using neural networks trained on a standard handwritten-numeral dataset. This session details the latest image recognition and deep learning techniques to extend recognition capability to the payee field, and a new capability to deploy deep neural networks with Aster and TensorFlow through a SQL interface. Tuesday, October 24, 11:30 AM-12:15 PM. Presenters: Gary Class, Wells Fargo, and Kyle Grove, Senior Data Scientist, Teradata.
Dig in Deep into a Data Fabric Implementation Using Teradata and SAS
Banco Itau-Unibanco S.A. is one of the largest banks in Latin America and a global top-50 bank by market cap. It operates in the retail, wholesale, private and investment banking, private equity, asset management, insurance, and credit card businesses. The session will outline a new data fabric platform based on Teradata and SAS integration, which brought new capabilities to credit risk analysts in terms of the amount and complexity of data that can be used in their models. With this platform, the risk teams are able to manipulate, in a dynamic and productive way, different sources of data, higher volumes (about 30 times more), and new algorithms (e.g. neural networks) to improve model performance. The results are amazing and will be shared in detail. Wednesday, October 25, 10:30 AM-11:15 AM. Presenters: Dalmer Sella, Data Engineer, Itau, and Fabiano Yasuda, Credit Modeling Manager, Itaú-Unibanco S.A.
Please be sure to check out the Session Catalog for more, and try to register early to join the “Meet-Up” sessions!