Most data scientists are looking for the tools, data sources, and formats to build good predictive models for their customers and products: propensity to buy, churn, upsell, abandonment, and so on. A good predictive model may account for seasonality, macroeconomic conditions, product campaigns, competition, pricing, customer life cycle, recent complaints, and more.
Imagine finding a perfect, invariant model that predicts stock prices, customer churn, or adoption consistently and reliably, with the same high accuracy every single time (black swans aside)! Who wouldn't like that? :) There are, of course, holy-grail models in many areas that have been improved over years of observation and testing.
Weather prediction is by far the best-known example. It may be off a lot of the time, but it is getting more reliable each year, isn't it?
What does it take to build a perfect analytical model and deploy it?
Usually, it's a combination of a few things:
Omni-Channel Data Sources (structured/semi-structured/unstructured) - online transactions, call center notes, payment flows, social media, web logs, user profiles, weather, traffic, competitive offers, campaigns, ad serving data
Client Tools - SQL, procedural interfaces like R or Python, visualization and insight tools
Algorithm Suite - data cleansing tools, machine learning (deep learning included), dimension reduction, time series forecasting, path and pattern analysis, sequence prediction, ensembles, text and graph algorithms, cross-validation, grid search
Scalable Engines - SQL, NoSQL, Graph, Map/Reduce, in-memory, distributed/parallel, in-database R
A real-time streaming interface and the ability to export a model and publish it for scoring
And, of course, the ever-creative minds of data scientists
Why do most data scientists struggle with analytical model building for business problems?
Most of the time, it's a lack of seamless access to everything listed in the previous section. The ability to combine data sources, profile and cleanse the data quickly, connect to familiar tools, and reach scalable engines through a standardized interface is what a data scientist desires. The architecture should let data scientists lean in and learn a slightly different paradigm without starting all over! Anything that causes friction quickly multiplies the woes and can make model building a hard task.
Speed of iteration and failing fast are key, including the ability to test and validate the model at scale before pushing it to the edge.
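The fail-fast loop above can be sketched in a few lines. This is a minimal, generic illustration using scikit-learn on synthetic data (an assumption for demonstration; it is not part of the Aster stack discussed later): score a candidate model with k-fold cross-validation on a small sample before committing to a full-scale run.

```python
# Quick-iteration sketch: validate a candidate model cheaply with
# k-fold cross-validation before scaling the same definition up.
# scikit-learn and the synthetic data are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic stand-in for a sampled churn table.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)

# 5-fold CV gives a fast, honest accuracy estimate; if it looks
# promising, push the same model to the scalable engine for full data.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

If the sampled estimate is poor, you have failed fast and cheaply; if it holds up, the same model definition moves on to validation at scale.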
What is Multi-Genre (TM) Advanced Analytics?
"All models are wrong. Some are useful." - George Box
It's the ability to build analytical models with the help of multiple engines seamlessly connected together. Teradata Aster is an example of a multi-genre analytics engine that uses best-of-breed algorithms built over Map/Reduce, Graph, SQL, and R engines. The algorithms span pathing, text, graph, regression, machine learning classification, clustering, and more, but as an end user you can do large-scale analytical modeling through a SQL or R front end. You can also load and work on different types of data - structured, unstructured, and semi-structured - without writing any procedural code, including pulling data from multiple sources.
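To make the "many formats, one front end" idea concrete outside of Aster, here is a minimal single-machine analogue in Python with pandas (an assumption for illustration; all sources and column names are invented). It joins a structured CSV, a semi-structured JSON feed, and unstructured notes into one analysis-ready frame:

```python
# Sketch: combining structured, semi-structured, and unstructured inputs.
# pandas is used only to illustrate the idea; in Aster this is handled by
# the platform's loaders rather than client-side parsing code.
import io
import pandas as pd

csv_src = io.StringIO("customer_id,spend\n1,120.5\n2,80.0\n")        # structured
json_src = io.StringIO('[{"customer_id": 1, "segment": "gold"},'
                       ' {"customer_id": 2, "segment": "silver"}]')  # semi-structured
notes = {1: "billing complaint, resolved", 2: "asked about upgrade"} # unstructured

df = (pd.read_csv(csv_src)
        .merge(pd.read_json(json_src), on="customer_id"))
df["note"] = df["customer_id"].map(notes)
print(df)
```

The point is the interface: one declarative join step, no format-specific procedural code in the analyst's hands.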
Why is Multi-Genre (TM) Advanced Analytics powerful?
As data scientists work on day-to-day problems, many of them fall back on certain styles they are comfortable with:
Language Centric: I'm a data scientist comfortable with R or Python, and everything else is secondary. It's my sweet spot.
Platform Centric: I like open source, so I'm sticking with Spark. Everything else will fall into place.
Data Centric: I need data in a certain format before I do anything - hyper-focused on getting the data together, profiling it, and so on.
Algorithm Centric: Everything ultimately comes down to logistic regression or random forest. I can get most things done with my favorite playbook. Forget Deep Learning :)
If you want to build models that stand the test of time rather than one-trick ponies, it's important to draw on as many data sources and as rich a set of analytics as possible. Imagine building a model that uses text analytics on call center notes, graph analytics to explore payment connections, path analytics to trace the customer journey, and ARIMA for forecasting - and combines them seamlessly. The platform decides which engine to run (Map/Reduce, Graph, or R)! Add hyper-parameter tuning and cross-validation to increase accuracy - sampling is optional!
We have not only provided enough context for the model but also covered as many customer touch points and behaviors as possible. The point is that you don't need to be on a Kaggle leaderboard or wield exotic algorithms to build good models. An average data scientist - or an aspiring one - can do cool stuff with Multi-Genre analytics.
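As a rough single-machine analogue of that scenario, here is a hedged sketch in Python with scikit-learn (assumed for illustration; the data, labels, and column names are all invented, and the numeric columns merely stand in for outputs a graph or path engine would produce). It fuses a text signal with graph- and path-derived features, then runs grid search with cross-validation:

```python
# Sketch: fuse a text feature (call-center notes) with numeric features
# standing in for graph and path analytics outputs, then tune a classifier.
# All data and column names here are hypothetical.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "note": rng.choice(["billing complaint", "happy customer",
                        "asked to cancel", "upgrade question"], size=n),
    "payment_degree": rng.integers(1, 10, size=n),  # stand-in: graph metric
    "path_length": rng.integers(1, 20, size=n),     # stand-in: path metric
})
y = (df["note"] == "asked to cancel").astype(int)   # toy churn label

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "note"),
    ("nums", "passthrough", ["payment_degree", "path_length"]),
])
pipe = Pipeline([("features", features),
                 ("clf", LogisticRegression(max_iter=1000))])

# Hyper-parameter tuning with cross-validation, as suggested above.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(df, y)
print("best C:", search.best_params_["clf__C"],
      "CV accuracy:", round(search.best_score_, 3))
```

In a multi-genre platform the text, graph, and path features would each be computed by the engine best suited to them; the fusion-and-tune step stays the same.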