Why Aster Flips the 80/20 Rule

Learn Data Science
Teradata Employee

In my long career as a software developer and very-large-database developer, I have experienced the joys and frustrations of many different tools.  I have developed solutions in C, C++, C#, Java, and VB; I go back as far as the VBX days of VB 3.0 in the 16-bit world of Windows 3.1.  I have also built solutions on Sybase, SQL Server, MySQL, DB2, Netezza, and many others.  Working with these tools was a lot of fun, but I had to get very creative to develop solutions.  I had to work around the limitations of memory, disk, and network, as well as the limitations of low-level languages that didn't provide classes or objects out of the box.  We also didn't have an internet full of libraries covering just about any algorithm.  Then I was offered a position with Aster as a Solution Architect, and I was amazed at how quickly you could build solutions across a variety of demands.  I was able to focus on solutions and not the plumbing; this is what I call focusing on the 'what' and not the 'how.'  In this article we will discuss how Aster flips the 80/20 rule: how it enables you to quickly ingest data, transform and prepare it, build a solution, and finally deliver that solution to the end user.  Being able to do this provides benefits such as lower cost of solution development, speed to market, and the ability to focus on business processes and the people acting in them.  The rest of this document discusses these benefits.

Data Load:

Aster is a massively parallel processing (MPP) system.  It allows me to load data quickly over the network in parallel fashion.  In Aster I have loaded terabytes of data in hours and was working on the analytics the same day.  I was able to seamlessly connect to any database that had a JDBC driver and move that data rapidly, and if we had a Teradata EDW I could move the data even faster when the network configuration allowed it.  Moving data by itself does nothing for an organization, but it is necessary: today's solution demands require data from many different touchpoints and subject areas, and moving data is a prerequisite for omni-channel analytics.  Reducing this time reduces cost and has allowed me to get started on what really matters: applying analytics across multiple genres.

Data Preparation:

Aster is based on an ANSI SQL platform, giving me all the power and flexibility of the most common business language around at the moment.  Using SQL allows me to quickly transform data into the format I need for analytics.  I also have Aster SQL-MR functions made specifically for transforming data, including an XML parser, a JSON parser, pivot, text tokenization, stemming, and many more.  These functions are easy to use, and most ANSI SQL developers will feel right at home with them because they follow the ANSI SQL statement form.  We will discuss this syntax in the next section; however, it is very important to understand that 80% of your time is typically spent in data load and preparation.  Aster can significantly reduce this time so that you can focus on analytic discovery, and you can add new data sources and change data preparation quickly too.
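To make this concrete, here is a hypothetical sketch of one of those preparation steps: tokenizing raw review text before any analysis.  The text_parser function name, the kindleView source table, and the predicate spelling are assumptions for illustration only; check the function reference for your Aster Analytics version before relying on them.

SELECT * FROM text_parser
(
ON kindleView
text_column('content')
);

The idea is that the output would typically be one row per token, ready to feed into stemming or any downstream text analytic, all without leaving SQL.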

Apply Analytics:

The Aster SQL-MR statement below performs a simple sentiment analysis against text data.  It is very simple to implement and works against very large sets of data.  Just a week ago I performed a sentiment analysis with Aster against 7.2 million call center notes for a customer, and it was done in less than 4 seconds.  I joked that my mouse button took longer to pop back up than the analysis took to run.  Let's look at how the SQL-MR syntax below works, understanding that there are over 120 of these functions available to you and they all work the same way.  These analytics span many genres, including machine learning, text, pathing, statistics, graph, and others.  Let's dig in:

SELECT * FROM ExtractSentiment
(
ON kindleView
text_column('content')
model('dictionary')
level('document')
);

SELECT * FROM ExtractSentiment:  this is the basic form of all ANSI SQL.  ExtractSentiment is not a table or a view; it is one of the 120+ analytic functions provided by Aster.  All of these functions work this way: if I wanted to do an nGram I would simply type SELECT * FROM nGram.  Yes, it is that easy.  The rest of the statement tells ExtractSentiment how to work and against what data source.
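For example, an nGram call against the same source might look like the sketch below.  The grams predicate (which sets n) is an assumption from memory and may be spelled differently in your Aster Analytics version, so treat this as an illustration of the pattern rather than a verified signature:

SELECT * FROM nGram
(
ON kindleView
text_column('content')
grams(2)
);

Same shape, different function: point it at a source with ON, then steer it with predicates.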

ON kindleView:  This is the data source, either a table in Aster or data connected via an ANSI SQL view.  It contains the data we will run the sentiment analysis against.  The rest of the statement contains predicates that tell ExtractSentiment how to work.

text_column('content'):  This predicate tells ExtractSentiment to use the CONTENT column contained in the kindleView table in Aster.  Aster is simply going to predict the positive, negative, or neutral sentiment of the content of these kindleView product reviews.  Yes, it is that easy.  Other results are returned as well, such as the confidence in that sentiment and the words that contributed to it.

model('dictionary'):  This tells Aster to use its internal dictionary to perform the sentiment analysis.  You can build your own dictionary, or you can teach the Aster sentiment analysis engine how to work; we call this training a model.

level('document'):  This predicate tells Aster to evaluate the entire document for sentiment and derive one score.  We could instead specify 'sentence' and evaluate each individual sentence of the document for sentiment.
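Putting that together, the sentence-level variant of the statement above simply swaps the level predicate; everything else stays the same:

SELECT * FROM ExtractSentiment
(
ON kindleView
text_column('content')
model('dictionary')
level('sentence')
);

Each sentence of each review then receives its own sentiment score rather than one score per document, which is useful when a single review mixes praise and complaints.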

That is it.  Click run and in a few seconds you will have performed a sentiment analysis against potentially millions of reviews.

That was pretty easy to understand; now imagine what you can do with Naïve Bayes, Random Forests, or Support Vector Machines.  Yes, all are available in Aster.  These analytics would otherwise take thousands to hundreds of thousands of lines of code to create and implement.  Because I can easily point any analytic at any data source, producing a predictive analytic with Aster is easy and reduces your time to market by a factor of at least 10x.

Data Presentation:

With the introduction of Teradata AppCenter, I am now able to produce apps that anyone can use.  I can easily wrap that sentiment analysis statement in a web-based front end and produce output, making ExtractSentiment available for anyone to use and see results; the end user doesn't have to know how to write any SQL or Aster SQL-MR.  I can also easily produce output visualizations with Teradata AppCenter, such as tables, Sankey, Sigma, hierarchical, or any basic chart, and then open that analytic and its output to the business.

Never tell a technologist they cannot do something.  Creativity and the sheer will to get things done will always prevail if you have the right resources, but at what cost?  With Aster I am simply able to get things done more quickly.  Many times I have been in bakeoffs and competitions with other technologies and teams of people.  I have been put up against Hadoop teams of up to five people, and I not only finished weeks ahead of them but also produced more analytic outputs.  It wasn't because I was better than those people; it was because I had a better set of tools.  I had Aster and they didn't.