Flip the 80/20 Rule for Analytics Webinar Q&A

The following document captures the questions and answers from the webinar: Flipping the 80/20 Rule of Analytics, presented by John Thuma.

You can watch the webinar recording HERE.

For additional questions, please contact Arlene Zaima (Arlene.Zaima@Teradata.com).


Q: The lines of code you mentioned still need to be written and developed in Aster, but that work is done by the Aster development team instead of by individual companies?
A: Yes, absolutely correct. Aster's SQL, SQL-MapReduce and Graph engines provide lower-level APIs that allow data scientists and engineers to code their own custom algorithms; however, Aster engineers have created over 100 pre-built algorithms that are available through simple SQL calls.
Q: Does it handle unstructured data as well?
A: In this use case, the data was structured and sourced from the Teradata data warehouse; however, we have many use cases where unstructured data is integrated, transformed and incorporated into the analytic process.
Q: Does the new concept of data lakes make any difference to the capabilities you show here?
A: We consider the data lake to be another data source, like data warehouses, Twitter, etc. Aster has connectors into data lakes such as Hortonworks and Cloudera, along with other third-party databases, so data can be integrated into Aster for deep analytics. The beauty of Aster is the ability to build powerful analytics with a little SQL knowledge. You don't have to deal with data partitioning, parallelism, or building the algorithms. And as John pointed out, it's completely reusable!
Q: Does the concept of data lakes provide any benefits to the capabilities you have shown here?
A: Yes, you can build the same analytics with a data lake, but the effort and skill sets required are very different. Data lakes require MapReduce, deep machine learning and Java expertise. As you can see from John's slides 16 - 18, the coding footprint is dramatically reduced with Aster. Aster also provides an easy way to promote your analytics into production via AppCenter.
Q: Do you have a limit on the data size?
A: Aster is based on an MPP architecture, so we do not have a data size limitation. Aster analytics are designed with scalability at their core to maximize parallel processing across the MPP architecture. To maximize efficiency, we've created SQL, SQL-MapReduce and Graph (Bulk Synchronous Parallel processing) engines. The Aster Appliance also includes Hadoop nodes to create a data lake for lower-cost bulk storage of data. So you have the best of both worlds: lower-cost bulk storage with a powerful analytics engine.
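
To make the MPP idea above a little more concrete, here is a minimal Python sketch of hash partitioning followed by per-partition aggregation. It is not Aster code and does not reflect Aster's internals: the function names, the worker count and the sample rows are all hypothetical, and the "workers" run one after another in a single process purely for illustration.

```python
import zlib
from collections import defaultdict


def worker_for(key, num_workers):
    """Pick a worker for a key using a stable hash (hypothetical scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_workers


def partition_rows(rows, num_workers):
    """Distribute (customer_id, amount) rows so each key lives on one worker."""
    partitions = [[] for _ in range(num_workers)]
    for customer_id, amount in rows:
        partitions[worker_for(customer_id, num_workers)].append((customer_id, amount))
    return partitions


def local_totals(partition):
    """Work a single node would do on its own slice of the data."""
    totals = defaultdict(float)
    for customer_id, amount in partition:
        totals[customer_id] += amount
    return dict(totals)


def combine(partials):
    """Merge per-worker results; keys never overlap across partitions."""
    combined = {}
    for partial in partials:
        combined.update(partial)
    return combined


# Made-up sample data, for illustration only.
rows = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0), ("c2", 1.0)]
partitions = partition_rows(rows, num_workers=4)
totals = combine(local_totals(p) for p in partitions)
print(sorted(totals.items()))
```

Because every row for a given key lands on the same partition, each partition can be aggregated independently, which is why adding nodes lets this kind of work scale with the data.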
Q: How is it better than other platforms like SAS?
A: On the surface, it may appear that Aster and SAS provide similar capabilities; however, as we drill into their capabilities, they are very different. We consider SAS a complementary technology that can be used with Aster. SAS is a general-purpose analytic tool that provides a business interface for BI reporting and traditional analytics, including data mining technology, on data that fits in the memory of your system. Aster, on the other hand, focuses on big data analytics and discovery. We provide a powerful platform that leverages MPP processing and three powerful analytic engines that combine path, pattern, graph, text, sentiment, and machine learning methods within a single platform. Aster can scale SAS models through our partnership and the SAS Scoring Accelerator.
Q: Chinese Text Segmentation? So the text analysis can analyze other languages?
A: Yes, Aster text analytics can be extended to support any language. Our data scientist in China extended Aster to provide Chinese Text Segmentation using the APIs in the SQL-MapReduce engine.
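
For readers unfamiliar with the task, text segmentation splits a run of characters with no spaces into words. The sketch below uses forward maximum matching against a small dictionary, a common textbook approach. It is an assumption chosen for illustration and is not the method used in the Aster function; the `segment` function, the vocabulary and the sample sentence are all hypothetical.

```python
def segment(text, vocabulary, max_word_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that fits; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Tiny illustrative vocabulary; a real system would use a large lexicon.
vocabulary = {"我们", "喜欢", "数据", "分析", "数据分析"}
tokens = segment("我们喜欢数据分析", vocabulary)
print(tokens)
```

Once text is split into tokens like this, downstream text analytics can treat it the same way as whitespace-delimited languages.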
Q: Do you have a trial version of this? We want to experience it.
A: Yes, there is a VM image called Aster Express available on the Aster Community. Please visit and join at aster-community.teradata.com
Q: I did not hear how Aster helped cleanse data, i.e. get rid of duplicates, bad data, missing data, etc., which I understood was the real time-consuming part of data preparation. Did I miss something? It seemed like most of your discussion was on integration.
A: There are built-in analytics using SQL-MR functions for identity matching, etc. We can also leverage machine learning methods, which we have done successfully for predicting ICD9 codes; this would address missing data. Duplicates can be removed very easily using ANSI SQL.
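
As a rough illustration of the duplicate-removal point, the sketch below runs standard SQL (`SELECT DISTINCT`) from Python. It uses an in-memory SQLite database as a stand-in, not Aster, and the table name, column names and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients_raw (patient_id INTEGER, name TEXT, icd9_code TEXT)"
)
conn.executemany(
    "INSERT INTO patients_raw VALUES (?, ?, ?)",
    [
        (1, "Ann", "250.00"),
        (1, "Ann", "250.00"),  # exact duplicate of the row above
        (2, "Bob", "401.9"),
        (3, "Cho", None),      # missing code; not filled in by this step
        (2, "Bob", "401.9"),   # exact duplicate
    ],
)

# SELECT DISTINCT keeps one copy of each fully identical row.
conn.execute(
    "CREATE TABLE patients_clean AS "
    "SELECT DISTINCT patient_id, name, icd9_code FROM patients_raw"
)

clean_rows = conn.execute(
    "SELECT patient_id, name, icd9_code FROM patients_clean ORDER BY patient_id"
).fetchall()

# Count the rows that still have a missing code after deduplication.
missing_count = conn.execute(
    "SELECT COUNT(*) FROM patients_clean WHERE icd9_code IS NULL"
).fetchone()[0]

print(clean_rows)
print(missing_count)
```

This only handles exact duplicates. Near-duplicates (for example, the same person spelled two ways) are the kind of case the identity-matching functions mentioned above are meant for.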
Q: Can it take data from social media websites like Facebook, Twitter, blogs, Instagram, etc.?
A: Data is data, so regardless of its source we can work with it. Now, if you mean can we take it directly from these sources, the answer is yes, but it will take some preprocessing through custom SQL-MR. I would advise buying an API to the data instead.
Q: Are there any skill sets (programming/data science) that are required as prerequisites before we learn Aster?
A: Aster lowers the barriers to analytics by providing SQL calls to access complex analytics. The skill sets required to use Aster are SQL programming, exposure to analytic techniques, and curiosity. Of course, using the apps deployed in AppCenter is even easier: you just need to be able to navigate a web interface and understand your business.
Q: In big organizations, the business has restricted ability to create tables. Can you work just as fast in a locked-down environment like this?
A: Yes, if you consider the Aster discovery platform as the business user's data lab or sandbox, data can be accessed from the system of record and brought into Aster for analysis. The analyst will need to be able to create and write into tables within the Aster discovery platform.