Aster Provides a New Tool in Analytic Discovery

Learn Data Science
Teradata Employee

Discovery Explained:  Discovery is a process used to ascertain or learn something about an organization, its client base, a process, or an operational activity.  Organizations that implement discovery systems are attempting to use data in new ways to drive directional change.  These changes should have a material impact on the organization, such as cost avoidance, revenue creation, or both.  The discovery outputs should be usable by people within the organization to operationalize and take action.  That means those people must be willing to make changes and must also have the time to put a change in direction into practice.  Discovery systems can be vital in understanding and adapting to changes in customer behavior across a variety of channels, in addressing manufacturing inefficiencies and supply chain management, and in creating new products and services, all built around collecting and analyzing data from products such as mobile phones, exercise sensors, healthcare sensors, and telematics devices.  The diagram below, Data Discovery Process, details the four-step process of data discovery and the people involved.


Diagram:  Data Discovery Process

The Discovery Process:  Rapid Analytics Development:  We have all heard of RAD, or Rapid Application Development, and now we are starting to hear more and more about Rapid Analytics Development.  Rapid Analytics Development is a process, encompassing both tools and people, that lets teams fail fast, change fast, and succeed fast when producing analytical outputs over massive volumes of data.  The discovery process starts with an intuition or idea about new forms of data, both structured and unstructured.  For many years we have focused on structured data but have lacked the tools and platforms to add in customer interaction data, including email, clickstream, chat, productivity documents, and machine logs.  The idea is to be able to rapidly construct analytics on a variety of data sources and structures in order to go from transactions to interactions between people, processes, and/or equipment.

There are four parts to the Discovery process:  Data Acquisition, Data Preparation, Analysis, and Visualization or Information Delivery.  The next sections will discuss each part of the process and how it could affect the way you manage your discovery platform.  As you read these sections, keep in mind ‘Rapid Analytics Development.’

Data Acquisition:  Data Acquisition is the process of acquiring data and loading it into a discovery platform.  It is critical that you are able to absorb massive amounts of data quickly through a variety of channels and software/network capabilities.  Moving data between systems has always been a major cost for more traditional discovery solutions, and it also represents an opportunity cost to an organization if it is slow or delays the ability to exploit the data through analytics.  There should be multiple ways to move data between systems, including over the network or through more traditional file-based mechanisms.  Spending less time in data acquisition leaves more time for analytics, and more time for analytics means more time to implement analytical outputs through changes in the organization's operations.
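To make the two acquisition paths concrete, here is a minimal Python sketch that loads one hypothetical event table both from a CSV export (the file-based path) and from an iterable of records inserted in batches (standing in for a network feed).  It assumes Python's built-in sqlite3 module as a stand-in for the discovery platform; the table name events, its columns, and the function names are hypothetical and are not part of any Aster or Teradata API.

```python
import csv
import io
import sqlite3
from typing import Iterable, TextIO, Tuple

# Hypothetical event table; sqlite3 only stands in for the discovery platform.
SCHEMA = "CREATE TABLE IF NOT EXISTS events (customer_id TEXT, channel TEXT, event TEXT, ts TEXT)"
INSERT = "INSERT INTO events VALUES (?, ?, ?, ?)"

Record = Tuple[str, str, str, str]


def load_from_file(conn: sqlite3.Connection, fileobj: TextIO) -> int:
    """File-based path: bulk-load every row of a CSV export in one call."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    rows = [tuple(row) for row in reader]
    conn.executemany(INSERT, rows)
    return len(rows)


def load_from_stream(conn: sqlite3.Connection, records: Iterable[Record], batch_size: int = 1000) -> int:
    """Network-style path (simulated): insert records in fixed-size batches as they arrive."""
    batch, total = [], 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            conn.executemany(INSERT, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(INSERT, batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
csv_export = io.StringIO(
    "customer_id,channel,event,ts\n"
    "c1,web,login,2015-01-01T10:00\n"
    "c2,email,open,2015-01-01T10:05\n"
)
n_file = load_from_file(conn, csv_export)
n_stream = load_from_stream(conn, [("c1", "chat", "question", "2015-01-01T10:07")], batch_size=2)
conn.commit()
print(n_file, n_stream, conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2 1 3
```

Both paths land in the same table, so downstream analytics do not need to know which channel a record arrived through; batching on the streaming side is one simple way to trade latency for fewer round trips.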

Data Preparation:  Data Preparation is the process of transforming the data into an analytics-ready state.  Does my platform provide not only the ability to ingest data rapidly but also simple tools that let me prepare data for analytics rapidly?  Data preparation should take advantage of the hardware platform where the data is located; shared-nothing infrastructures and high-speed networks make this activity more efficient.  Again, not spending excessive time in data preparation enables Rapid Analytics Development.
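The idea of preparing data where it lives can be sketched in Python under simplifying assumptions: records are hash-partitioned by a hypothetical customer_id key, and each partition is cleaned and ordered independently, so no worker needs another worker's data.  Threads stand in for the nodes of a shared-nothing system here; a real platform would run each partition on the node that stores it.  The record layout and the function names prepare and prepare_partition are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from typing import List, Tuple
from zlib import crc32

RawRow = Tuple[str, str, str, str]  # hypothetical (customer_id, channel, event, ts) as raw text
CleanRow = Tuple[str, str, str, datetime]


def prepare_partition(rows: List[RawRow]) -> List[CleanRow]:
    """Clean one partition using only its own rows (no shared state)."""
    cleaned = [
        (cid.strip(), channel.strip().lower(), event.strip().lower(), datetime.fromisoformat(ts.strip()))
        for cid, channel, event, ts in rows
    ]
    cleaned.sort(key=lambda r: (r[0], r[3]))  # order each customer's events by time
    return cleaned


def prepare(rows: List[RawRow], n_parts: int = 4) -> List[CleanRow]:
    """Hash-partition by customer key, then prepare the partitions in parallel."""
    parts: List[List[RawRow]] = [[] for _ in range(n_parts)]
    for row in rows:
        key = row[0].strip()
        if not key:
            continue  # drop records with no customer key
        # crc32 is a stable hash, so a customer's rows always land in the same partition.
        parts[crc32(key.encode()) % n_parts].append(row)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        prepared = list(pool.map(prepare_partition, parts))
    return [row for part in prepared for row in part]


raw = [
    (" c2 ", "Web", "Login", "2015-01-01T10:00"),
    ("c1", "EMAIL ", "open", "2015-01-01T09:00"),
    ("", "web", "click", "2015-01-01T09:30"),
    ("c1", "web", "Click", "2015-01-01T08:00"),
]
for row in prepare(raw, n_parts=2):
    print(row)
```

Partitioning on the customer key is the design choice that keeps the work independent: every per-customer step (here, sorting by time) can run inside one partition without moving data between workers.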

Analysis:  How many lines of code does it take to get from an idea to an answer?  Does the platform you are using require highly specialized skills, or more commodity skills like SQL or SQL-like commands?  If you have to write thousands of lines of code to get to an analytic output, you will spend more time in solution development and will need detailed tests to validate the output.  The greater the code surface area, the greater the risk that your answer is incorrect.  Not only is testing affected, but changing course will also take longer.  It is vital that you select a platform that requires fewer lines of code to get to an answer.
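As a small illustration of the lines-of-code argument, the sketch below answers one hypothetical question (how many distinct customers purchased through each channel) twice: once as a declarative SQL query and once as a hand-written loop.  It uses Python's built-in sqlite3 module purely as a stand-in for a SQL-capable platform, and the table and data are made up.  On a toy question the gap is small, but the procedural version tends to grow with every added condition while the query usually stays a single statement.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (customer_id, channel, event) records.
EVENTS: List[Tuple[str, str, str]] = [
    ("c1", "web", "view"),
    ("c1", "web", "purchase"),
    ("c2", "email", "open"),
    ("c2", "web", "purchase"),
    ("c3", "chat", "question"),
]


def buyers_by_channel_sql(conn: sqlite3.Connection) -> Dict[str, int]:
    """Declarative version: grouping and de-duplication are left to the engine."""
    query = """
        SELECT channel, COUNT(DISTINCT customer_id)
        FROM events
        WHERE event = 'purchase'
        GROUP BY channel
    """
    return dict(conn.execute(query).fetchall())


def buyers_by_channel_loop(rows: List[Tuple[str, str, str]]) -> Dict[str, int]:
    """Procedural version: the same logic spelled out by hand."""
    buyers = defaultdict(set)
    for customer_id, channel, event in rows:
        if event == "purchase":
            buyers[channel].add(customer_id)
    return {channel: len(ids) for channel, ids in buyers.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (customer_id TEXT, channel TEXT, event TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
print(buyers_by_channel_sql(conn))  # {'web': 2}
# Cross-checking two implementations is the kind of extra testing that hand-written code invites.
assert buyers_by_channel_sql(conn) == buyers_by_channel_loop(EVENTS)
```

The final assertion is itself part of the point: every extra line of procedural logic is another line that has to be validated before the answer can be trusted.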

Visualization or Information Delivery:  Once the analytics are complete, we must be able to quickly show and demonstrate the results.  Discovery outputs can take the form of graphs, Sankey diagrams, and hierarchies.  It is important not only to show data but also to offer output styles that reveal relationships and their strengths, along with decision patterns and behavior patterns.  These types of visualizations are atypical in traditional business intelligence systems and offer a very powerful ‘what if’ style of communicating with your data.  We often refer to this as having a conversation with your data.  Many times new intuitions are formed as a result of interacting with the outputs of a discovery platform, which starts a new analytic process.  This cycle only reinforces the need for a fail fast, change fast infrastructure platform to support discovery.
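Sankey diagrams are typically drawn from weighted source-to-target links.  A minimal Python sketch, assuming each customer's interactions have already been ordered into a path of event names (the customers and events below are hypothetical), shows how such links can be counted; the result could then be handed to a Sankey charting library.  It does not use any Aster function, and the name sankey_links is illustrative only.

```python
from collections import Counter
from typing import Dict, List


def sankey_links(paths: Dict[str, List[str]], max_steps: int = 4) -> Counter:
    """Count transitions between consecutive events across customer paths.

    Each node is prefixed with its step number (e.g. "1:login") so that a path
    revisiting the same event still flows left to right without cycles.
    """
    links: Counter = Counter()
    for events in paths.values():
        steps = events[:max_steps]  # truncate long paths to keep the chart readable
        for i in range(len(steps) - 1):
            links[(f"{i + 1}:{steps[i]}", f"{i + 2}:{steps[i + 1]}")] += 1
    return links


# Hypothetical, already time-ordered interaction paths per customer.
paths = {
    "c1": ["login", "view", "purchase"],
    "c2": ["login", "view", "chat"],
    "c3": ["login", "search", "purchase"],
}
for (source, target), weight in sorted(sankey_links(paths).items()):
    print(f"{source} -> {target}: {weight}")
```

Prefixing nodes with their step number trades a larger set of nodes for a chart that never loops back on itself, which many Sankey renderers expect; the link weights are what convey the strength of each relationship.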