Using Aster Express: ACT 2, Loading Data

Aster
Teradata Aster is an analytic platform that embeds MapReduce analytic processing with data stores. •Embedded MapReduce analytic processing and unique SQL-MapReduce® framework •Massively parallel data store for multistructured data •Intuitive tools and SQL-MapReduce libraries for rapid analytic development
Teradata Employee

Using Aster Express: ACT 2, Loading Data

In this second part of our series on using Aster Express, we'll continuing using the Aster ACT query tool and introduce a new tool for bulk loading data, Aster's ncluster_loader.  (See Using Aster Express: Act 1 for part 1 of this series).

With the Aster Express images for VMware Player, we've also included some sample data sets.  These data sets will allow us to create analytic tutorials to showcase the power and flexibility of the Aster platform.  These data sets have been zipped and included in a "demo" directory on the Queen node.  As we saw in the first article of this series, we've also included the Aster client tools on the Queen as well.  ACT is the tool that we've been using for submitting SQL queries to our Aster cluster.  Another very useful tool is "ncluster_loader", which is the Aster tool for bulk loading data.

Let's start by logging into our Queen node.  For simplicity with our virtual images, we'll login directly to the Queen instance.  As we showed previously, it is also good practice to use a remote SSH utility.  Both are equivalent for our purposes.  Our user login/password is "aster/aster" for Aster Express.  Once we're on the Queen, we open a terminal window by double-clicking the GNOME Terminal icon, and stretch it a bit to give ourselves some elbow room.  Start the ACT query tool by typing "act" on the command line and password "beehive" when prompted.

Creating a new Table

We are going to create a new table for our first dataset.  This dataset contains just over 1 million records that simulate customer visits to a bank web site.  Here is the SQL CREATE TABLE statement that we'll use:

CREATE TABLE bank_web_clicks
(
customer_id INTEGER,
session_id INTEGER,
page VARCHAR(100),
datestamp TIMESTAMP
) DISTRIBUTE BY HASH (customer_id);

Type or copy/paste this CREATE TABLE statement into ACT.  (See the screenshot below).  This now gives us a landing table for loading our data.  Notice again the 'DISTRIBUTE BY HASH' syntax.  This tell Aster how the data should be distributed across the Worker nodes in the cluster.  We're choosing to distribute based on the customer_id.  We'll cover this in more detail once we get to the analytic tutorials.

Let's exit ACT and find our dataset.  Use the ACT quit command, "\q" to bring us back to the shell command prompt.

Loading Data

The zipped datasets are in a folder name "demo" under the aster home directory.  Let's first unzip the "bank_web_data.zip" file.

cd demo
unzip bank_web_data.zip

We're now ready to use the Aster load tool, ncluster_loader.

For help on the syntax for using this tool, use the "--help" parameter syntax.

ncluster_loader --help

In our example, we'll specifiy all the required parameters to connect to our Aster cluster, plus our load table and the data file.  We'll also tell the loader to skip the first row, as that contains the field names, plus we'll add the 'verbose' flag so that we can see all the processing details.

ncluster_loader --hostname 192.168.100.100 --username beehive --password beehive --dbname beehive --skip-rows 1 --verbose bank_web_clicks bank_web_data.txt 

You'll see the processing details scroll on your screen; connecting to Aster, ready file formats and finally the output, showing just over 1 million records loaded in only a few seconds!

That's it.  We now have some sample data to play with.  Here's an example to show the distinct web pages visited by customers:

select distinct page, count(*) from bank_web_clicks group by page;

Also, if you want to load and play with the the other 2 data sets in the demo directory, here are the CREATE TABLE statements, along with the ncluster_load commands:

CREATE TABLE retail_web_clicks
(
cookie_id INTEGER,
session_id INTEGER,
product_id INTEGER,
page VARCHAR(100),
search_keyword VARCHAR(100),
datestamp TIMESTAMP
) DISTRIBUTE BY HASH (cookie_id);

ncluster_loader --hostname 192.168.100.100 --username beehive --password beehive --dbname beehive --skip-rows 1 --verbose retail_web_clicks retail_web_data.txt
CREATE TABLE ad_web_events
(
user_id INTEGER,
ad_id INTEGER,
product_id INTEGER,
event VARCHAR(100),
datestamp TIMESTAMP
) DISTRIBUTE BY HASH (user_id);

ncluster_loader --hostname 192.168.100.100 --username beehive --password beehive --dbname beehive --skip-rows 1 --verbose ad_web_events ad_web_data.txt

So there you go, with these 3 data sets, you have 25 million rows of sample data to play with.  In our next article we'll show some powerful Aster analytics that you can apply to these tables to yield new insights from your Big Data -Aster Act 3.

16 REPLIES
Teradata Employee

Re: Using Aster Express: ACT 2, Loading Data

Thanks Mike,
there is a typo in the bottom 2 ncluster_loader commands above (extra whitespace before the verbose option).
They should read:
ncluster_loader --hostname 192.168.100.100 --username beehive --password beehive --dbname beehive --skip-rows 1 --verbose retail_web_clicks retail_web_data.txt
and
ncluster_loader --hostname 192.168.100.100 --username beehive --password beehive --dbname beehive --skip-rows 1 --verbose ad_web_events ad_web_data.txt
regards,
Paul

Teradata Employee

Re: Using Aster Express: ACT 2, Loading Data

Thanks Paul....it's now fixed.

Re: Using Aster Express: ACT 2, Loading Data

I have succesfully completed Getting started, ACT1 & ACT2. I am looking forward to ACT3.

Thanks,
Attila
N/A

Re: Using Aster Express: ACT 2, Loading Data

Same here Act 1 and 2 looks good, when will Act 4,5,6 ... getting published. Why don't you publish whole step-by-step user guide?
Enthusiast

Re: Using Aster Express: ACT 2, Loading Data

can we use Asterdata Developer Express with this vm instance?
N/A

Re: Using Aster Express: ACT 2, Loading Data

Thanks! This is usefully.
Teradata Employee

Re: Using Aster Express: ACT 2, Loading Data

hi mike,
i need to connect the aster queen VM into obiee. but due to network issues i am not able to use 192.168.100.100 to connect between host os and guest VM. i have to use dhcp. i am not able to connect to database.192.168.100.100 should harcoded in configuration files of aster database. please let me know the configuration files i need to change to add my dhcp IP instead of current 192.168.100.100.

Thanks,
Deena
Teradata Employee

Re: Using Aster Express: ACT 2, Loading Data

Deena,

Try adding a 2nd virtual network adapter to the Queen using the virtual settings tab. This one could have an IP address on your network, leaving the original one for the private network between Queen and Worker nodes.

Re: Using Aster Express: ACT 2, Loading Data

Hi,

Is it possible to load data using ncluster_loader with the data coming from a input stream or say a pipe?

Storing the large amount of data to a text file and then loading it to aster database is a very expensive step.

Thanks

Ravi Chamarthy