I am new to Aster and would like to know how to load fixed-width files into the Aster database. Does Teradata Aster have any utilities similar to Teradata's TPT or MLOAD? If so, please share an example.
I have loaded delimited files using ncluster_loader. Please share a sample script to process fixed-width files.
Appreciate your help on this !!
afaik there's no Aster tool to load fixed width files.
You would need to use an ETL tool that supports Aster or JDBC/ODBC.
Thank you Dieter !!
From your response I understand that ncluster_loader is built to process only delimited files.
Just curious how we would load unstructured data into Aster.
I assume SQL-MR functions can be used to load unstructured data; if so, can they also be used for loading fixed-width files? Please correct me if I am wrong.
With flat files it's probably easier just to write a quick shell script to insert delimiters at the appropriate places. Use the Unix/Linux "named pipes" (FIFOs) feature to load from your script instead of creating a new delimited file. The general idea is:
mkfifo mynamedpipe
ncluster_loader -h queenname -d dbname -U username -w secretpw -c myschema.mytable < mynamedpipe &
add_delimiters.sh myinputfile.txt "1-5,6-10,11-20,21-50,51-51,52-52,53-53,55-90" > mynamedpipe
Where "add_delimiters.sh" is the shell script you wrote to convert the fixed-width file to CSV format (remember to convert " to "" within fields!)
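A minimal sketch of such a script (the name `add_delimiters.sh` and the "start-end" range format come from the example above; the awk implementation is just one possible approach, and assumes 1-based inclusive column positions):

```shell
# Hypothetical add_delimiters.sh: convert fixed-width lines to quoted CSV.
# Usage: add_delimiters <inputfile> "1-5,6-10,..."
add_delimiters() {
    awk -v ranges="$2" '
    BEGIN {
        n = split(ranges, r, ",")
        for (i = 1; i <= n; i++) {
            split(r[i], b, "-")
            start[i] = b[1]              # 1-based start column
            len[i]   = b[2] - b[1] + 1   # field width
        }
    }
    {
        out = ""
        for (i = 1; i <= n; i++) {
            f = substr($0, start[i], len[i])
            gsub(/"/, "\"\"", f)         # escape embedded double quotes
            out = out (i > 1 ? "," : "") "\"" f "\""
        }
        print out
    }' "$1"
}
```

Feeding a line like `ABCDE12345` through ranges `"1-5,6-10"` would produce `"ABCDE","12345"`.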
In reply to your follow-up question about unstructured data, there are two approaches: load the data as-is into a text blob and parse it later with SQL-MR functions, or pre-parse it via shell scripts or some high-level language (e.g. Java) and load it with ncluster_loader. The trade-off is that pre-parsing is faster up front but less flexible later.
You might also consider the Hadoop connector and/or AFS (think of it as "HDFS-on-Aster") to store your unstructured data and load it (or just use it) from there.
The Aster user guide documentation clearly states: "Fixed length fields are not currently supported by ncluster_loader". I faced the same issue and used the following approach:
First, create a single-column staging table and load each line of the file into it as-is:
CREATE TABLE schema_name.table_name (
all_rows VARCHAR NOT NULL)
DISTRIBUTE BY HASH(all_rows);
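The staging table can then be populated with ncluster_loader (a sketch reusing the flags shown earlier in this thread; host, credentials, and file name are placeholders). Because each line lands in a single column, the only requirement is that the loader's delimiter character never occurs in the data:

```shell
ncluster_loader -h queenname -d dbname -U username -w secretpw \
    schema_name.table_name myfixedwidthfile.txt
```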
Then parse each row into its respective columns using the substring function:
create table schema_name.table_name_new DISTRIBUTE BY HASH(pk_id) as
select substring(all_rows from 1 for 16) as pk_id,
substring(all_rows from 17 for 12) as col1,
substring(all_rows from 29 for 4) as col2,
substring(all_rows from 33 for 12) as col3
from schema_name.table_name;
where the "from" (start position) and "for" (length) values need to be changed according to your data dictionary.
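The offsets can be sanity-checked in the shell before running the SQL, since `cut -c` pulls the same character ranges that `substring(... from N for M)` does (the sample line and 16/12/4/12 layout here are made up to match the column widths in the SQL above):

```shell
# Fabricated 44-character test record: 16-char key, 12-char col1,
# 4-char col2, 12-char col3.
line='0000000000000001AAAAAAAAAAAABBBBCCCCCCCCCCCC'

pk_id=$(printf '%s\n' "$line" | cut -c1-16)   # substring from 1  for 16
col1=$(printf '%s\n' "$line" | cut -c17-28)   # substring from 17 for 12
col2=$(printf '%s\n' "$line" | cut -c29-32)   # substring from 29 for 4
col3=$(printf '%s\n' "$line" | cut -c33-44)   # substring from 33 for 12

echo "$pk_id|$col1|$col2|$col3"
```

Note that `from N for M` covers characters N through N+M-1, which is why `from 17 for 12` maps to `cut -c17-28`.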
Hope that helps!