How to Improve Step by Step ETL/ELT Open Sources Tools (Talend) On Teradata

Tools
Tools covers the tools and utilities you use to work with Teradata and its supporting ecosystem. You'll find information on everything from the Teradata Eclipse plug-in to load/extract tools.
Teradata Employee

How to Improve Step by Step ETL/ELT Open Sources Tools (Talend) On Teradata

The objective of this article is to explain step by step how to improve Talend in Teradata environment using Teradata utilities (FastLoad, MultiLoad).

Part 1 : Application setup

Prerequisites: Setup Talend TOS environnement for Teradata

  • Download TOS All available for all environment (Window, Linux, MacOs)
  • Install TOS

Step 1 : Download Talend

The download of Talend Open Stiudio Zip file is available on: http://www.talend.com/download.php#

Step 2 : Unzip on your local directory

Step 3 : launch TOS Application

To launch TOS Application on the Windows environment, execute “TalendOpenStudio-win32-wpf.exe” and accept the License agreement:

Step 4 : Create a new connection

After license agreement aceptance, click on  Then complete the fields on the next screen : "Référentiel" : Repository=Local, "Email" = , "Mot de passe":Password=, "Créer un nouveau projet":Create a new project.

Step 5 : Create a new Project

Select on the list box “Create a New Project” and click on "OK" button, the screen allowing the creation of a new project appears:

"Nom du Projet": Project Name = Step By Step,

"Nom technique": Technical Name = STEP_BY_STEP,

"Description du projet": Project Description = ,

"Langage de génération": Generation Language select  Java



Step 6 : Select a Project

Select now new Project that you have created, "Projet Sélectioner": Select Project = Step By Step.

Step 7 : Start TOS Application

Click on “Start now” to open TOA application

TOS application is now ready to use.

Part 2 : Step by Step Development

Scenario 1: Classical load ETL with the tTeradataFastLoad using JDBC

Generate a dataset sample and load the result into a Teradata table. (the CREATE TABLE script for cust_teradata is at the end of the document)

Components need:

  • a tRowGenerator (category Misc in the component Palette)
  • a tTeradataFastLoad (category Databases/Teradata in the component Palette)

Step 1 : database metadata discovery wizard

  • click right on the “DB connections” item in the Repository and choose a name of the metadata connection in your repository, for example “myTeradata”.

  • then in the following step, specify the right value to connect to your Teradata database. Click finish to save it in your repository.

Step 2 : retrieve metadata schema from the Database

  • click right on “myTeradata” item in the Repository and choose “Retrieve Schema”, then choose your expected filter Table, View, Synonym or write a specific filter of Object return into the wizard.
  • check to select Table structure that you would like retrieve in the metadata repository.

  • then in the last step you can verify the structure returned via the catalog.

Step 3: Create a new JobDesign

  • click right on “Job Designs” in the Repository and specify a name
  • then choose a tRowGenerator component into the component palette (Misc category)

  • drag-n-drop the cust_teradata item in your repository into the JobDesign interface and choose the tTeradataFastLoad component into the dialog box.

  • then create a link ROW between tRowGenerator and tTeradataFastLoad components. Click-right on tRowGenerator, choose “Row” and “Main” and link it to tTeradataFastLoad.

Step 4: tRowGenerator configuration

  • Double-click on tRowGenerator to open configuration GUI.

Step 5: tTeradataFastLoad configuration

  • Double-click on tTeradataFastLoad to open basic settings properties tab.

Step 6: tab RUN to execute the JobDesign

Scenario 2: load ETL with the tTeradataMultiLoad using mload teradata utility

The dataset already exist into a delimited file (comma or semi-colon separated). You can produce this dataset with the previous JobDesign. Just modify the output component and replace by tFileOutputDelimited.

Step 1 : tTeradataMultiload

  • Pick and drop the tTeradataMultiLoad into a new JobDesign

Below, you can see :

  • the mload script code generated by Talend
/**
* [tTeradataMultiLoad_1]
*/
currentComponent="tTeradataMultiLoad_1";

String tableFullName_tTeradataMultiLoad_1 = "talend" + "." + "cust_teradata";
java.io.FileWriter fw_tTeradataMultiLoad_1 = new java.io.FileWriter("D:/scriptfolder/"+"cust_teradata"+".script");

StringBuilder script_tTeradataMultiLoad_1 = new StringBuilder();
fw_tTeradataMultiLoad_1.write(".LOGTABLE "+"talend"+"."+"cust_teradata"+"_lt;\r\n");
fw_tTeradataMultiLoad_1.write(".LOGON "+"talend"+","+"talend"+";\r\n");

fw_tTeradataMultiLoad_1.write(".BEGIN IMPORT MLOAD TABLES "+tableFullName_tTeradataMultiLoad_1+" SESSIONS 8;\r\n");
fw_tTeradataMultiLoad_1.write(".LAYOUT customer_layout;\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"ID"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"NAME"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"DATE_BIRTH"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".DML LABEL DML_LABEL;\r\n");

fw_tTeradataMultiLoad_1.write("INSERT INTO "+tableFullName_tTeradataMultiLoad_1+"("
+"ID,NAME,DATE_BIRTH"+") VAlUES("+":ID,:NAME,:DATE_BIRTH"+");\r\n");
fw_tTeradataMultiLoad_1.write(".IMPORT INFILE "+"D:\scriptfolder\bulk_file.txt");
fw_tTeradataMultiLoad_1.write(" FORMAT VARText '"+";"+"' LAYOUT customer_layout APPLY DML_LABEL;\r\n");
fw_tTeradataMultiLoad_1.write(".END MLOAD;\r\n");
fw_tTeradataMultiLoad_1.write(".LOGOFF;\r\n");

fw_tTeradataMultiLoad_1.close();

String sb_tTeradataMultiLoad_1= new String("cmd /c mload < "+"D:/scriptfolder/"+"cust_teradata"+".script > "+"D:/scriptfolder/bulk_file_log.txt"+" 2>&1");
/**
* [tTeradataMultiLoad_1]
*/
  • Create table script for cust_teradata
CREATE MULTISET TABLE talend.cust_teradata ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ID INTEGER,
NAME VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
DATE_BIRTH DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( ID );

1 REPLY
Enthusiast

Re: How to Improve Step by Step ETL/ELT Open Sources Tools (Talend) On Teradata

This is a great doc put together. I have a question for you.

What's the diff between trowgenerator and writing to a flat file and loading via Fastload in TD