Difference and concepts of ETL.

UDA
Enthusiast

Difference and concepts of ETL.

Hi

Recently i was joined in IT Company. Now i am a Trainee in DWH, still no project are alloted to me.

Last day my PM told that get trained in Teradata of ETL part, and didn't explain me any more.

CAn anyone please tell me

1) Difference in ETL, Administration and DQ of Teradata.

2) What are the tools i have to learn in ETL of Teradata.

Presently I am working in INFA tool.

Any replies are highly appreciated.

Thanks in Advance
Paddu
24 REPLIES
Enthusiast

Re: Difference and concepts of ETL.

Your first port of call is to find out if there is a data warehouse already existing that you are scheduled to start supporting or if your team are building a new DW from scratch.

If there is already a platform then you need to find some of the following:
* What technologies your Teradata server sits on i.e. Teradata for Windows/Teradata for Unix/Linux.

* What Teradata/3rd Party utilities you use to load your data from the source system into Teradata(staging area) i.e. BTEQ/FAST LOAD/MULTI LOAD/IMPORT/TPUMP etc. You can locate these via START>>PROGRAMS>>TERADATA CLIENT...betq/fload etc

* If you use only stored procedures to load the data from staging into the main DW tables or if you use a mixture of Stored Procedures & UDF's written in C/Java etc.

Any good developer/DBA could give you the answers to these.

1) Difference in ETL, Administration and DQ of Teradata.
ETL stands for Extract/Transformation and Load. It is the process by which you get your data from the source system into your data warehouse. Administration is the DBA function and consistes of storage management/backup and restore/recover methods/Source Control(maybe)/Security etc.

2) What are the tools i have to learn in ETL of Teradata.
As Above, this depends on the technologies and utilities you use in order to get data from source to destination. The Utilities are generally BTEQ/FLOAD/MLOAD/TPUMP/OLE Load/Fast Export etc and then Teradata stored procedures but there are 3rd party tools available but not supported(i think). You will need exposure to all of these utilities and tools to get training.

3.) The teradata training web site provides a wealth of information on what training can be acquired.
http://www.teradata.com/services-support
or talk to your vendor who could arrange and advise on formal training courses.

Hope this helps, Best of Luck.
Enthusiast

Re: Difference and concepts of ETL.

Hi Supreme

Thanks for your guidelines....

I have small doubt on it, you told that utilites are there(FLOAD\MLOAD\etc) to load data from Source systems to Target(Teradata database).

Then why informatica?

Where it comes into picture...?

Thanks in advance..
Paddu
Enthusiast

Re: Difference and concepts of ETL.

Hey Paddu,

It seems you get a 'Supreme Being' on this forum after a number of questions & answers are posted on this forum. It certainly is not reflective of the experience I have with Teradata. I have only been using it for 6 months and am still very much a novice. However, I do know data warehouses and was heavily involved in Migrating our old SQLServer DWH to Teradata.

"BTEQ/FLOAD etc". These are the load utilities we used as they are native to Teradata. They have advantages and disadvantages like anythinmg else. However, They are not exclusive.

I have not used Informatica but as far as I know it is a 3rd party tool. Maybe Teradata sell and support it, I don't know. Maybe one of the real gurus on this forum can answer that or ask your developers and/or vendors.

All being said I have read a little about Informatica. It seems it is a Business Intelleigence and Data Load tool. If you are using it there must be an expert on it within your team. At the end of the day it's another way of getting data from one place to another.

Hope that helps.

Marc.
Enthusiast

Re: Difference and concepts of ETL.

As far as my knowledge goes, informatica is a tool which has certain analytical functions performed outside the database. you can create transformations in informatica, which helps to create complex queries using a graphical design tool. you can create certain type of transformation using informatica power designer. basically, it helps you to do away with lot of coding. you can achieve the same thing using the PL/SQL too. but when you are using the operational database, it is not appropriate to perform any data warehouse operations. it may affect the performance of the system. so we can take the data warehouse related functions out of the operational database. Please let me know your thoughts.
Enthusiast

Re: Difference and concepts of ETL.

Hey Tena

Your thought on Informatica is Right.
To overcome the PL/SQL codes, informatica helps a lot.

I have some questions on your reply:

1) Is PL/SQL present in Teradata?

2) "Operation Databases" means are you saying about the "OLTP".

3) If Teradata utilites are there for loading, then why we need to go for Informatica?( may be to reduce the coding).

4) Is all the teradata utilites are GUI or CUI?

Hi Mic

Please add your thoughts also in it.

Thanks
Paddu
Enthusiast

Re: Difference and concepts of ETL.

Teradata has it's own version of stored procedure language, there are variations in syntax with oracle, but it's not that hard to grasp, and it's not as sophisticated as Oracle PL/SQL.

Starting TD 12.0, we would also be able to write Java stored procedures.

Tools like informatica give a generic interface to ETL operations against various DB systems, which means you may need not learn vendor specific ETL scripting syntax etc... most of these ETL tools also has plugins for vendor specific utilities (like fastload, mload etc in Teradata) than can be used via the ETL tool, this ensures that the ETL tool gets maximum performance leverage while still giving an (almost) vendor independent development interface to the developers.

Teradata utilities are terminal based (much like windows command prompt), you can also write all the instructions into a text file and execute it using the utility. which is the general approach.
Enthusiast

Re: Difference and concepts of ETL.

Lots of interesting stuff there.
Cheers.
Enthusiast

Re: Difference and concepts of ETL.

Its good and great explanation from all.

Thanks a lot.... :-)

In the mean while, i met so many experienced guys to have more on TD. Following are the points that i came to conclude on ETL of Teradata. Please add any more points or please correct if any wrong goes.

1) MULTI Load is the Utility to perform the Extraction Process, this may extract data from any system and creates the flatfiles may be temporary.

2) Using BTEQ, we perform the Transformation. This may be like performing the logics of operations to the Target or it may be any from of Logical operations.

3) FLOAD is the Utility to perform the Loading part of ETL. Once the flatfiles created in the MLOAD goes tranformation in BTEQ then the result will be loaded in to the Targets through FLOAD Utility.

4) Informatica a Complete tool for ETL, is used to perform all the above actions. This tool must enable the Plug ins of all the Utilites, so that to perform the actions.

5) All the TD Utilites are Command Based tools. Have their own set of commands in each Utility.

6) Base Knowledge to have TD: BTEQ, MLOAD, FLOAD.

I belive this discussion will make the Beginners well knowledge on TD.

Enthusiast

Re: Difference and concepts of ETL.

hi Paddu
I am little confused on the way you differentiated MLOAD and FLOAD

As per my knowledge, both MLOAD and FLOAD are used to load data from FILE to TABLE with the following difference:

1) FLOAD is used to load data from the file to an EMPTY table only.
2) FLOAD cannot be used for updates or upserts into the table. For updates MLOAD should be used
3) FLOAD as the name suggests has very high performance.
4) Also FLOAD does not load duplicate records. If there are 2 records with exactly the same values of each record, only one will be loaded.
However, multiload loads both the records, if your table definition permits.

The above is as per my knowledge.
Please correct me if i am wrong. Thank you.