Difference and concepts of ETL.

UDA
Enthusiast

Re: Difference and concepts of ETL.

The discussion is quite interesting . I would like to add few more questions here.

It is understood that MLOAD and FASTLOAD are native tools of teradata for ETL process.Informatica is a third party tool that has plug-ins for teradata and which can do some Transformations to data before loading the data into Teradata .Now I would like to understand how expensive or inexpensive it is in terms of cost,maintenance and usage when a third party tool is used for ETL in a datawarehouse environments.Why do companies opt for a third party ETL tools when Teradata has its own set of ETL stuff.

Enthusiast

Re: Difference and concepts of ETL.

Hi Subha

Yes, what you told is exactly right. After looking into your comment, I made again my questions to my senior people. Now i was good enough.

You have explanied about the FLOAD, let me add some points to that...

Oracle retreive\load the data in the form of ROWS nothing but a line of record. But in Teradata, some utilites will do the action in the form of Packets. Rows will be converted into packets and vice versa though CLI(Call Level Interphase).

FLOAD loads the data in the form of Records only. Where as MLOAD loads the data in the form of PACKETS.( I too need some more info about other utilites)

One more point is, MLOAD will make the TABLE LEVEL LOCK, that means when MLOAD is happening, we cann't fire any query on that particular table to fetch the records.

Hey Marc, Tena, Joed

Need your inputs too.....

Coming to Leo Question about the

" why we need Third Party tools?"

Because, All the Teradata tools are Command Based tools and requiers the Scripting Knowledge.

Coming to Informatica,

1) Its a GUI.
2) In built features which makes 0 coding.
3) Scripts will be generated automatically.
4) We can even change the Generated Script as required.

Thats may be the main Features.

Enthusiast

Re: Difference and concepts of ETL.

Paddu,

one more question on third party ETL.

I have seen environments that still use teradata MLOAD and FASTLOAD to load data.However they are scheduling these loads using Job schedulers.

So now I would like to understand how expensive or inexpensive it is in terms of cost(expenditure),license for third party tool and implementation overhead when informatica is used compared to Teradata ETL.

Enthusiast

Re: Difference and concepts of ETL.

Hey

I dont know the cost of the tool or Licence, even i am working in Informatica i dont have these details.

Coming to the importance, you are right, even so many are using Utilites of TD only. But this third tool will make us such that there will be no difference in working with TD are any other DB. May be Connections differs.

This third party tool will make us 0 level coding thats the over all benifit and make us clear picture of dataflow as it is GUI.

Enthusiast

Re: Difference and concepts of ETL.

From a non-technical standpoint, also remember that there are lots of folks who know tools like Informatica, DataStage etc out there compared to folks who are comfortable with Teradata ETL tools, the rest is a bit of maths and diplomacy when you are a customer who wants his stuff up and running quickly.
Enthusiast

Re: Difference and concepts of ETL.

You've got to remember that the ELT in Teradata is not free. If you load raw data into Teradata and use the scripting language and tools to transform, consolidate, remove duplicates, perform data quality and deliver it to BI tables you pay a price in terms of disk space and CPU utilization. This can be minimized if you run it overnight (and don't have big backup windows) but you still need to size your Teradata to handle a lot of extra hot storage space for the multiple times you transform and store and stage your data.

I am currently the architect on an EDW using a 100 server DataStage grid and a large Teradata database and we have the ability to transform data 24x7 - trickle feed the EDW during the day and fast bulk load overnight with ready to load data. We can move into more heavy duty transformations such as address standardization, we can perform change data capture and change data detection on the grid with minimal impact on Teradata and we can use the grid for sending data to other systems.

Informatica has added pushdown optimization for executing some steps in a database like Teradata and IBM are adding similar functionality to DataStage 8.1 at the start of next year. This gives you the best of both worlds and lets you build Teradata SQL steps (including temporary table support) in the GUI interface of the ETL tool. You can push ELT functions into both the source and the target databases.

In the ETL versus ELT debate the are pros and cons in both directions.
Enthusiast

Re: Difference and concepts of ETL.

Please go to the following links and read all the relevant topics you want to learn about Teradata Data warehouse:

http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Database
http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Tools%20and%20Utilities
Enthusiast

Re: Difference and concepts of ETL.

Hi friends...
especially thanks to paddu(who raise first doubt).. I got more information regarding Teradata ETL tools....

But I have basic doubt about ETL teradata tools... i.e., can we use Teradata ETL tools for loading data into warehouse which is not
made by Teradata??
what abt performance...third party tools can give more efficient performance than Teradata ETL??(warehouse is not build with teradata dbms)...

waiting for u r reply......

Thanks and Regards
Anil Reddy
Enthusiast

Re: Difference and concepts of ETL.

Hi Anil

Third party ETL tools like Informatica or any other, will not change the performance behaviour. It only makes the Developer to interact with the databases with graphically and makes it as User friendly.

Yes... we can load data from one database to another database using this third part tools.

Let me know in case you need more info from me.

Thanks
Paddu.
Teradata Employee

Re: Difference and concepts of ETL.

yes, your understanding is correct on fastload/multiload.

just to add multiload is aslso used for deleting the records and also for other subtle advantages of readng the table while loading and
also overcoming the limits of no of files / tables during the load.

Probaly Paddu meant something equivalent of BTEQ Export / FastExport, as the name suggests which are used for moving data out of TD into flat files.

TPUMP is another tool which one can use to get Extarct and load continously( can't speak more on this as have never practically used).