Hi everyone, I'm going to work on an ETL project where the source is Oracle, the target is Teradata, and the ETL tool to be used is Informatica.
There are two levels: first, loading into staging (staging is also Teradata), and second, loading into the target tables. I query the Oracle source tables and load into the staging area. Which approach is better: 1. create a one-to-one mapping to do this, or 2. use one of the utilities offered by Teradata (MultiLoad, TPump, etc.) from within Informatica?
Please tell me the pros and cons of these two approaches. I've been told to use the first method (one-to-one mappings).
Please advise on the second level as well (from staging to target): whether to use a one-to-one mapping or the Teradata utilities.
I'm really concerned because a primary index is automatically created on Teradata tables, and this leads to rejection of records in some cases.
Please clarify. Thank you all in advance. - Thanks, Jerome
For phase 1 you can either use an Informatica mapping with Teradata loader connections (FastLoad, TPump, MultiLoad) or use Teradata loader scripts directly to load data into the staging table. It's preferable to use Informatica with loader connections, which is faster for development, and it will only be a one-to-one mapping.
For phase 2 (stage to target), you should go with Informatica and use joins if there are any. You'll have to write SQL override queries, which replace the SCD transformations in Informatica. The query must distinguish new records from update records using a flag, and the insert or update is then performed according to the flag value.
There is an option in Informatica called pushdown optimization, which should be used when you go for complex override queries.
Let me know if you have more doubts.
Regarding the primary index: it is the lifeline of a Teradata table. You need to specify the primary index while creating a table; otherwise Teradata automatically takes the first column as the primary index. Data is distributed into Teradata using the PI, and retrieval is done via the PI as well.
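As a sketch, the two cases look like this in Teradata DDL (table and column names are invented for illustration):

```sql
-- Explicit primary index: rows are hashed to AMPs by cust_id.
CREATE TABLE stg_customer (
    cust_id    INTEGER NOT NULL,
    cust_name  VARCHAR(100),
    load_dt    DATE
)
PRIMARY INDEX (cust_id);

-- No PRIMARY INDEX clause: Teradata defaults to the first column
-- (cust_id here) as a non-unique primary index.
CREATE TABLE stg_customer_default (
    cust_id    INTEGER NOT NULL,
    cust_name  VARCHAR(100)
);
```

Note that if the index is declared as a UNIQUE PRIMARY INDEX, rows with duplicate PI values are rejected during the load, which may be the record rejection you are seeing.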
To make things clearer: you have suggested that I use a loader connection instead of a relational connection.
As indicated by you, will the loader connection be faster than the relational connection?
I hope this will help in the one-time/history load, as the historical data is provided in the form of Oracle dumps.
But for the incremental loads, the source data is pulled from views in the Oracle database (obviously there will be performance issues, but this is the client's requirement, so it can't be helped). Do you suggest that the loader connection will be faster than the relational connection here as well?
Phase 2: I'm not very clear on the explanation you have given here.
As highlighted by you, I'll be joining staging tables to load into the targets.
I'll be using a Lookup transformation to check whether the record is present or not.
Accordingly, I'll be inserting into or updating the target tables (by tagging records accordingly) using an Update Strategy transformation.
Of course I'll be maintaining history by giving an "end date" to the already existing record when an update is done in the database, and inserting a new record with the sysdate as the "start date".
Basically I'm using the Lookup and Update Strategy transformations to achieve the SCD concept here.
Is this the concept you have explained here?
Please explain to me what this "pushdown optimization" in Informatica is.
The concept is the same, but the implementation is different. You can do it the way you have described, using Update Strategy and Lookup transformations, or the way I mentioned in my previous post. The SCD can be implemented with an override query for faster development instead of going for transformations like Lookup and Update Strategy. When using queries for SCD, joins replace the lookups, and the Update Strategy is replaced by a flag column which says whether the record is going to be an insert or an update, provided the granularity of record change is one day.
About extracting from Oracle views to Teradata: you'll have to go with relational connections only. Loaders can only load data from flat files into Teradata.
It might be confusing for you to understand the override query I'm talking about until you see the complete process.
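A rough sketch of the kind of override query meant here (all table and column names are made up for illustration; it assumes staging holds the latest snapshot and the target keeps the current row with a NULL end date):

```sql
-- Tag each staging row as an insert (no match in target) or an
-- update (key exists but attributes changed). The left join plays
-- the role of the Lookup; the flag replaces the Update Strategy.
SELECT
    s.cust_id,
    s.cust_name,
    CASE
        WHEN t.cust_id IS NULL THEN 'I'   -- new record
        ELSE 'U'                          -- changed record
    END AS dml_flag
FROM stg_customer s
LEFT JOIN tgt_customer t
  ON  s.cust_id = t.cust_id
  AND t.end_dt IS NULL                    -- current version only
WHERE t.cust_id IS NULL
   OR s.cust_name <> t.cust_name;
```

Downstream, rows flagged 'U' end-date the current target row, and a new row is inserted with the sysdate as the start date, exactly as you described.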
You've got to be careful here and make sure you follow the standards of the architecture you are working in. If Informatica is the default tool for data integration, then you should be using it to deliver the data: it has native connectivity to Teradata and access to TPump and MultiLoad. It should also handle the conversion of data types from Oracle to Teradata. It can also be used to validate your referential integrity, so your rows are captured by exception handling rather than just being rejected. It also has pushdown optimization (though I don't know if your version supports it), where you can use the Informatica GUI tool to build Teradata database steps. This opens it up to data lineage and impact analysis reporting.
The Informatica GUI designer may give you load jobs that are easier to support and maintain.
Informatica pushdown optimization is a new concept introduced in the Informatica 8 version series.
Pushdown optimization is a concept where you try to do most of the calculations on the database side rather than at the Informatica level. For example, if you need some kind of aggregation, you can push that computation down to the database. It is basically about utilizing the power of the database, and it can be applied on the source side as well as the target side.
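For example, instead of routing every detail row through an Aggregator transformation in the Integration Service, pushdown optimization generates SQL along these lines so the database does the work (table and column names are illustrative):

```sql
-- The aggregation runs inside the database, so only the
-- summarized rows move through Informatica (or none at all,
-- with full pushdown when source and target share a database).
INSERT INTO sales_summary (region, total_amount)
SELECT region, SUM(amount)
FROM   sales_detail
GROUP  BY region;
```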
Currently we're doing an Oracle-to-Teradata project with Informatica 8.6.1, and we found some problems: 1. Teradata 6706 ("untranslatable character") errors; 2. performance: Teradata builds lookup caches much more slowly than Oracle does; 3. unexpected MultiLoad lock issues; 4. table blocking, where the job hangs for a long time.