Recently we are facing a recurring Informatica job failure with "External Loader Error" with UPI defined on the identity column with max value 2147483646 (Mload defined on the staging table with UPI on an identity column called "id"). We could see that the max value for the column loaded into this Mload staging table for this load (400000000) is well less than the max value. From the UV table we could see that the dbcerror code is 2794, indicating duplicates on the PI.
Are the duplicates being generated from the identity column id? If yes, why does identity column generate duplicates (we have around 35 records in UV table, all this 35 PI value also exists in the target staging table there by causing duplicates on PI column)
If not, what might be the issue leading to the failure.
Thanks for your time on this.
Please find the Identity column definition below.
ID INTEGER GENERATED BY DEFAULT AS IDENTITY
(START WITH 1
INCREMENT BY 1
You have defined the identity column with cycle so that will eventually allow a duplicate value.
Also the identity column in the table does not necessarily increase by one each time a record is loaded.
Each amp gets a set of those identity column numbers to use so depending on which amp the data is loaded, they will get the next available identity number on that amp.
Dups can arise as GEN BY DEFAULT will not generate an identity value if insert row has a value for the IC. To guarantee uniqueness, specify GENERATED ALWAYS and NO CYCLE.
Thank you Shelley and Ivy.
I could see that some of the rows from source are moved to UV tables (the table on which mload defined has a UPI on above identity column) So, if I define the identity column with "no cycle" property, can I be ensured that the identity column do not generate duplicates (we are not specifying values for the identity column explicity in the insert operation in Mload control file).
Will the identity column have respective values in the work table? If so, will be populated during acquisition phase or generated only when inserting the row in to the target table during application phase.
If we check the option "drop error tables" in Informatica manager connection for a mload, will the error tables be dropped even if it has some rows which would not have been dropped by Teradata during normal scenario.
To guarantee uniqueness, please specify GENERATED ALWAYS and NO CYCLE.
Yes, the identity column has respective values in the work table.
Before you make the decision to check the option "drop error tables" in Informatica, please carefully consider the following:
Teradata MultiLoad creates four tables that are required for restarting a paused Teradata
• Restart Log Table
• Work Table
• Acquisition Error Table
• Application Error Table
If any of these tables are dropped while a Teradata MultiLoad job is paused, restarting the job
normally is not an option, and the target tables can be left corrupted.
In some limited circumstances, restart such a job using special procedures is possible but, in
most cases, do not drop these tables for a paused job without careful analysis of the situation.
The following topics describe the operational implications of having dropped the required
restart tables for a paused Teradata MultiLoad job. Refer to Chapter 2 (Implications of Dropping
Required Teradata MultiLoad-Created Tables) in Teradata MultiLoad Reference for further information:
Thank you Ivy. We have checked the max value for the identity column in the landing table and could see that the max value for the id column in the table is way smaller than
max value 2147483646 defined for identity column definition. We have also checked to see if there are any values for the ID column with negative values(which should start with once max value is reached), but couldn't find them either.
I have gone through the manual and could find that identity column will start to cycle values (repeat) after it reaches the max value defined for the identity column. Where as we are receiving the duplicates on the identity column which has not reached the max value. Have you ever faced this scenario. Why is this happening? I am a bit perplexed.
No, I did not see this before.
BTW, have you specified both GENERATED ALWAYS and NO CYCLE? How many rows were you loading? Thanks!
Sorry for the late response. The data is copied from PROD to QA during the builds which is inserting records for identity column values I guess from PROD (INSERT statement explicitly specifying values for identity column which is defined as "generated by default". Later during the batch runs, the job is defined to load values to all other columns except the ID column, which is generated by Teradata. Below is from TD Manual.
"You must comply with all of the following restrictions if you want to guarantee the uniqueness of GENERATED BY DEFAULT column values:
• You must specify NO CYCLE.
• Any user-specified values you specify must be outside the range of any system-generated values.
• You must enforce the uniqueness of the user-specified values yourself"
Anyhow, the current ID column which is defined as identity column doesn't serve our purpose; valid records are being diverted to UV table due to the values generated for ID column by Teradata QA which are already inserted for ID column during copyback from PROD (explicity specified values for ID col I guess). We are resorting to the option of changing the PI on the table.