I just wanted to ask a question regarding TPump. If I were to remove the IGNORE DUPLICATE ROWS line from my TPump script, would a failed load be able to continue from where it failed? And is the behavior the same for SET and MULTISET tables?
First, for MULTISET tables, MARK/IGNORE DUPLICATE ROWS has no effect (since duplicate rows are permissible in MULTISET tables).
For SET tables, IGNORE DUPLICATE ROWS tells TPump to continue without logging duplicate row errors to the error table, while MARK DUPLICATE ROWS tells TPump to log each duplicate row error and then continue, until any specified ERRLIMIT is reached (see the Teradata Parallel Data Pump Reference for details).
Removing IGNORE DUPLICATE ROWS will cause TPump to MARK DUPLICATE ROWS (the default), so it's unlikely to improve things.
However, when ERRLIMIT is reached, TPump takes a checkpoint before terminating. You should be able (possibly after increasing the ERRLIMIT value) to restart (continue) the job by resubmitting it.
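To make the placement of these clauses concrete, here is a minimal TPump script sketch. All object names (log table, error table, target table, input file) are placeholders, and the option list is abbreviated; consult the Teradata Parallel Data Pump Reference for the full syntax.

```
.LOGTABLE mydb.tpump_restartlog;       /* restart log enables checkpoint/restart */
.LOGON tdpid/username,password;
.BEGIN LOAD
     SESSIONS 4
     PACK 20
     ERRLIMIT 1000                     /* job terminates (after a checkpoint) at 1000 errors */
     ERRORTABLE mydb.tpump_err;
.LAYOUT inlayout;
.FIELD c1 * VARCHAR(10);
.DML LABEL insdml
     IGNORE DUPLICATE ROWS;            /* omit this line and the default, MARK DUPLICATE ROWS, */
                                       /* logs each duplicate to the error table instead */
INSERT INTO mydb.target (c1) VALUES (:c1);
.IMPORT INFILE mydatafile
     LAYOUT inlayout
     APPLY insdml;
.END LOAD;
.LOGOFF;
```

To restart after hitting ERRLIMIT, raise the ERRLIMIT value and resubmit the same script; TPump uses the restart log table to continue from the last checkpoint.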
I can't be more specific without knowing the reason your job failed in the first place.
While running TPT through the Stream operator, duplicate rows are being inserted into a target table created with the SET option. Could you please explain the reason for this, given your earlier comment: "With regard to duplicate rows, the statement 'If you attempt to insert duplicate records into a SET table, the duplicates are discarded without any notification that an attempt to insert duplicates took place' isn't strictly true. It happens to be true if FastLoad (or the TPT Load operator) is used. However, when using MultiLoad (or the corresponding TPT Update operator) or TPump (or the corresponding TPT Stream operator), you can choose whether or not duplicate rows are recorded. If you're using SQL directly (e.g., BTEQ), duplicate rows will always be reported (2802 error)."
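For reference, the 2802 behavior mentioned in the quoted comment is easy to reproduce directly in SQL (table and database names here are illustrative):

```
CREATE SET TABLE mydb.t (c1 INTEGER) PRIMARY INDEX (c1);

INSERT INTO mydb.t VALUES (1);   /* succeeds */
INSERT INTO mydb.t VALUES (1);   /* fails: 2802 Duplicate row error in mydb.t */
```

If the same second insert appears to succeed through the Stream operator, check how the operator's duplicate-row handling is configured; a SET table itself will not store two identical rows, so it is worth verifying that the "duplicate" rows actually differ in some column (for example, a timestamp) rather than being true byte-for-byte duplicates.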