how to eliminate duplicate records in a file while loading into the target table without using fastload

General
Enthusiast

how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi Everyone,

Could anyone help me with the query below?

I have duplicate records in a file.

I want to eliminate the duplicate records while loading the file into a table. My table is already populated, so we can't use FastLoad.

Since the table is already populated, how can we eliminate the duplicate records in the file, and which utility should we prefer so that only the unique records are loaded?

Thanks in advance.

9 REPLIES
Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Do you want to use a Unix/Linux command? Something like this:

sort filename | uniq

or use awk.

You can see the argument here :)

http://forums.teradata.com/forum/analytics/why-uv-in-mload

and you can find more in the forum.

Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi Raja,

I want to load only the unique records into the table using MultiLoad.

Regards,
Purushotham.m
Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi,

Can I use IGNORE DUPLICATE INSERT ROWS in MultiLoad to eliminate the duplicate records in the file?
Senior Apprentice

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Is your target table SET or MULTISET?

Is there a UPI or NUPI?

What are you going to do with those records, INSERT them into the target?

INSERTs into SET tables will never load duplicate rows, but might write them into the UV error table (controlled by MARK/IGNORE DUPLICATE INSERT ROWS in the DML LABEL).
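For illustration, a minimal sketch of such a DML label in a MultiLoad script (the table name is just a placeholder):

/* IGNORE discards duplicate-row inserts instead of logging them to the UV table */
.DML LABEL insdup IGNORE DUPLICATE INSERT ROWS;
INSERT INTO your_set_table.*;

Note this only helps when the target is a SET table; into a MULTISET table the duplicate rows would simply be inserted.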

Otherwise you might FastLoad to a staging table and then Insert/Select.
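For example, after FastLoading the file into an empty staging table (the staging and target names here are just placeholders):

-- keep each fully duplicated row only once
INSERT INTO target_table
SELECT DISTINCT *
FROM staging_table;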

Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi dnoeth,

My target table is a MULTISET table and it is defined with a NUSI.
The target table is already populated, so we can't use FastLoad.
So could you tell me how to eliminate the duplicate records while loading into the table, and which utility you would prefer?

Regards,
Purushotham
Senior Apprentice

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi Purushotham,

you can't do that with MultiLoad if the target table is MULTISET.

You have two options:

1. use Unix to remove duplicates before loading, as Raja suggested

2. FastLoad to a staging table and then Insert/Select into your target table (see the sketch after this list)
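The Insert/Select can also deduplicate on the key alone. A sketch, assuming "duplicate" means several rows sharing the same ID (table and column names are placeholders):

-- keep exactly one row per ID, discarding the other occurrences
INSERT INTO target_table
SELECT *
FROM staging_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) = 1;

If the rows are fully identical, a plain SELECT DISTINCT works as well, as shown above.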

Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Thank you so much, dnoeth.

Regards,
Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi Purushotham,

In general, if your file has duplicate records and you want to load only the distinct records, then you have to create a SET table or a table with a UPI.

That will allow you to load only the unique records into your table, and the duplicate records will go to the UV table.

In your case the table is already populated, so you can't skip duplicate records through MultiLoad.
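For illustration, a staging table of that kind might look like this (a sketch only; the table name is hypothetical, the columns follow the example later in this thread):

-- SET rejects exact duplicate rows; the UPI rejects duplicate ID values
CREATE SET TABLE retail.customers3_stg
     (
      ID INTEGER,
      CUST_ID VARCHAR(6),
      CUST_NAME VARCHAR(30)
     )
UNIQUE PRIMARY INDEX ( ID );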

Thanks,

Abhijeet 

Enthusiast

Re: how to eliminate duplicate records in a file while loading into the target table without using fastload

Hi Purushotham,

To avoid duplicates, you can use an upsert with "DO INSERT FOR MISSING UPDATE ROWS;" in the DML label of your MultiLoad script.

I have tried it and it is working for me. Please refer to the example below.

Table: 

CREATE MULTISET TABLE retail.customers3 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      ID INTEGER,
      CUST_ID VARCHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC,
      CUST_NAME VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
      CUST_GROUP VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
      CUST_SEGMENT VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      CUST_COUNTRY_ID VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( ID );

Below is my data file.

1,R001UK,Adam Smith, Individual,Retail,UK
2,R002UK,Adam Smith, Individual,Retail,UK
3,R003UK,Adam Smith, Individual,Retail,UK
4,R004UK,Adam Smith, Individual,Retail,UK
1,R001UK,Adam Smith, Individual,Retail,UK
3,R003UK,Adam Smith, Individual,Retail,UK
4,R004UK,Adam Smith, Individual,Retail,UK
2,R002UK,Adam Smith, Individual,Retail,UK
2,R002UK,Adam Smith, Individual,Retail,UK
5,R002UK,Adam Smith, Individual,Retail,UK
25,R002UK,Adam Smith, Individual,Retail,UK
5,R002UK,Adam Smith, Individual,Retail,UK
25,R002UK,Adam Smith, Individual,Retail,UK

Mload Script:


.LOGTABLE retail.dwlogtable;
.LOGON 127.0.0.1/tduser,tduser;
.BEGIN IMPORT MLOAD TABLES retail.customers3;
.LAYOUT custlayout;
.FIELD ID 1 VARCHAR(10);
.FIELD CUST_ID * VARCHAR(6);
.FIELD CUST_NAME * VARCHAR(30);
.FIELD CUST_GROUP * VARCHAR(30);
.FIELD CUST_SEGMENT * VARCHAR(10);
.FIELD CUST_COUNTRY_ID * VARCHAR(3);

/* Upsert DML: the UPDATE runs first; when no row with that ID exists,
   the INSERT runs instead. So the first occurrence of an ID is inserted,
   and every later duplicate only updates the same row. */
.DML LABEL custdml
DO INSERT FOR MISSING UPDATE ROWS;
UPDATE retail.customers3
SET CUST_NAME = :CUST_NAME
WHERE ID = :ID;
INSERT INTO retail.customers3.*;

.IMPORT INFILE Mload_data.txt
FORMAT VARTEXT ','
LAYOUT custlayout
APPLY custdml;
.END MLOAD;
.LOGOFF;


This script will load only the distinct records into your table, whether it is empty or already populated: the first occurrence of each ID is inserted, and the later duplicates simply update that same row.

Thanks,

Abhijeet