why should we use SET table ?

UDA
Enthusiast

Re: why should we use SET table ?

Hi Fred,

Thank you very much for your response. Does FastLoad also perform the duplicate row check for SET tables? And are there any limitations when loading a SET table?

Please advise

Enthusiast

Re: why should we use SET table ?

quick question:

Will this cause a NUPI duplicate row check operation too?

INSERT INTO <a 4-column SET table with only a NUPI on it>
SELECT COL1, COL2, COL3, COL4
FROM <MULTISET stage table>
GROUP BY 1,2,3,4;

Regards,

Suhail

Enthusiast

Re: why should we use SET table ?

Hi Suhail,

Your question is a little bit ambiguous, but if you have a NUPI defined on a SET table, the duplicate row check will be performed.

You can avoid the duplicate row check either by defining the table as MULTISET or by making the primary index a UPI. Another way to implement uniqueness is to define a USI.
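
For illustration, a rough sketch of the three alternatives (the table and column names here are invented):

/* 1. MULTISET table: no duplicate row check is done at all */
CREATE MULTISET TABLE tgt_ms (c1 INTEGER, c2 INTEGER) PRIMARY INDEX (c1);

/* 2. SET table with a UPI: uniqueness is enforced on the index value,
      so the byte-by-byte duplicate row check is not needed */
CREATE SET TABLE tgt_upi (c1 INTEGER NOT NULL, c2 INTEGER) UNIQUE PRIMARY INDEX (c1);

/* 3. Another way to implement uniqueness: a USI on a NUPI table */
CREATE SET TABLE tgt_usi (c1 INTEGER NOT NULL, c2 INTEGER NOT NULL) PRIMARY INDEX (c1);
CREATE UNIQUE INDEX (c2) ON tgt_usi;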

Khurram
Enthusiast

Re: why should we use SET table ?

I guess the ambiguity was that I failed to mention that the target table is empty (I apologize).

I wanted to know whether the NUPI duplicate row check will still be done if we're inserting data into an empty SET table from a MULTISET stage table, in spite of the GROUP BY on all columns of the stage table (which already removes duplicates).

-Suhail

Enthusiast

Re: why should we use SET table ?

Yes, as long as the table is SET and the index is non-unique, the duplicate row check will be performed. The table is empty initially, but as rows are inserted it is no longer empty, and the SET table will compare rows byte by byte to avoid duplicates.
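
As a rough sketch of the scenario above (invented names), the duplicate row check on the target still applies even though the GROUP BY already de-duplicates the source:

/* MULTISET stage table and SET target with a NUPI */
CREATE MULTISET TABLE stage (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER) PRIMARY INDEX (col1);
CREATE SET TABLE target (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER) PRIMARY INDEX (col1);

/* Even into an initially empty SET table, each incoming row is compared
   against the rows already landed on the same row hash during the insert */
INSERT INTO target
SELECT col1, col2, col3, col4
FROM stage
GROUP BY 1,2,3,4;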

Khurram
Enthusiast

Re: why should we use SET table ?

I apologize for being persistent on this.

So you do confirm that the very first INSERT...SELECT (where the insert happens into an empty table with 0 records) will also be subject to the NUPI duplicate row check?

I can understand why subsequent inserts into the same table are subject to the NUPI duplicate row check, but why the first one?

-Suhail

Enthusiast

Re: why should we use SET table ?

Dieter:

I've always been under the impression that a good way to improve the insert efficiency (via SQL) of a SET table is to use a PK/USI. Prior to the insert, the USI is used to check for duplicates through the hash values of the index columns, which saves the SET table's duplicate row check. I was recently told that the row is inserted first and then the index row is inserted; the index insert may or may not pass the duplicate test and will fail accordingly, so a USI won't save time. This does not sound reasonable to me. Wouldn't it make more sense to check the hash value of the USI first for duplicates and save all that I/O?
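
For reference, the kind of design in question might look roughly like this (a sketch only; names are invented):

CREATE SET TABLE orders
 ( order_id INTEGER NOT NULL,
   cust_id  INTEGER NOT NULL,
   order_dt DATE )
PRIMARY INDEX (cust_id);   /* NUPI chosen for distribution/access */

/* PK implemented as a USI, with the idea that the unique index
   catches duplicates by hash value before the SET table's
   row-by-row comparison has to run */
CREATE UNIQUE INDEX (order_id) ON orders;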

Can you clear this up for me?

Thank you in advance

Re: why should we use SET table ?

@Fred

I used your 2/10/2009 example SQL exactly in a TD 14 environment, and both INSERT statements failed with the same 2801 error.

Might TD have changed the behavior such that SET and MULTISET now function the same in your scenario?

Teradata Employee

Re: why should we use SET table ?

No, the behavior of SET / MULTISET is unchanged. The INSERT/SELECT into the SET table succeeds and inserts two rows.

CREATE TABLE T1 (c1 INTEGER NOT NULL, c2 INTEGER) PRIMARY INDEX (c1);
INSERT T1 VALUES (1,1);
INSERT T1 VALUES (1,2);
INSERT T1 VALUES (2,1);
CREATE SET TABLE T2_S (c1 INTEGER NOT NULL) UNIQUE PRIMARY INDEX (c1);
CREATE MULTISET TABLE T2_MS (c1 INTEGER NOT NULL) UNIQUE PRIMARY INDEX (c1);
INSERT T2_S SELECT c1 FROM T1;  /* succeeds - quietly eliminates the duplicate and inserts 2 rows */
INSERT T2_MS SELECT c1 FROM T1; /* fails with duplicate key error */
Enthusiast

Re: why should we use SET table ?

Hi!

 

I have a similar issue:

 

A table has columns of the form: timestamp | attribute1 | attribute2 | ...

I would like a daily batch load to add new rows to this table, but ONLY if one or more of the attributes changes (the timestamp always changes).

So I do this:

1. create a SET table with Unique Primary Key = all columns except the timestamp

2. use INSERT INTO from the source table with the same columns.
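
Roughly, the setup looks like this (table and column names here are only illustrative):

CREATE SET TABLE history
 ( load_ts TIMESTAMP(0) NOT NULL,
   attr1   INTEGER,
   attr2   INTEGER )
UNIQUE PRIMARY INDEX (attr1, attr2);   /* all columns except the timestamp */

/* daily batch load */
INSERT INTO history (load_ts, attr1, attr2)
SELECT load_ts, attr1, attr2
FROM src_daily;   /* errors out when only the timestamp differs */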

 

However, when the batch load hits a row that is the same as one already in the table (only the timestamp has changed), it throws an error instead of simply omitting the duplicate and continuing.

 

How can I get it to just add changed records every day and not throw errors? (These errors mess up the batch job process.)

 

Thanks