Best criteria for selecting NUPI

Database
Enthusiast

Best criteria for selecting NUPI

Hi,

We have a SET table with NUPI defined on it.I understand this is bad, as it needs to check for the duplicates by scannning the entire row set while data is loaded into the table.

The column which is defined as an NUPI is an id column which is sequencly generated, so no way this is going to be duplicated.

We are going to make this as UPI, so that it is righlty distributed across all AMPS.

Apart from the id column we do not have any other column which is to be unique...

Weare performing some performance tweaks, Can we define multi column NUPI on the table? Is this advisable? If yes how to decide which column would be apt to be selected for NUPI?

Also is it possible to define both UPI and NUPI for the table??

Appreciate your comments.

Regards

John

6 REPLIES
Junior Contributor

Re: Best criteria for selecting NUPI

Hi John,

this is not bad unless there are too many rows per PI value. Duplicate row checks will be done only for roews with the same PI (in fact, the same Row Hash).

If this sequence column is not used for joins/where it's totally useless, you'll get a perfectly distributed table with a horrible performance.

Enthusiast

Re: Best criteria for selecting NUPI

Hi Dieter,

Thanks a lot for your response. So you say, the NUPI defined on the id column is ok, as long as there are no too many rows for that NUPI defined correct?

Example below is ok right though its a SET table duplicates values wont be checked unles the ID column was same?

id     product_id   product_name    brand_id     brand_name     Date

001     p123           Tavera         b123         Cip         2015-03-02

002     p123           Tavera         b123          Cip         2015-03-03 

In case as below, then duplicate checks would have been done correct?

id      product_id   product_name    brand_id     brand_name     Date

001     p123           Tavera               b123           Cip             2015-03-02

001     p123           Tavera               b123          Cip              2015-03-03

Dieter- We have a table below in which the only field that unique is the SEQID column which is the surrogate key. Can you advice from the table below what index i need to be defining?

CREATE TABLE CONSUMER_PRODUCT_BKUP

(

SEQID DECIMAL(27,0),

PRODUCT_ID VARCHAR(20),

PRODUCT_NAME VARCHAR(30),

DATA_SOURCE_ID  VARCHAR(20),

BRAND_NAME VARCHAR(30),

TIME_PERIOD DATE,

PROCESS_DATA_MONTH DATE,

ACTIVE_FLAG CHAR(1),

END_DATE DATE

)




DATA_SOURCE_ID, TIME_PERIOD and PROCESS_DATA_MONTH are the columns used in join conditions.

SEQID column is not used in any join/where conditions.

Can we define a Multicolumn UPI as

UPI(SEQID,DATA_SOURCE_ID,TIME_PERIOD,PROCESS_DATA_MONTH )?

OR

Can you suggest what would be ideal??

Many thanks for all your comments, time and simple solutions...

Regards

John

Junior Contributor

Re: Best criteria for selecting NUPI

Hi John,

if SEQID is not used it's useless as PI column, in fact you don't need it at all.

What's the actual Primary Key of this table?

Enthusiast

Re: Best criteria for selecting NUPI

Hi Dieter,

 I see except the SEQID field there are none other fields which have unique values and hence no primary key.

Except this SEQID being defined as Primary Index, there is no other Index defined on this table.

Regards

John

Enthusiast

Re: Best criteria for selecting NUPI

Hi Diether,

Is it not suggested to create a PI on the sequence id and a NUSI on - 

SEQID,DATA_SOURCE_ID,TIME_PERIOD,PROCESS_DATA_MONTH

.This will have a proper disstibution as well as Sec index for access. Of course the queries need to be tested if they use the SI or not and then decide ? Generally what is recomended practice in such situations.

Thanks !

Samir

Junior Contributor

Re: Best criteria for selecting NUPI

Hi Samir,

first check the FK columns (used for joins), then the columns used in Where, but defining a useless PI is the last resort. 

Every relational table should have a logical Primary Key constraint, simply adding a stupid sequence still doesn't mean there's a real PK and you'll encounter duplicate rows.

And based on John's other thread Non duplicate row selection query he tries to get rid of them :-)