Selection of Primary index


Selection of Primary index

Hi Experts,

I have situation where i have to select the primary index . I have a huge table almost with 50 columns.

Combination of my integer columns are not so unique but if i add Varchar field it becomes unique. Even only varchar field alone is very unique. But as far as i know , as a good practice we should not select varchar field as PI  but dont know the reason.

Could you please let me know can i select the varchar field alone as PI if not then why? Or does not matter column is varchar or integer all does matter is distribution only.




Re: Selection of Primary index

Guidelines for selecting PI are if you use most frequently in access or equality conditions.

Dstinct values that distribute evenly across all amps to maximize parralelism. It is a column

that is non-volatile. It is good if it(they) has/have  much many more tha the distinct number of amps in the system.There are few more points but these are related to your reqt.

So go for the varchar if it suits the above guidelines. It may require that you can have composite too.Check the reqt.

I suggest you read the hashing algorithm, how the values are passed thru hashing algorithm(value being passed... and what is the output from hashing)...., then hashmap.....



Re: Selection of Primary index

Thanks Raja.

Actually i am planning to use Customer_Name column as PI but this column is not going to use much in the join columns but if i don’t use this column then i am getting skew space error  and its not allowing me to insert the records in the table.

Do you think i should use composite PI or just Name field?



Re: Selection of Primary index

I suggest you check  the data distribution properly. You have 50 columns. Else you may need to change later.

You can go ahead with composite if you are using them too in access conditions.

Teradata Employee

Re: Selection of Primary index

On a very high level, for data loading you need to select column(s) giving you the unique values for most of data .... for accessing the data you need to select column(s) causing minimum re-distribution of data.

Hence, there will never be a case that defines PIs 'ideally' .... and its always how the data is going to be used both ways.

Junior Supporter

Re: Selection of Primary index

Hi Nilesh,

While choosing a PI, you always have to balance between the distribution and access. Some columns will give you optimal distribution while some will provide optimal access performance. You need to choose column with good distribution and better access. Good distribution helps parallel processing, so having optimal distribution a lot of access performance will be gained.