Which column should i select for Index creation

Database
Enthusiast

Which column should i select for Index creation

Hi All,

I have data in an csv file and I am planning to create an table in Teradata and insert these records.

I then need to create an PI after the analyzing the data properties.

I somehow need to know the queries for the following.

1) Get the data distribution among all the AMP's

2) How many AMP's are there in the current installation?

3) Other queries which will help me identifying the right Primary Index

Thanks for your response in advance.

5 REPLIES
Enthusiast

Re: Which column should i select for Index creation

Hi,

The PI of the column is choosen in Teradata for the Data Distribution.

We can get the No of AMP's using this hashamp function.

SEL HASHAMP()+1 -- This will give the No of Amps present in the TD Server.

Thanks & Regards,

Adharssh.

Enthusiast

Re: Which column should i select for Index creation

The below query will give you some stats about the data distribution as how many rows will be landing on different amps... Execute the query with different column list, you surely will be able to choose the ones that are more appropriate...

SELECT HASHAMP(HASHBUCKET(HASHROW(<PI_COLUMN_LIST>))) AS "AMP#",COUNT(*)
FROM <TABLE_NAME>
GROUP BY 1
ORDER BY 2 DESC;
Enthusiast

Re: Which column should i select for Index creation

I have 2 amps and have a table of 4 columns col1,col2,col3 and col4.

Following is the ouput of hashamp query. As can  be seen except col1, other columns distribute row across amps appropriately. But what should be derived for col1?






Column Name Amp # Row count
Col1 0 58
Col1 1 43
Col2 0 54
Col2 1 47
Col3 0 47
Col3 1 54
Col4 0 50
Col4 1 51
Enthusiast

Re: Which column should i select for Index creation

Seems the counts are not readable. Following should be helpful.

Amp # 0

col1: 58

col2: 54

col3: 47

col4: 50

Amp # 1

col1: 43

col2: 47

col3: 54

col4: 51

Enthusiast

Re: Which column should i select for Index creation

I would go with Col4, as the data on both the AMPs is distributed equally, 50 & 51.

In your case you can choose either one as the data count is small thus the PI selection won't make too much difference. The PI selection becomes more critical when the data count is huge lets say hundreds of million rows landing on the AMPs.