I have data in an csv file and I am planning to create an table in Teradata and insert these records.
I then need to create an PI after the analyzing the data properties.
I somehow need to know the queries for the following.
1) Get the data distribution among all the AMP's
2) How many AMP's are there in the current installation?
3) Other queries which will help me identifying the right Primary Index
Thanks for your response in advance.
The PI of the column is choosen in Teradata for the Data Distribution.
We can get the No of AMP's using this hashamp function.
SEL HASHAMP()+1 -- This will give the No of Amps present in the TD Server.
Thanks & Regards,
SELECT HASHAMP(HASHBUCKET(HASHROW(<PI_COLUMN_LIST>))) AS "AMP#",COUNT(*)
GROUP BY 1
ORDER BY 2 DESC;
I have 2 amps and have a table of 4 columns col1,col2,col3 and col4.
Following is the ouput of hashamp query. As can be seen except col1, other columns distribute row across amps appropriately. But what should be derived for col1?
|Column Name||Amp #||Row count|
Seems the counts are not readable. Following should be helpful.
I would go with Col4, as the data on both the AMPs is distributed equally, 50 & 51.
In your case you can choose either one as the data count is small thus the PI selection won't make too much difference. The PI selection becomes more critical when the data count is huge lets say hundreds of million rows landing on the AMPs.