Data Distribution Differs with Primary Index Columns and Index Name?

Database

Data Distribution Differs with Primary Index Columns and Index Name?

Hi All

 

I was checking the Data Distribution with the columns included in Primary Index and Primary Index Name given.

Included the queries and Output for better understanding.  Please help me in understanding, why we see a difference between the AMP Count when the Row count is Same.

 

SELECT HASHAMP(HASHBUCKET(HASHROW('Primary_index_name'))) AS
"AMP#",COUNT(*)
FROM Database_name.table_name
GROUP BY 1
ORDER BY 2 DESC;
Output: AMP Count - 173, Row count - 25 million

 

SELECT HASHAMP(HASHBUCKET(HASHROW('column1,column2,column3,column4,column5,column6,column7,column8,column9,column10,column11'))) AS
"AMP#",COUNT(*)
FROM Database_name.table_name
GROUP BY 1
ORDER BY 2 DESC;
Output : AMP Count - 57, Row Count - 25 million

 

 

Thanks

Rajasekkhar

2 REPLIES
Senior Apprentice

Re: Data Distribution Differs with Primary Index Columns and Index Name?

#1 will not work as expected because the optimizer doesn't recognize ('Primary_index_name') as the name of an index name, it's just a string.

 

#2 will also not work as-is, because you don't supply a list of columns, but a string again. Remove the quotes:

(column1,column2,column3,column4,column5,column6,column7,column8,column9,column10,column11)

Re: Data Distribution Differs with Primary Index Columns and Index Name?

Hi Dieter

 

Thanks for the clarification, Yes, now I can see the Data Distribution for the #2, and for #1 realized ('Primary_index_name') doesn't work to check the Distribution.

 

Thank you!!