Same hash for different values

Enthusiast

Hi All,

I compared the HASHROW output for two sets of data that are almost the same except for a few values. I expected different row hashes because not all the values match, but I got the same row hash from both queries.

Both queries are below. It would be helpful to get an explanation of why it behaves this way, and whether my approach is valid.

SELECT HASHROW(
 '2013-04-16'
,'9999-12-31'
,0.000
,0.000
,0.000
,0.000
,' '
,'Y'
,1465131
,'Y'
,0.00
,'Y'
,'N'
);

***************

Result:

91-2A-42-24

**************

SELECT HASHROW(
 '2013-04-16'
,'9999-12-31'
,0.000
,0.000
,0.000
,0.000
,' '
,'N'  -- different value
,1465131
,'Y'
,0.00
,'Y'
,'Y'  -- different value
);

***********

Result:

91-2A-42-24

Thanks in advance..

3 REPLIES
Senior Supporter

Re: Same hash for different values

HASHROW returns too few bits (the result is only 4 bytes), so hash collisions are common. It doesn't have the same properties as a SHA-1 hash.
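
To put a rough number on "collisions are common", here is a small Python sketch of the standard birthday approximation. It assumes a 32-bit hash (the 4-byte result shown in the question) and a perfectly uniform hash function, which a real row hash will only approximate. The function name is made up for illustration.

```python
import math

HASH_BITS = 32  # assumption: 4-byte row hash, as in the 91-2A-42-24 result


def collision_probability(n_rows: int, bits: int = HASH_BITS) -> float:
    """Approximate chance that at least two of n_rows distinct inputs
    end up with the same hash, assuming a uniform hash function."""
    buckets = 2 ** bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2 * buckets))
    return 1.0 - math.exp(-n_rows * (n_rows - 1) / (2.0 * buckets))


for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} rows: {collision_probability(n):.4f}")
```

Under these assumptions, a collision somewhere in the data becomes more likely than not well before a million distinct inputs.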

Teradata Employee

Re: Same hash for different values

This actually demonstrates an interesting property of the Teradata hash function. Providing the same values in a different order will result in the same hash, on purpose. Your two queries contain exactly the same set of values; only the 'N' and one of the 'Y' values have swapped positions. This design lets us ensure that when multiple columns are hashed together, they hash the same regardless of the column order, and thus hash to the same AMP and join appropriately.

There is no expectation or intention that the Teradata hash function will be uniqueness preserving (or order preserving). That is not an intent of the design.
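
As an illustration of how a hash can ignore argument order, here is a toy Python sketch. It is not Teradata's algorithm: it simply hashes each value on its own with CRC-32 and combines the results with addition, which is commutative. The name `toy_hashrow` and the choice of CRC-32 are assumptions made for the example, and the decimal constants from the original queries are left out for brevity.

```python
import zlib

MASK_32 = 0xFFFFFFFF  # keep the result in 32 bits, like a 4-byte hash


def toy_hashrow(*values) -> int:
    """Toy order-independent hash. NOT Teradata's real algorithm."""
    total = 0
    for value in values:
        # Hash each value separately; the type name is included so that
        # the integer 1 and the string '1' produce different inputs.
        data = f"{type(value).__name__}:{value!r}".encode("utf-8")
        # Addition is commutative, so the argument order cannot matter.
        total = (total + zlib.crc32(data)) & MASK_32
    return total


first = toy_hashrow("2013-04-16", "9999-12-31", " ", "Y", 1465131, "Y", "Y", "N")
second = toy_hashrow("2013-04-16", "9999-12-31", " ", "N", 1465131, "Y", "Y", "Y")
print(f"{first:08X} {second:08X}")  # same values, different order: identical
```

Any commutative combining step (addition, XOR, and so on) has this effect. To get an order-sensitive result, the position would have to be mixed into each per-value hash.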

Enthusiast

Re: Same hash for different values

Also, what you are trying to do is not going to provide any real value.

You are using constants instead of true column values (which typically have different data types and will therefore be hashed differently).

I.e., the character string '2013-04-16' will hash differently than the date '2013-04-16'.

--Shelley
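
To make the data type point concrete, here is a short Python sketch comparing the raw input a hash function would see for the two forms. It relies on Teradata storing a DATE internally as the integer (year - 1900) * 10000 + month * 100 + day. The 4-byte little-endian packing and the idea that the hash works directly on these bytes are assumptions for illustration only, and the helper name is made up.

```python
from datetime import date


def teradata_date_int(d: date) -> int:
    """Integer form Teradata uses to store a DATE internally."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


# Character literal: ten bytes of text.
as_text = "2013-04-16".encode("ascii")

# DATE value: one integer; byte width and byte order are assumptions here.
date_int = teradata_date_int(date(2013, 4, 16))
as_date = date_int.to_bytes(4, "little")

print(as_text, len(as_text))
print(date_int, as_date.hex(), len(as_date))
```

The two inputs differ in both length and content, so there is no reason to expect them to produce the same hash.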