Database

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

05-14-2012
07:31 AM

05-14-2012
07:31 AM

Hi,

I need to generate a unique value based on description of the field without using the identity columns the reason being if the same description comes in the next load we will get a different value. I tried using HASH functions but it is giving same values for multiple descriptions.

My TABLE:

FAC_ID - INTEGER

PROC_DESCR - VARCHAR(100)

I need to generate one more column based on PROC_DESCR that will assign a unique code to it. Here is what I tried:

Select FAC_ID,

CAST(CAST(HASHBUCKET(hashrow(PROC_DESCR) (BYTE(4))) AS BIGINT) AS VARCHAR(15)) as PROC_CODE,

PROC_DESCR

FROM MYTABLE ;

But it gives me same PROC_CODE's for different PROC_DESCR which fails my requirement.

10 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

05-14-2012
10:31 AM

05-14-2012
10:31 AM

HASHBUCKET returns a value between 0 and 2**16 (or 2**20, depending on your system).

Use only HASHROW which results in a BYTE(4) or use the approach i use in my stats query (http://developer.teradata.com/`node/9598`):

BYTE(4) to signed integer (INT)

REPLACE FUNCTION Byte4_to_Int (inBytes VARBYTE(4))

RETURNS INT

LANGUAGE SQL

CONTAINS SQL

DETERMINISTIC

RETURNS NULL ON NULL INPUT

SQL SECURITY DEFINER

COLLATION INVOKER

INLINE TYPE 1

RETURN

(HASHBUCKET(inBytes (BYTE(4))) / ((HASHBUCKET()+1)/65536) * 65536. +

HASHBUCKET(SUBSTR(inBytes,3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536) -

HASHBUCKET(inBytes (BYTE(4))) / ((HASHBUCKET()+1)/65536) / 32768 * 4294967296)

;

BYTE(4) to unsigned integer (BIGINT)

REPLACE FUNCTION Byte4_to_UnsignedInt (inBytes VARBYTE(4))

RETURNS BIGINT

LANGUAGE SQL

CONTAINS SQL

DETERMINISTIC

RETURNS NULL ON NULL INPUT

SQL SECURITY DEFINER

COLLATION INVOKER

INLINE TYPE 1

RETURN

HASHBUCKET(inBytes (BYTE(4))) / ((HASHBUCKET()+1)/65536) * 65536. +

HASHBUCKET(SUBSTR(inBytes,3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536)

;

If you're not on TD13.10 simply cut & paste the source code and replace inBytes with PROC_DESCR.

Dieter

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

05-14-2012
11:30 PM

05-14-2012
11:30 PM

Thanks a lot, Dieter..!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

09-19-2012
06:11 AM

09-19-2012
06:11 AM

Hi Dieter,

Can you please help me for my problem.

We have TAX_ID field in one of our tables and we are trying to mask that value and so mask value should be unique per given TAX_ID. So we are trying to use HASHROW(TAX_ID) and its returning unique values but since its in BYTE datatype, end users are unable to use that value. Is there any UDF function available which converts that to CHAR? Or any other ideas you have?

This is pretty urgent and any help is appreciated.

Regards

Ajay

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

09-19-2012
11:21 PM

09-19-2012
11:21 PM

Hi Ajay,

you could use the calculation i posted and cast it to a char.

But be aware, the HashRow function is not guaranteed to return unique values, you might need to implement your own UDF.

Dieter

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

01-16-2013
09:31 PM

01-16-2013
09:31 PM

Hi Dieter,

You mentioned in your previous post that : "If you're not on TD13.10 simply cut & paste the source code and replace inBytes with PROC_DESCR"

We're currently on 13.10 and we would like to use the function:

BYTE(4) to unsigned integer (BIGINT)

REPLACE FUNCTION Byte4_to_UnsignedInt (inBytes VARBYTE(4))

RETURNS BIGINT

LANGUAGE SQL

CONTAINS SQL

DETERMINISTIC

RETURNS NULL ON NULL INPUT

SQL SECURITY DEFINER

COLLATION INVOKER

INLINE TYPE 1

RETURN

HASHBUCKET(inBytes (BYTE(4))) / ((HASHBUCKET()+1)/65536) * 65536. +

HASHBUCKET(SUBSTR(inBytes,3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536)

;

Can we still use it? If not, can we know what changes we need to make to make it work on 13.10?

Also, I did not entirely understand the logic:

HASHBUCKET(inBytes (BYTE(4))) / ((HASHBUCKET()+1)/65536) * 65536. +

HASHBUCKET(SUBSTR(inBytes,3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536)

It would be great if you can explain me this logic. I could understand why you divided by ((HASHBUCKET()+1)/65536) (to handle hashbucket being 20 bit now), but I didn't understand why multiply by 65536 and no substr in first part(of addition) and apply substr and not multiply in the other part(of the addition)

-Suhail

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

01-16-2013
11:04 PM

01-16-2013
11:04 PM

Hi Suhail,

of course you can use the function as-is, pre-13.10 just doesn't support SQL functions.

The basic calculation is like getting a year-month out of a year and a month:

year * 100 + month = 2013*100+01 -> 201301

For binary data it's:

aabbccdd -> hashbucket(aabb....) * 2**16 + hashbucket(ccdd....)

Dieter

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-11-2014
02:55 PM

03-11-2014
02:55 PM

Hi Dieter,

I tried your solution posted above, and columnA is defined as varchar(50).

The following is the query, but there are few duplicates after converting(below is an example). Can you please take a look to see if any workaround for this?

SELECT DISTINCT columnA as a,

HASHROW( columnA ) as b,

(HASHBUCKET(HASHROW (columnA) (BYTE(4))) / ((HASHBUCKET()+1)/65536) * 65536. +

HASHBUCKET(SUBSTR(HASHROW (columnA),3,2) (BYTE(4))) / ((HASHBUCKET()+1)/65536)) as c

FROM TableA

output c is the same when columnA in ('0034317069', '0063313119882_206148')

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

01-05-2016
02:29 PM

01-05-2016
02:29 PM

Hi Dieter,

Above in this thread you have mentioned that HASHROW function is not guranted to return a unique value. Can you please explain as we want to use HASHROW to generate a hashvalue for a VARCHAR column (ACCOUNT_NUMBER). We anticipate:

1) each time same HAHROW would be the out put for a particular account number i.e. 123 will always be hashed into 27-48-9E-A5.

2) No two account numbers can have the same ROWHASH value.

3) ROWHASH value can not be translated back to input value (ACCOUNT_ NUMBER in this case)

Thanks,

Asif

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

01-05-2016
04:45 PM

01-05-2016
04:45 PM

Assumption #3 is correct - the hash function is not invertible.

Assumption #2 is incorrect. It is possible (though mildly unusual) for two different input values to yield the same hash (a hash collision).

Assumption #1 is only partially correct. On a given system, the same input value should return the same hash. Changing the hash function implementation would require migrating the data to new hardware or doing a full sysinit & restore, so changes would be very infrequent. But there have been several versions of hash function over time.