Wide character functions (UNICODE) in UDF

Does anyone know which wide character functions are available within UDFs written in C?

Also, does anyone know whether any variables are passed that indicate the character set of the VARCHARs passed to a function? Since the character set does not trigger function overloading, it would be useful to have this information without forcing the user to pass it as a parameter.
rgs

Re: Wide character functions (UNICODE) in UDF

I assume you mean Unicode, since that is the only wide character set. In Teradata, a Unicode character is 2 bytes. On MPRAS, the native Unicode character (wchar_t) is 4 bytes, so the C library wide character functions there work only on 4-byte Unicode strings, and you can't apply them directly to Teradata's 2-byte strings. You either have to roll your own routines, or convert the 2-byte input string to a 4-byte character string, manipulate it with the C library wide character functions, and then convert the result back to 2 bytes for a return argument. On Windows, the native Unicode character is 2 bytes, so you can use the C library wide character routines directly. All other character sets are either single-byte or use shift-out/shift-in sequences to extend the character set.
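The convert, process, convert-back approach might look like this in C. This is a minimal sketch, assuming a platform where `wchar_t` is 4 bytes (as described for MPRAS) and that the Teradata string arrives as an array of 2-byte code units held in `uint16_t`. The helper name `unicode_upper` is hypothetical, and the real Teradata UDF parameter types and entry-point signature are deliberately left out. It widens each code unit, applies the C library's `towupper`, and narrows the result back.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Assumption: this sketch targets platforms with a 4-byte wchar_t (e.g. MPRAS). */
_Static_assert(sizeof(wchar_t) == 4, "sketch assumes a 4-byte wchar_t");

/* Hypothetical helper: uppercase `len` 2-byte code units from `in` into `out`
   (which must hold `len` units). Returns 0 on success, -1 if the temporary
   4-byte buffer cannot be allocated. */
int unicode_upper(const uint16_t *in, size_t len, uint16_t *out)
{
    wchar_t *wide = malloc((len ? len : 1) * sizeof *wide);
    if (wide == NULL)
        return -1;

    /* Step 1: widen each 2-byte code unit to a 4-byte wchar_t. */
    for (size_t i = 0; i < len; i++)
        wide[i] = (wchar_t)in[i];

    /* Step 2: apply a C library wide character routine to the 4-byte form. */
    for (size_t i = 0; i < len; i++) {
        wint_t up = towupper((wint_t)wide[i]);
        /* Keep the original value if the mapping would not fit in 2 bytes. */
        if (up <= 0xFFFF)
            wide[i] = (wchar_t)up;
    }

    /* Step 3: narrow back to 2-byte code units for the return argument. */
    for (size_t i = 0; i < len; i++)
        out[i] = (uint16_t)wide[i];

    free(wide);
    return 0;
}
```

Note that `towupper` follows the current `LC_CTYPE` locale, so which non-ASCII letters are mapped depends on the platform's locale support. Surrogate code units, if any, pass through unchanged. On Windows, where `wchar_t` is already 2 bytes, the widen and narrow steps would be unnecessary.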

When you create a UDF, you specify for each string parameter the character set in which the argument is to be passed, just as you would for a column in a table. When the function is called, a string argument in a different character set is converted to the character set declared for that parameter. The UDF code therefore never has to deal with a character set other than the one it was designed for. Of course, if your UDF declares the parameter as LATIN and you pass it a Unicode column string containing characters outside the Latin range, you will get a translation error and the request fails.
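Because the declared character set fixes the representation the C code receives, one way to support both LATIN and UNICODE without duplicating logic is a single width-agnostic core plus one thin entry point per declared character set. This is a sketch under stated assumptions: the names `reverse_units`, `reverse_latin` and `reverse_unicode` are hypothetical, plain C types stand in for the Teradata UDF parameter types, LATIN data is taken as 1 byte per character, and UNICODE data as 2-byte code units without surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Width-agnostic core: reverse `count` elements of `unit_size` bytes each,
   in place. Works for 1-byte LATIN and 2-byte UNICODE data alike. */
static void reverse_units(void *buf, size_t count, size_t unit_size)
{
    unsigned char *p = buf;
    unsigned char tmp[8];              /* large enough for the unit sizes used here */

    if (count < 2 || unit_size > sizeof tmp)
        return;
    for (size_t i = 0, j = count - 1; i < j; i++, j--) {
        memcpy(tmp, p + i * unit_size, unit_size);
        memcpy(p + i * unit_size, p + j * unit_size, unit_size);
        memcpy(p + j * unit_size, tmp, unit_size);
    }
}

/* Hypothetical entry point for a function whose parameter is declared LATIN. */
void reverse_latin(char *s, size_t len)
{
    reverse_units(s, len, sizeof *s);
}

/* Hypothetical entry point for a function whose parameter is declared UNICODE.
   A surrogate pair would be split by this reversal; BMP-only text is assumed. */
void reverse_unicode(uint16_t *s, size_t len)
{
    reverse_units(s, len, sizeof *s);
}
```

Since, as this thread notes, overloading does not distinguish character sets, each wrapper would presumably be registered under its own SQL name by a separate CREATE FUNCTION statement declaring its parameter as CHARACTER SET LATIN or CHARACTER SET UNICODE.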

Re: Wide character functions (UNICODE) in UDF

Ah yes, quite so. When asking this I forgot that the SQL that creates the UDF declares the character set as LATIN or UNICODE. I was hoping to write a single UDF that could tell internally which character set was being passed to it and process it correctly, since overloading of the UDF does not take the character set into account. But the initial definition locks it down, so I guess it's a moot point.