Efficiency UDF scalar function vs.conventional SQL

Connectivity

Efficiency UDF scalar function vs.conventional SQL

To test an attribute in numeric I can use the function
is_integer from Mr. Dennis Calkins (Thanks here for Mr. Calkins)
at
http://www.teradataforum.com/is_integer.htm

That is a very smart and concise mode.

for instance:

SELECT 123495
.....
, (
(CASE WHEN myDB.is_integer(column1)=1
THEN 'do this '
ELSE 'do that'
END)

Another way is with the SUBSTRING statement(i.e. for one 18-digit Attribute)

example:

SELECT
123495
.......
, ((CASE WHEN SUBSTRING(column1 FROM 1 FOR 1) IN ('0','1','2','3','4','5','6','7','8','9','+','-')
AND SUBSTRING(column1 FROM 2 FOR 1) IN ('0','1','2','3','4','5','6','7','8','9','+','-')
................
AND SUBSTRING(column1 FROM 18 FOR 1) IN ('0','1','2','3','4','5','6','7','8','9','+','-')
THEN 'do this '
ELSE 'do that'
END)

That makes the script large and hard to read.

The problem is that the SUBSTRING statement is over 10 times
faster as the is_numeric function.

For the appropriate attributes I made COLLECT STATISTICS and
I altered the function in ALTER FUNCTION is_integer EXECUTE NOT PROTECTED;

Config. of the System is:
Teradata RELEASE V2R.05.01
MP-RAS, Unix 3.03, Typ 5350, 3 Nodes, 30AMPs, cpu Pentium IV, 2800 MHz

The Table has many millions records

is there a better/faster way of doing this with the UDF-function?
Does any one have expirience how to make UDF scalar functions efficiently ?

Any help and assistance is greatly appreciated.
6 REPLIES
N/A

Re: Efficiency UDF scalar function vs.conventional SQL

I had a similar problem in testing for a numeric value, and
couldn't get any performance out of any procedural udf style function.

This was most efficient/clearest thing i could come up with:

Ok, so given a varchar(10) field called "number_in_a_char_field":

(case when number_in_a_char_field between '0000000000' and '9999999999' then
'do the number thing'
else
'not a number'
end) as an_example

...food for thought, any suggestions welcome.

Re: Efficiency UDF scalar function vs.conventional SQL

When you executed the UDF, were you in protected or unprotected mode? Protected mode is safer when you're first testing the UDF, but it is much slower. You have to ALTER the function to put it in unprotected mode. If you were not executing in unprotected mode, I would try that and see how the performance compares.

Also, the suggestion from Womper will not work if the non-numeric character is anywhere past the first character. For instance, '1abcdefghi' is between '0000000000' and '9999999999', but it is not a number.

Good luck!

Re: Efficiency UDF scalar function vs.conventional SQL

Thanks A lot for the response.
I had already altered the UDF in unprotected mode
befor calling and benchmark it.
But it is running under Teradata V2R5.1.
I guess that the Teradata V2R6.1 come out
withe a better performance
concerning UDFs.

Any experience with Teradata V2R6.1 and UDF?

Thanks
stami

Re: Efficiency UDF scalar function vs.conventional SQL

I don't use the is_number function, but I use a variation of the to_number function on v2r6.0. I just ran a benchmark against 8 million rows and am seeing that the UDF in unprotected mode runs faster and uses about 1/4 of the CPU of the SUBSTRING code.
N/A

Re: Efficiency UDF scalar function vs.conventional SQL

How bout this one...

select *
from
(
select '1abcdefgha' as test_value
from dbc.tables where tablename = 'tables' and databasename = 'dbc'

union all
select '0000000000' as test_value
from dbc.tables where tablename = 'tables' and databasename = 'dbc'
union all
select '9999999999' as test_value
from dbc.tables where tablename = 'tables' and databasename = 'dbc'
) t1
where
test_value between '0000000000' and '9999999999'
and substr(test_value,9,1) between '0' and '9'

Any exceptions to that one? Maybe we have to check on length() also...
N/A

Re: Efficiency UDF scalar function vs.conventional SQL

nope... found my own issue.

1abcdefg1a

any ideas on sql solutions?