Developing Database Extensions (UDFs etc)

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Teradata Employee

Developing Database Extensions (UDFs etc)

From time to time I’m asked how one might go about writing an UDF or an INMod or some other procedural extension to the database. The question isn’t a “where do I learn C or C++” question; rather it is a how do I go about debugging and testing my extension without the overheads and constraints of running within the database (or utility).

My response is you can use “my Teradata” for which I have the source code (attached). This will allow you to run, test and debug your function outside of Teradata. In short with “my Teradata” all of the features of your IDE will be available to you, including your debugger.

There are a number of documents that describe the structure and interfaces for UDF’s, INMODs and other forms of database extensions. However, I’ve not seen any documents that describe how to leverage powerful debugging features found in all modern IDE’s. The UDF manual suggests “The best practice is to develop and test the function outside the database before you install it. You can use your own debugging tools to verify functionality.”, but provides no examples as to how to go about doing that. In this article I present “my Teradata” to enable you to do this.

My Teradata.c

Unfortunately “my Teradata” doesn’t include all of the features of real Teradata. In fact, it doesn’t include any of the features of real Teradata except the ability to call a UDF (or INMOD etc). That’s right fellow Teradatians; “my Teradata” is simply a standalone test harness.

A UDF, INMOD or any other C or C++ based database extension is a regular C/C++ function that can be called just like any other C function or C++ method. “My Teradata” leverages this by defining a main method which calls the UDF (or INMOD) under development. From here on I will simply call this “extension” or “database extension” unless referring to a specific type of extension.

By doing this (adding a main method) you end up with a program that can run your extension within your IDE (i.e. outside the database). As such the full debugging capabilities of your IDE become available to you.

Structuring the Extension

Whenever I create a database extension, I break it into two parts. These are:

  • Business functions – the function of the extension
  • Interface function(s) – the entry point(s) that Teradata calls to invoke your extension.

Typically I give the main business function a name that reflects the function. In the case of a UDF, I name the interface function using the same name but with the prefix “udf”. For example if my main business function was called xyz, I would name my entry point udfXyz. By doing this, it makes it a little easier to track the top level functions in my extension, especially if the extension is complex. Of course if you are building an extension to a utility (e.g. an INMOD) you are somewhat more limited in your choice of entry point names (it has to be called _dynamn).

In this article, we will build a mind bogglingly useless scalar UDF that add’s two numbers together (the article is about the process, not the extension). The “business function” will be called add and take two parameters (the two numbers to add). It will return the sum of the two numbers. The interface method will thus be called udfAdd.

There are 2 major reasons why I separate the UDF function into business and interface functions. Basically these are because it allows me to reuse existing code and target test cases.

From the reuse angle the interface logic is largely copy and paste from template code. Each interface function will differ according to the number and types of parameters, but the structure is pretty much the same in each case. Similarly the function I want to implement is likely already available from a previous project. Thus it makes sense to copy and paste the business function as is and simply call it from the interface function. Even if the business logic isn’t available from another project and must be built from scratch, it still makes sense to build it as a separate function just in case I want to use it elsewhere and to support testing.

From the testing angle, I can now establish two types of test cases. These are: 

  • Test cases that explore the range of possible inputs and outputs including null values and error conditions (i.e. test the interface) and
  • Test cases that validate the operation of the function itself (i.e. does my extension correctly add the two numbers).

     

The add UDF

First let’s examine the parts of the UDF. The UDF source can be found in the attachment (add.c). This includes both the “business logic” and the “interface logic”.

The business logic

As you might imagine the “business logic” for our mind bogglingly useless UDF is pretty straight forward. So let’s get it out of the way.

    int add (int x, int y) {
return x + y;
}

As one would expect, the above miracle of technological prowess takes two numbers, adds them together and returns the result.

The interface logic

The “interface logic” is slightly more interesting. For illustration purposes we will support the handling of null values in our UDF. The alternative is to get Teradata to handle nulls using syntax in the create function query. If we did this, Teradata wouldn’t even bother calling our UDF for null values. Thus this UDF is a “PARAMETER STYLE SQL” UDF. Note that the PARAMETER STYLE clause in our create query influences the signature of the UDF entry point. Refer to the UDF manual for more on this.

The entry point for my UDF is defined as follows:

void udfAdd (INTEGER *x, INTEGER * y,
INTEGER *result,
int *inputXNullInd, int * inputYNullInd,
int *resultNullInd,
char sqlstate [6],
SQL_TEXT *function_name,
SQL_TEXT *specific_function_name,
SQL_TEXT *error_message)
{
// Function body goes here.
}

If you are unfamiliar with UDF entry point signatures, refer to the appropriate Teradata manual. But briefly, the parameters are: 

  • The arguments supplied to the function (x & y),
  • A place holder for the result (result),  
  • Indicators as to whether or not the inputs and result is/are null (inputXNullInd, inputYNullInd & resultNullInd)
  • Some additional values to tell you about how the function was called and to allow return of success, warning or error information.

Note that all of the input parameters are pointers of some sort. A common mistake is to treat the input parameters (e.g. x & y) as though they have been passed by value – this is not the case. All parameters to the UDF are passed by reference. Making this mistake will generally not result in a desirable outcome!

Note also the “result” of the UDF is not returned through the function; which is declared as void. The result together with any additional information such as an error code or null indicator is passed back from the UDF via one or more of the entry point’s parameters.

Raising an error

Following is the body of the “business logic” for out add UDF. The first part checks the input parameters for null and returns an error if either of both of them is/are null:

if (*inputXNullInd == TD_NULL ||
*inputYNullInd == TD_NULL) { // Null input?
strcpy ((char *) sqlstate, "12345");
strcpy ((char *) error_message, "Null is not allowed");
*resultNullInd = TD_NULL; // gets Null output
return;
}

If either of the input parameters is null, then an error is generated. In this case, the error code is “12345” is returned. There are rules about the structure of the sqlstate (i.e. the error code) which you should lookup in the manual (my sqlstate code does not conform to the rules as I simply made up the 12345 value). An error message is also returned from the UDF. The sqlstate and error message will be returned to the end user query tool (e.g. SQL Assistant, bteq etc) and hopefully handled meaningfully (SQL Assistant will display it in the status bar and record it in the history).

Returning a null result

Some people may argue that generating an error on null input may be a bit drastic (especially for this example). If an error condition is generated, it is treated like any other SQL Error condition (e.g. “duplicate row error”, “insufficient perm space in database X” etc). Specifically when an error is returned the entire transaction is rolled back.

One alternative to generating an error is to return a null value as the result. The following block performs this function:

if (*inputXNullInd == TD_NULL ||
*inputYNullInd == TD_NULL) { // Null input?
*resultNullInd = TD_NULL; // gets Null output
return;
}

In the above logic, if either of the input parameters is null, the result Null indicator (resultNullInd) is set and the function returns. In this case the output of the UDF is a NULL value which is returned to the query that invoked the UDF. If the result null indicator is set, any value in the result parameter will be ignored. With this model, the query completes normally (notwithstanding any other problems that may arise). Any NULL inputs would result in the result set generated by your query containing NULL values.

Treatment of null inputs

Obviously the two treatments of null shown above are mutually exclusive. When you build your database extension, you must define how you will treat null input values and build this into the “interface logic”.

I’ve included both treatments to illustrate how you might use both. You may elect to implement the add function in such a way that the first parameter (x) may not be null and generates an error if it is null, but the second (y) may be null in which case the result will be null. This may sound like a silly suggestion; probably because it is. However, remember, the focus of this article is how to go about building and debugging a UDF, it is not about building a sensible UDF!

Invoking the business logic

The final part of the interface function is to invoke the “business function”. This is achieved with the following:

*result = add (*x, *y);

In the above, the “business function” is called and passed the two input values. The value returned from the add function is placed into the de-referenced result pointer.

Remember that the input parameters are pointers to the user values (passed by reference) not the actual user values (passed by value). Put simply you must dereference the input parameters to get the input values.

The test harness

So far we have built the UDF. The final piece to the puzzle is the test harness “my Teradata.c”. The test harness is included in the attachment (addTest.c).

The main function

Following is the main function. Note that there are three different types of test calls. The first two groups test the “business function” the final group tests the “interface function” which also indirectly tests the “business function”.

main () {
test (1, 2); // Test 1 + 2
test (-1, -2); // Test -1 + -2

// Assert that 2 + 1 = 3
testAssert (3, 2, 1);
// Assert that 2 + 1 = 2 (produces an error)
testAssert (2, 2, 1);

// Test 4 + 5
tdTest (4, TD_NOT_NULL, 5, TD_NOT_NULL);
// Test NULL + 5
tdTest (4, TD_NULL, 5, TD_NOT_NULL);
exit (0);
}

For the first two groups of test cases, we are interested in testing the business logic. In this case the business logic is trivial, so separating the “business tests” and the “interface tests” may seem pointless. However, for larger more complex UDF’s there will likely be many more “business logic” test cases than “interface logic” test cases. Separating the two allows you to concentrate on just the “business logic” without having to wade through the handling of null input values. Hopefully this statement will become clearer once we view the two test functions. Of course this will become your test harness, and you are free to test your extension anyway you choose.

Business logic test cases

The following function simply invokes the business function (add) and prints the result.

void test (int x, int y) {
int result;
result = add(x, y);
printf ("add(%d, %d)=%d\n", x, y, add (x, y));
}

It will (hopefully) produce output similar to the following:

add (1, 2)=3

How you code this function is entirely up to you. If I have a lot of test cases, I will also code it to accept an extra parameter being the expected result. The idea here is for the test function to assert the expected result against the actual result and if there is a difference highlight this with an appropriate message. In this case the test function looks like this:

void testAssert (int expected, int x, int y) {
int result;
result = add(x, y);
printf ("add(%d, %d)=%d (expect %d) %s\n", x, y, result, expected,
result == expected ? "" :
"******* Error, result not expected"
);
}

The enhanced test method shown above would be invoked using something like the following:

    testAssert (3, 1, 2);
testAssert (3, 1, 1);

The second invocation would trigger the message “******* Error, result not expected” because the expected value (3) does not equal the actual result (1 + 1 = 2).

It probably doesn’t make much sense to use both types of test functions; your choice will depend upon the type of extension you are building. For example testing a table function or an INMOD, the first type (test without the assertion) will likely be easier and more efficient. For scalar and aggregate UDF’s, the second type (testAssert) would be more efficient.

Interface logic test cases

The interface logic test cases are oriented to test the handling of nulls, error conditions are raised correctly and a few other things relating to the Teradata to UDF interface.

This function is much larger than the others so I’ve omitted it for brevity. You can find the interface logic test function in the attachment (addTest.c).

Since this function involves testing null inputs, it requires extra parameters when called. An example of invoking the interface logic follows.

    tdTest (4, TD_NOT_NULL, 5, TD_NOT_NULL);
tdTest (4, TD_NULL, 5, TD_NOT_NULL);

In the first of the above cases, I’m testing 4 + 5 and neither parameter is null. In the second case, I’m testing NULL + 5. For this second test case, the first parameter (4) is irrelevant, but the ‘C’ language rules require that it be present.

Basically the interface function test cases will output the parameters and results with null indicators. In addition to this, it will detect if the UDF has attempted to alter any of the input parameters. If it has, then it will output a warning message.

Conclusions

Anyone who has attempted to track down a dangling pointer, out of bounds index or any other subtle bug will appreciate the ability to use a debugger over embedded print statements. Within the database debugging is essentially limited to embedded print statements that wind up in a trace table (refer to the UDF manual for details).

Use of the simple test harness presented here easily enables the use of a fully fledged debugger and should help you track down problems more quickly. Even if I am not tracking down a particular problem, but merely wanting to test that my function works, I’ve found clicking “run” in my IDE and viewing the results is vastly more productive compared to the alternative. The alternative is of course a matter of creating the UDF in the database, running some queries to exercise it then querying the log table to see what happened. The exercise is repeated until testing is complete, while all the time hoping that there isn’t some sort of exception encoded in the UDF which will bring the system to its knees (you do test your UDF’s in protected mode don’t you?!).

Obviously the standard disclaimer applies; specifically you shouldn’t just use this test harness and decide that your function works because it passes all your tests. You must test your extension in the database or database utility, ideally with the same test cases used in the test harness. This test harness merely enables you to simplify testing and leverage the powerful debugging features in your IDE.

Tags (2)
20 REPLIES
FAQ
N/A

Re: Developing Database Extensions (UDFs etc)

For scalar UDFs one simple way could be to just write a complete C program with the UDF defined as a function; and call the UDF from the main function, this way you can even write a loop around the UDF to stress test the code for different possible values.
FAQ
N/A

Re: Developing Database Extensions (UDFs etc)

What can be a simple way to test an aggregate function?

Re: Developing Database Extensions (UDFs etc)

I have a UDF TO_TIMESTAMP (basicly the TO_DATE UDF that is available on ths site in the download package "Teradata UDFs for popular Oracle functions" ).
It returns a TIMESTAMP datatype, and it works very well in queries, thank you.
But when I want to insert the result into a timestamp(0) field, I run into problems:
I get a "5404: Datetime field overflow" error.

OK, I've seen this before: a cast to timestamp(0) seems in order.
But then I get error "7454: DateTime field overflow" ...

I tried the ANSI cast and even the Teradata cast syntax - same result..

I even modified my UDF creation statement, using "RETURNS TIMESTAMP(0) CAST FROM TIMESTAMP".
Now a every call of my UDF returns error "7454: DateTime field overflow".

I'm out of inspiration now.
The only thing that seems to work is keeping the max precision of timestam (6) everywere : in the UDF's return datatype and the target database column ...

Is there a solution for working with TIMESTAMP(0) ?

N/A

Re: Developing Database Extensions (UDFs etc)

Hello Everyone,

I am using the demo version and encoutering the following error:

*** Failure 5600 Error creating UDF/XSP/UDM/UDT: no compiler available.
Statement# 1, Info =0

I have installed the VC++ 2008 version but still facing the same error. Please help me in fixing the issue.

Thanks-
Teradata Employee

Re: Developing Database Extensions (UDFs etc)

Very shortly (coming weeks), the Developer Exchange discussion forums will be coming online. Until then, these questions are best asked at the existing Teradata "Drivers and UDFs" discussion forum (url below). Bear in mind, these will be migrated to DevX shortly.

http://www.teradata.com/teradataforum/Forum11-1.aspx
Teradata Employee

Re: Developing Database Extensions (UDFs etc)

You should double check that your version of Teradata recognises your compiler.
THere is a registry update you could try.

However, I CAUTION YOU WITH MAXIMUM CAUTIONING, not to update your registry without first:
A) Taking a backup
ii) Understanding what this change will do
3) writing down what the original settings in the registry were (so you can just restore just this part later) and
$) Reverting immediately if this doesn't work.
Remember if you update your registry and mess it up, you at risk of breaking your computer.
Also, I have no idea if this will work, as my compiler is obviously working correctly. So there are absolutely no guarantees in this.

You will also need to understand what this update is doing and modify it to match your compiler paths.
Another approach is to try to download the Microsoft 2003 redistributable compiler (I believe you can get this from Microsoft, but I'm not entirely sure about that).
You could also post a message to the forum and/or if you have a support arrangement log a service request.

To try the registry update, save the following to a .reg file (e.g. tdcompiler.reg)
Modify the paths (InstallDir= andProductDir=) so that they point to your compiler directories. Be carefull not to break the quotation marks and not the double \ for the path seperator.
ProductDir contains the compiler (e.g. cl.exe). InstallDir contains miscelaneous files (e.g. msenv.dll, vssln.dll and other dlls).
Remember there are no guarantees and be very very careful. The first line of the file is "Windows Registry Editer Version 5.00". Here is the registry file:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\7.1]
"InstallDir"="C:\\Program Files\\Microsoft Visual Studio .NET 2003\\Common7\\IDE\\"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\VisualStudio\7.1\Setup\VC]
"ProductDir"="C:\\Program Files\\Microsoft Visual Studio .NET 2003\\Vc7\\"
Teradata Employee

Re: Developing Database Extensions (UDFs etc)

In relation to the aggregate function question, the principle is the same, although you make several calls, one for AGR_INIT, multiple AGR_DETAIL calls etc.

I'm currently working on Part 3 of working with Large Objects. Once I'm done with that and assuming my day job doesn't get in the way I will post another article on Aggregate functions and Table Functions (if there is interest).
N/A

Re: Developing Database Extensions (UDFs etc)

Ok, first off let me start by saying that I haven't touched C code in probably 10 years (and even then I wasn't what I would call a stellar coder). I am taking the addtest.c code and trying to compile it, but it is complaining that the reference to add is undefined. I am assuming that in the addtest.c that I need to put in the actual logic and function for add, correct? Or am I missing something (which is probably the case). I figure if I can get the harness working then I can at least start playing around with the C code to refresh my memories (probably of how much I hate coding) and right some rudamentry udf's to learn by.
Teradata Employee

Re: Developing Database Extensions (UDFs etc)

One of the beauties (and some might argue the negatives) of C is that there are a million and one ways to do even the simplest thing.

Since you didn't supply the command that you used, I'm going to guess that you entered a command like this:
cc [various options] addTest.c

When I do that I get two unresolved references (_add and _udfAdd).

As you summise this doesn't work because addTest.c doesn't contain the add function (nor udfAdd). Both of these functions are in add.c
So what you need to do is (only) compile each file (not compile and link which the above command does), then link the intermediate files (add.o and addTest.o) into addTest[.exe]
Usually the easiest way to do this is to specify both source files on the command line as in:
cc addTest.c add.c

Different platforms and different compilers take different options, so I can't give you the exact commands.
If you are using visual studio (or some other IDE - I use NetBeans for Java, C and other stuff) it might be just as simple as adding both of the .c files to your project (certainly this is true for NetBeans).

I hope the above helps. If not, post another note including what you have tried and we will see where we can go from there.