C/C++ Coded User-Defined Function Scenario

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Teradata Employee

C/C++ Coded User-Defined Function Scenario


User-Defined Functions

This feature was introduced in Teradata Database V2R5.1.



Description

Teradata Database includes a wide variety of standard SQL functions that can be used in SQL queries. User-defined functions (UDFs) allow you to create your own functions and add them to Teradata Database to extend its built-in capabilities. You can create new functions in C/C++, Java, or in SQL itself and use them in SQL queries just like Teradata Database built-in functions.

There are three broad categories of UDFs that you can create:

  • Scalar functions accept zero or more input arguments and return a single value result. CURRENT_DATE and TRIM are examples of built-in scalar SQL functions.
  • Aggregate functions accept sets of input argument values (such as all or some of the values in a particular table column) and return a single value result of processing the input values. AVG, MIN, and MAX are examples of built-in aggregate functions.
  • Table functions are invoked in the FROM clause of a SELECT statement, and return a derived table to the calling statement, one row at a time. They can be used for special-purpose processing, such as shredding an XML document into SQL rows in a relational database table.

You can use UDFs for external I/O, complex business calculations, statistical calculations that use aggregate functions, and almost any type of data conversion. UDFs can help you enforce business rules and aid in data transformations.


More...


Benefits

UDFs:

  • Can be written in popular programming languages C/C++, Java, or in SQL itself.
  • Take full advantage of the Teradata Database parallel environment. When the UDF is referenced in a SQL request, a copy of the UDF executes on every AMP vproc.
  • Become part of the Teradata SQL language and are accessible to anyone who has the appropriate user access rights.
  • Can be used nearly any place within SQL expressions that Teradata built-in functions can be used.
  • Can be obtained from third-party vendors as packaged collections for special purposes.


Considerations

  • UDFs should be thoroughly tested and debugged prior to adding them to Teradata Database.
  • If the function requires access to specific operating system resources that ordinary OS users would not be able to access, you must use CREATE/REPLACE AUTHORIZATION to create a context that allows the UDF to perform I/O by running as a separate process under the authorization of that OS user.
  • If the UDF implements data transformations, ordering, or cast functionality for a user-defined data type (UDT), you must use appropriate DDL to register the UDF in TD (CREATE or REPLACE statements for TRANSFORM, ORDERING, or CAST).


Performance Implications of Protected and Unprotected Mode

By default, all UDFs are created to run in protected mode. In protected mode, each instance of the UDF runs in a separate process. This protects Teradata Database from programming errors that could cause the database to crash and restart, however, protected mode places additional processing burdens on the system that often result in performance degradation. Protected mode should be used only for UDF development and testing. The best practice is to develop and test your UDFs on a non-production test system.

In unprotected mode, a UDF is called directly by Teradata Database rather than running as a separate process. UDFs run much faster and place less of a performance burden on Teradata Database, but this mode should be used only for UDFs that have undergone rigorous testing. You must explicitly use ALTER FUNCTION to change a UDF from protected to unprotected mode.

Note: Java UDFs always run in protected mode. Therefore, they are slower than equivalent C or C++ UDFs.






Benefits and Considerations for C/C++ User-Defined Functions



Benefits of C/C++ UDFs

  • You can write complex functions using all the richness of the C/C++ programming language, and have those functions treated as if they were native Teradata Database functions.
  • You can build and distribute C/C++ UDFs without revealing the source code to users or purchasers.


Considerations for C/C++ UDFs

  • If you are providing C/C++ source code for your UDF, there must be a C/C++ compiler on all TD system nodes that have PEs. If your Teradata Database system runs on Linux nodes, the gcc compiler is already included on every node.
  • You should write, fully test, and debug your C/C++ code in an appropriate C/C++ code development environment.
  • The file sqltypes_td.h needs to be included in your C/C++ development environment for definition of the SQL data types and UDF interface APIs. For example, when using an IDE such as Microsoft Visual Studio, you must specify the path to sqltypes_td.h as an additional include directory.
  • The CREATE FUNCTION statement that creates and registers the function in Teradata Database must specify where the source or object code for the function exists. This is accomplished by means of the EXTERNAL clause of the CREATE FUNCTION statement.




About the EXTERNAL Clause of CREATE FUNCTION


The EXTERNAL clause of CREATE FUNCTION declares that the function originates external to Teradata Database, and optionally identifies the location of components required to compile and run the function. In most cases, you will be creating the source code for a single function on a client system and allowing Teradata Database to copy the source code to a Teradata system and compile the source there at the time the UDF is created.

Used without options, EXTERNAL causes Teradata Database to assume the C source code is in the current working directory of your client, and that the name of the .c source code file matches that of the function in the C source file, and matches the name of the UDF defined by the CREATE FUNCTION function_name statement. In this case, Teradata Database also assumes that no special header or include files are required to compile the C source.

You can use the EXTERNAL NAME clause to specify more details, such as the name and location of C source or object file that comprises the function, and other files required to compile or create the UDF, such as C header files and libraries. The EXTERNAL NAME clause takes a string literal value that uses letter codes to identify specific UDF components. The general format of a parameter entry in the string is one or two letters followed immediately by a delimiter character of your choosing, followed by a file name or location. Several parameters can be specified in the string.

The EXTERNAL NAME string parameters that can be used for C and C++ UDFs are described below. The exclamation mark (!) represents the delimiter character used to separate entries within the string.



CS!name_on_server!source_file_name


SS!name_on_server!source_file_name

Identifies the name of the C source code file. Use either the CS or SS prefix. CS indicates the file is located on the client machine. SS indicates the file is located on the Teradata server.

name_on_server specifies the name given to the source code file when it is copied from its original location to the Teradata server UDF compile directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the source file itself. name_on_server should not include the .c file name extension.

source_file_name specifies the name of the original C source code file, including a file path if the file is not located in the current working directory. source_file_name must include the .c file extension.



Example: 'CS!myudfsource!C:\mycode\myudfsource.c'


CI!name_on_server!include_file_name


SI!name_on_server!include_file_name

Identifies the name of a C header file that is included with an <#include> statement in C source file. CI indicates the file is located on the client machine. SI indicates the file is located on the Teradata server.

name_on_server specifies the name given to the header file when it is copied from its original location to the Teradata server UDF compile directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the header file itself. name_on_server should not include the .h file name extension.

include_file_name specifies the name of the header file, including a file path if the file is not located in the current working directory. include_file_name must include the .h file extension.



Example: 'CS!myudfinclude!C:\mycode\headers\myinclude.h'


CO!name_on_server!object_file_name


SO!name_on_server!object_file_name

Identifies the name of a compiled C object file that implements the function. Use either the CO or SO prefix. CO indicates the file is located on the client machine. SO indicates the file is located on the Teradata server.

name_on_server specifies the name given to the object file when it is copied from its original location to the Teradata server UDF directory. This can be any name that is unique in the Teradata Database system, and need not match the name of the object file itself.

include_file_name specifies the name of the object file, including a file path if the file is not located in the current working directory. Although the object module can be located on either the client or server, the object itself must have been either compiled on the Teradata Database server or on a system that generates an object that is compatible with the server.



Example: 'SO!myudfobject!/usr/jsmith/mycode/bin/myfunction'


SL!library_file_name

Identifies the name of a nonstandard C library that is required to compile the function. Standard libraries do not need to be specified.

library_file_name specifies the name of the nonstandard library file, including a file path if the file is not located in the current working directory.



Example: 'SL!/usr/jsmith/mycode/libs/mylibrary'


SP!package_file_name

Identifies the name of a package that contains the function to be implemented as a UDF.

package_file_name specifies the name of the package file, including a file path if the file is not located in the current working directory.



Example: 'SP!/opt/TDMiner/libTeraMiner.so '


F!function_source_name

Identifies the name of the function to be used for the UDF as that function is defined in the C source code. This parameter is required only if the name of the function defined in the C source file differs from the function_name declared by the CREATE FUNCTION function_name statement.

function_source_name specifies the name of the function defined in the C source file.



Example: 'F!plusudf'



Example

In the following EXTERNAL NAME clause string, the CS specification would indicate that the C source file for the function my_fun is located in the tests directory on the client system. The CI specification indicates that the function requires the inclusion of the header file my_header.h, also located on the client machine in the tests directory. The delimiter character used in this example is the exclamation mark (!).

EXTERNAL NAME 'CS!my_fun!c:\tests\my_fun.c!CI!my_header!c:\tests\my_header.h'






Scenario: Creating a C/C++ User-Defined Function


This scenario creates one simple scalar and one more complex aggregate C coded UDF.

Creating and using C and C++ UDFs involves the following tasks:

The scenario demonstrates the following tasks that are involved in creating and using C and C++ UDFs:

  • Write and debug the C code for your UDF using the C developement environment of your choice.
  • Use the CREATE FUNCTION statement to install and register a new UDF on Teradata Database. This statement identifies the function name, declares the function parameters and return value type, specifies the programming language used to create the function, and identifies the location of the UDF source or object code and other ancillary files required to create the UDF.
    Note: To use the CREATE FUNCTION statement, you must have been granted the CREATE FUNCTION privilege (or the FUNCTION privilege, which grants both CREATE and DROP FUNCTION privileges.

  • Use the UDF in queries just as you would use any built-in TD SQL function.


For More Information About C/C++ UDFs

For more information about C and C++ UDFs, and about the SQL used in these examples see:

Document Description
SQL Data Definition Language - Detailed Topics, B035-1184 Provides detailed information on the CREATE FUNCTION statement.
SQL Data Definition Language - Syntax and Examples, B035-1144 Describes SQL syntax for CREATE FUNCTION.
SQL External Routine Programming, B035-1147 Describes creating C and C++ UDFs.
Orange Book: Teradata Database User Defined Function User's Guide, Document #: 541-0004361 Detailed discussion and application examples of using UDTs.






Example: Create a Simple Scalar C UDF


The following C code creates a simple scalar function to add two integers passed to the function and return the sum.

void plusudf(int *a, int *b, int *result, char sqlstate[5])
{ *result = *a + *b;}

The final parameter, char ssqlstate[5], allows error messages to be returned by Teradata for the UDF.

Assume the C source file is saved on the client machine as C:\myC\UDF\plusudf.c.

The following SQL creates a UDF named plusudf based on the simple C code above.

CREATE FUNCTION plusudf(
a INTEGER,
b INTEGER
)RETURNS INTEGER
LANGUAGE C
NO SQL
EXTERNAL NAME 'CS!plusudf!C:\myC\UDF\plusudf.c'
PARAMETER STYLE TD_GENERAL;

After the UDF has been created, it can be used like any Teradata Database function. In this case, it can even be run by itself from a tool like BTEQ:

SELECT plusudf(5,10);

plusudf(5,10)
-------------
15





Example: Create an Aggregate C UDF


The following C code creates an aggregate function to calculate the standard deviation value from a column of values.

#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include <string.h>
#include <math.h>
typedef struct agr_storage {
FLOAT count;
FLOAT x_sq;
FLOAT x_sum;
} AGR_Storage;

void STD_DEV( FNC_Phase phase,
FNC_Context_t *fctx,
FLOAT *x,
FLOAT *result,
int *x_i,
int *result_i,
char sqlstate[6],
SQL_TEXT fncname[129],
SQL_TEXT sfncname[129],
SQL_TEXT error_message[257] )
{
/* pointers to intermediate storage areas */
AGR_Storage *s1 = fctx->interim1;
AGR_Storage *s2 = fctx->interim2;

/* The standard deviation function described here is: */
/* */
/* s = sqrt(sum(x^2)/N - (sum(x)/N)^2) */
/* */
/* sum(x^2) :> x_sq, N :> count, sum(x) :> x_sum */

/* switch to determine the aggregation phase */
switch (phase)
{

/* This case selection is called once per group and */
/* allocates and initializes all intermediate */
/* aggregate storage */

case AGR_INIT:
/* Get some storage for intermediate aggregate values. */
/* FNC_DefMem returns zero if it cannot get the requested */
/* memory. The amount of storage required is the size of */
/* the structure used for intermediate aggregate values */

s1 = FNC_DefMem(sizeof(AGR_Storage));
if (s1 == NULL)
{
/* could not get storage */
strcpy(sqlstate, "U0001"); /* see SQLSTATE table */
return;
}

/* Initialize the intermediate aggregate values */
s1->count = 0;
s1->x_sq = 0;
s1->x_sum = 0;

/************************************************** */
/* Fall through to detail phase, because the */
/* AGR_INIT call passes in the first set of */
/* values for the group */
/************************************************** */
/* This case selection is called once for each */
/* selected row to aggregate. x is the column the */
/* std_dev is being calculated for. */
/* One copy will be running on each AMP */

case AGR_DETAIL:
if (*x_i != -1)
{
s1->count++;
s1->x_sq += *x * *x;
s1->x_sum += *x;
}
break;

/* This case selection combines the results of ALL */
/* individual AMPs for each group */

case AGR_COMBINE:
s1->count += s2->count;
s1->x_sq += s2->x_sq;
s1->x_sum += s2->x_sum;
break;

/* This case selection returns the final standard */
/* deviation. It is called once for each group. */

case AGR_FINAL:
{
FLOAT term2 = s1->x_sum/s1->count;
FLOAT variance = s1->x_sq/s1->count - term2*term2;
/* Adjust for deviations close to zero */
if (fabs(variance) < 1.0e-14)
variance = 0.0;
*result = sqrt(variance);
break;
}

case AGR_NODATA:
/* return null if no data */
*result_i = -1;
break;

default:
/* If it gets here there must be an error because this */
/* function does not accept any other phase options */
strcpy(sqlstate, "U0005");
}
return;
}

Assume the C source file is saved on the client machine as C:\myC\UDF\std_dev.c.

The following SQL creates a UDF named STD_DEV based on the C code above.

REPLACE FUNCTION SYSLIB.STD_DEV(x FLOAT)
RETURNS FLOAT
CLASS AGGREGATE
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'CS!STD_DEV!C:\myC\UDF\std_dev.c';

After the UDF has been created, it can be used like any Teradata Database function. The Teradata Database BTEQ session shown below creates a table, and loads it with some data. It is followed by another session where the user runs the STD_DEV UDF to calculate the standard deviation of the data in one of the table columns.

.logon testsys/sysdba,sysdba

DATABASE activity_db;

CREATE TABLE PRODUCT_LIFE
(Product_Id SMALLINT,
Product_Class VARCHAR (25),
Hours INTEGER)
;

INSERT INTO PRODUCT_LIFE VALUES (1,'Bulb',2)
;INSERT INTO PRODUCT_LIFE VALUES (2,'Bulb',4)
;INSERT INTO PRODUCT_LIFE VALUES (3,'Bulb',4)
;INSERT INTO PRODUCT_LIFE VALUES (4,'Bulb',4)
;INSERT INTO PRODUCT_LIFE VALUES (5,'Bulb',5)
;INSERT INTO PRODUCT_LIFE VALUES (6,'Bulb',5)
;INSERT INTO PRODUCT_LIFE VALUES (7,'Bulb',7)
;INSERT INTO PRODUCT_LIFE VALUES (8,'Bulb',9);

SELECT AVG(Hours),
STD_DEV(Hours)
FROM PRODUCT_LIFE
WHERE PRODUCT_CLASS = 'Bulb'
;

*** Total elapsed time was 1 second.

Average(Hours) STD_DEV(Hours)
-------------- ----------------------
5 2.00000000000000E 000

.quit



<SCRIPT type="text/javascript"><BR /><BR /> jQuery(function() {<BR /> jQuery("span.notetitle").css({"font-weight":"bold"});<BR /> jQuery("div.note").css({"border":"1px solid grey"});<BR /> jQuery("pre").css({"background-color":"#EEEEEE", "border":"1px solid black"});<BR /> jQuery(".toggle, .toggle-02, .toggle-03, .toggle-04, .toggle-05, .toggle-06").css({"cursor":"pointer"});<BR /> jQuery(".expandme").css({"color":"#0E45FF", <BR /> "text-decoration":"underline", <BR /> "margin-left":"2em"});<BR /> jQuery(".toggle-02, .toggle-03, .toggle-04, .toggle-05, .toggle-06").css( <BR /> {"background-image":"url('https://developer.teradata.com/sites/all/files/plus_arrow.gif')", <BR /> "background-repeat":"no-repeat", <BR /> "background-position":"5px 7px",<BR /> "text-indent":"2em",<BR /> "display":"block"});<BR /> jQuery(".panel, .panel-02, .panel-03, .panel-04, .panel-05, .panel-06").css({"display":"none"});<BR /> jQuery("[class^='toggle-']").append("<div style='color:#0E45FF; text-decoration:underline;'>More...</div>");<BR /> });<BR /> jQuery(function() {<BR /> jQuery(".toggle").click(function(){<BR /> jQuery(".panel").slideToggle("fast"); <BR /> });<BR /> jQuery(".toggle-02").click(function(){<BR /> jQuery(".panel-02").slideToggle("fast", function() {<BR /> jQuery(".toggle-02").css({"background-image":"url('https://developer.teradata.com/sites/all/files/minus_arrow.gif')"});<BR /> }); <BR /> });<BR /> jQuery(".toggle-03").click(function(){<BR /> jQuery(".panel-03").slideToggle("fast", function() {<BR /> jQuery(".toggle-03").css({"background-image":"url('https://developer.teradata.com/sites/all/files/minus_arrow.gif')"});<BR /> }); <BR /> });<BR /> jQuery(".toggle-04").click(function(){<BR /> jQuery(".panel-04").slideToggle("fast", function() {<BR /> jQuery(".toggle-04").css({"background-image":"url('https://developer.teradata.com/sites/all/files/minus_arrow.gif')"});<BR /> }); <BR /> });<BR /> jQuery(".toggle-05").click(function(){<BR /> jQuery(".panel-05").slideToggle("fast", function() {<BR /> jQuery(".toggle-04").css({"background-image":"url('https://developer.teradata.com/sites/all/files/minus_arrow.gif')"});<BR /> }); <BR /> });<BR /> jQuery(".toggle-06").click(function(){<BR /> jQuery(".panel-06").slideToggle("fast", function() {<BR /> jQuery(".toggle-06").css({"background-image":"url('https://developer.teradata.com/sites/all/files/minus_arrow.gif')"});<BR /> }); <BR /> });<BR /> });<BR /></SCRIPT>