How to Create and Debug a Table CUDF using Eclipse

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Teradata Employee

How to Create and Debug a Table CUDF using Eclipse

SQL provides a set of useful functions, but they might not satisfy all of the particular requirements you have to process your data.

User-defined functions (UDFs) allow you to extend SQL by writing your own functions in the C/C++ programming languages, installing them on the database, and then using them like standard SQL functions.

Teradata Database supports three types of CUDFs (C/C++ User-Defined Functions). They are scalar UDF, aggregate UDF and window aggregate UDF, and table UDF. 

This article will show how to create a simple table CUDF, execute it in protected mode, and debug it locally or remotely using Eclipse CDT. The UDF runs under the default operating system user "tdatuser", and can access the system resources for which tdatuser has privileges. Local Debugging means Eclipse and DBS are running on the same machine. Remote Debugging means Eclipse and DBS are running on separate machines.

In this example, the table CUDF takes in sales data for all stores in raw format, and returns a table of customers and sale items for a given store id.

Prerequisite for this Article

If you have not work through the guide How to Set up a UDF / XSP Debugging Environment, please do it now before you continue.

Article Outline

In this article, we will do the following things to show how to create and debug a table CUDF using Eclipse CDT.

  1. Create C Project in Eclipse
  2. Create CUDF Source File in C Project
  3. Install DDL in Database
  4. Run CUDF in Database
  5. Debug CUDF in Eclipse

1. Create C Project in Eclipse

1.1. C/C++ Perspective

Select the pull-down menu Window -> Open Perspective -> Other ...

An Open Perspective window is displayed. Select C/C++, and click the OK button.

1.2. Create C Project

Go to the Project Explorer in Eclipse and right click New -> C Project.

C Project window comes up.

Input the Project name cudf_table, select the Project type Shared Library, and select the Toolchains Linux GCC. Then click the Finish button.

A new C project cudf_table will be created.

1.3. Change Build Settings

1.3.1. Show Build Settings

Right click the project, and select Properties.

The next window shows the default Properties for cudf_table.

Click C/C++ Build -> Settings. The right of window shows default C/C++ build settings.

You can move the scrollbars in the right or bottom to see the part that is out of view.

1.3.2. Change Compiler Settings

Select GCC C Compiler -> Includes. Click the + button. An Add directory path window shows up. Enter Directory /usr/tdbms/etc.

The /usr/tdbms/etc path will be added into the right list.

Select Miscellaneous in GCC C Compiler. Check the flag Position Independent Code (-fPIC).

1.3.3. Change Linker Settings

Select GCC C Linker -> Libraries. Click the + button as below to add Library search path.

An Add directory path window shows up. Enter /usr/tdbms/lib in the text box and click OK.

The /usr/tdbms/lib path will be shown in the lower list.

Click the OK button to save the settings.

2. Create CUDF Source File in C Project

2.1. Create Source Folder

Right click the project, and select New -> Source Folder.

New Source Folder window comes up. Please enter the source folder name src and click the Finish button.

The src source folder will be added to the project.

2.2. Create New Source File

Right click the src folder, select New -> Source File.

New Source File window is displayed. Enter Source file extract_field.c. Click the Finish button.

A new source file extract_field.c will be created. 

2.3. Edit The Source File

Edit the source file as shown below.

/*
* extract_field.c
*
* This example shows the C code for a table function that supports both TBL_MODE_CONST
* and the TBL_MODE_VARY modes. The TBL_MODE_CONST is very simple in that any one
* AMP will extract the data out of the text string that is passed in.
* This example extracts some data out of a 'raw' text field to generate a bunch of rows. The raw
* data is supposed to be of the form of numbers as follows:
* <store number>,<entries>:<customer ID>,<item ID>, ...; repeat;
* where:
* <store number> is the store that sold these items to customers
* <entries> is the number of items that were sold
* <customer ID>,<item ID> are the tuples repeated <entries> times ending with a ';' where the whole pattern can repeat.

* Created on: Oct 10, 2012
* Author: root
*/

#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include <string.h>
/*************************************/
/* The definition of the scratch pad */
/*************************************/
typedef struct {
int custid;
int itemid;
} item_t;
typedef struct {
int Num_Items;
int Cur_Item;
INTEGER store_num;
item_t *Item_List;
} local_ctx;
/********************************************************************/
/* This copy to SQL_TEXT fields will work to copy ASCII strings to */
/* SQL_TEXT strings (in this case error_message) for any character */
/* set mode. In other words if the SQL_TEXT is defined as */
/* Unicode_Text it will still work to give the proper error message */
/********************************************************************/
static void unicpy(SQL_TEXT *dest, char *src) {
while (*src)
*dest++ = *src++;
}
/********************************************************************/
/* A simple function to scan to the next break in the text based on */
/* the delimiter passed in */
/********************************************************************/
static VARCHAR_LATIN *next(VARCHAR_LATIN find, VARCHAR_LATIN *data) {
while (*data != '\0') {

if (*data == find)
break;
data++;
}
return data;
}
/*********************************************************************/
/* The text data that this function processes is in a very simple */
/* format: */
/* */
/* <storenum>,<num items>:<customer id>,<item number>, ... ; */
/* <storenum>,<num items>: ... */
/*********************************************************************/
/*********************************************************************/
/* Do a pre-scan of the text and save the data. Note: This pre-scan */
/* routine actually extracts all needed data out of the text field */
/* and saves it in allocated memory via FNC_malloc. With this logic */
/* when it gets to the TBL_BUILD phase the data will simply be taken */
/* from the saved area. There is no need to look at the original */
/* string again during the TBL_BUILD phase. This is just one way to */
/* design it. The alternative is to just do the scanning each time */
/* TBL_BUILD is called from the text data field that is passed in */
/* to the table function at all times. It is a choice that the */
/* designer must make when developing the application. */
/*********************************************************************/
static int Prescan(local_ctx *info, VARCHAR_LATIN *Text, INTEGER *frmstore) {
INTEGER storenum;
int i;
int num_items = 0;
VARCHAR_LATIN *Tscan = Text;
/* find the data for the store we are interested in */
while (*Tscan) {
sscanf((char *) Tscan, "%d", &storenum);
if (*frmstore == storenum) {
/* found the entry of interest - get the information */
/* on how many items there are */
Tscan = next(',', Tscan) + 1;
sscanf((char *) Tscan, "%d", &num_items);
break;
}
/* find next store */
Tscan = next(';', Tscan);
if (*Tscan == '\0')
break;
Tscan++;
}

/* let's malloc some worst case memory to keep track of the items */
/* we collect */
if (num_items) {
info->Item_List = FNC_malloc(sizeof(item_t) * num_items);
if (info->Item_List == NULL)
/* not good - should have been able to get the memory */
return -1;
} else {
info->Num_Items = 0;
return 0;
}
/* now let's find all the entries for the store that we are */
/* interested in */
/* skip to first item */
Tscan = next(':', Tscan) + 1;
for (i = 0; i < num_items; i++) {
sscanf((char *) Tscan, "%d,%d", &info->Item_List[i].custid,
&info->Item_List[i].itemid);
Tscan = next(',', Tscan) + 1;
Tscan = next(',', Tscan) + 1;
}
info->Num_Items = num_items;
info->store_num = *frmstore;
return num_items;
}
/********************************************************************/
/* Extract all of the data now. Actually this routine just takes */
/* the items that Prescan built and transfers the data out one item */
/* at a time. Notice that it does not build the output column if it */
/* is not being asked for. For this simple example it probably */
/* makes no difference, but if there is a lot of complexity in the */
/* application to build some columns then it could when noticing */
/* that a field is null not go through the computation to build it */
/* at all. In fact the Prescan function could have been smarter and */
/* not built the list of fields that are not being asked for */
/********************************************************************/
static int Extract(local_ctx *info, INTEGER *custid, INTEGER *store,
INTEGER *itemid, int custid_i, int store_i, int item_i) {
/* check to see if there is something left to extract */
if (info->Cur_Item == info->Num_Items)
return 0;
/* okay let's set the output data only if they want it */
if (custid_i == 0)
*custid = info->Item_List[info->Cur_Item].custid;
if (store_i == 0)
*store = info->store_num;
if (item_i == 0)
*itemid = info->Item_List[info->Cur_Item].itemid;
/* set up for next item the next time */
info->Cur_Item++;
return 1;
}
/***********************************/
/* Do a reset of the context block */
/***********************************/
static void Reset(local_ctx *info) {
info->Num_Items = 0;
info->Cur_Item = 0;
info->Item_List = NULL;
}
/*********************************************************************/
/* Clean up upon error or when done. Needs to free up any memory */
/* that was allocated or it will return an error message. Note that */
/* the memory was allocated outside of the general scratch pad. But */
/* the address must be retained in the scratch pad or you have no */
/* means of referencing the data or freeing it for subsequent calls. */
/*********************************************************************/
static void Clean_Up(local_ctx *info) {
if (info->Item_List)
FNC_free(info->Item_List);
}
void extract_field(VARCHAR_LATIN *Text, /* field decode */
INTEGER *frmStore, /* data to extract */
INTEGER *custid, /* 1st output column for row */
INTEGER *store, /* 2nd output column */
INTEGER *item, int *Text_i, /* in parameter indicator */
int *frmstore_i, /* if no store, return no row */
int *custid_i, /* 1st output indicator for */
int *store_i, /* row, and so on */
int *item_i, char sqlstate[6], SQL_TEXT fncname[129], SQL_TEXT sfncname[129],
SQL_TEXT error_message[257]) {
local_ctx *state_info;
FNC_Phase Phase;
int status;
/* make sure the function is called in the supported context */
switch (FNC_GetPhase(&Phase)) {
/***********************************************************/
/* Process the constant expression case. Only one AMP will */
/* participate for this example */
/***********************************************************/
case TBL_MODE_CONST:
/* depending on the phase decide what to do */
switch (Phase) {
case TBL_PRE_INIT:
/* let the system know that I want to be the participant */
switch (FNC_TblFirstParticipant()) {
case 1: /* participant */
return;
case 0: /* not participant */
/* don't participate */
if (FNC_TblOptOut()) {
strcpy(sqlstate, "U0006"); /* an error return */
unicpy(error_message, "Opt-out failed.");
return;
}
break;
default: /* -1 or other error */
strcpy(sqlstate, "U0007");
unicpy(error_message, "First Participant logic did not work");
return;
}
case TBL_INIT:
/* get scratch memory to keep track of things */
state_info = FNC_TblAllocCtx(sizeof(local_ctx));
Reset(state_info);
/* Preprocess the Text */
status = Prescan(state_info, Text, frmStore);
if (status == -1) {
Clean_Up(state_info);
strcpy(sqlstate, "U0008");
unicpy(error_message, "Text had pre-scan errors.");
return;
}
break;
case TBL_BUILD:
state_info = FNC_TblGetCtx();
status = Extract(state_info, custid, store, item, *custid_i,
*store_i, *item_i);
if (status == 0)
/* Have no more data, return no data sqlstate. */

strcpy(sqlstate, "02000");
else if (status == -1) {
Clean_Up(state_info);
strcpy(sqlstate, "U0009");
unicpy(error_message, "Text had extract error.");
return;
}
break;
case TBL_END:
/* everyone done */
state_info = FNC_TblGetCtx();
Clean_Up(state_info);
break;
}
break;
/****************************************/
/* Process the varying expression case. */
/****************************************/
case TBL_MODE_VARY:
switch (Phase) {
case TBL_PRE_INIT:
/* get scratch memory to use from now on */
state_info = FNC_TblAllocCtx(sizeof(local_ctx));
Reset(state_info);
break;
case TBL_INIT:
/* Preprocess the Text */
state_info = FNC_TblGetCtx();
status = Prescan(state_info, Text, frmStore);
if (status == -1) {
status = FNC_TblAbort();
if (status == 1) {
Clean_Up(state_info);
strcpy(sqlstate, "U0008");
unicpy(error_message, "Text had pre-scan errors");
return;
}
}
break;
case TBL_BUILD:
state_info = FNC_TblGetCtx();
status = Extract(state_info, custid, store, item, *custid_i,
*store_i, *item_i);
if (status == 0)
/* Have no more data; return no data sqlstate. */

strcpy(sqlstate, "02000");
else if (status == -1) {
status = FNC_TblAbort();
/* If I was the first then report the error */
if (status = 1) {
Clean_Up(state_info);
strcpy(sqlstate, "U0009");
unicpy(error_message, "Text had extract error");
}
return;
}
break;
case TBL_FINI:
/* Initialize for the next set of data. */
state_info = FNC_TblGetCtx();
Clean_Up(state_info);
Reset(state_info);
break;
case TBL_END:
/* Everyone done. */
state_info = FNC_TblGetCtx();
Clean_Up(state_info);
break;
case TBL_ABORT:
state_info = FNC_TblGetCtx();
Clean_Up(state_info);
break;
}
}
}

3. Install DDL in Database

3.1. Logon Bteq

Open a console window, and execute the bteq logon command.

There are 2 cases, depending on whether the DBS is running on the local machine or not.

A. Local DBS

Logon to the DBS using the IP address 127.0.0.1 for the local host.

TDExpress14.0_Sles10:~ # bteq .logon 127.0.0.1/debugger,debugger

B. Remote DBS

Logon to the DBS  running in a remote machine. Its IP address is 192.168.85.128 in this example.

TDExpress14.0_Sles10:~ # bteq .logon 192.168.85.128/debugger,debugger

3.2 Create New CUDF

If this is to create a new CUDF extract_field, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE FUNCTION extract_field(
Text VARCHAR(32000),
From_Store INTEGER)
RETURNS TABLE (
Customer_ID INTEGER,
Store_ID INTEGER,
Item_ID INTEGER)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'D!CS!extract_field!/usr/eclipse/workspace/cudf_table/src/extract_field.c';

An output message will be displayed when the execution finishes.

 *** Function has been created.

3.3 Update Existing CUDF

If the extract_field UDF already exists, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

REPLACE FUNCTION extract_field(
Text VARCHAR(32000),
From_Store INTEGER)
RETURNS TABLE (
Customer_ID INTEGER,
Store_ID INTEGER,
Item_ID INTEGER)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'D!CS!extract_field!/usr/eclipse/workspace/cudf_table/src/extract_field.c';

An output message will be displayed when the execution finishes.

 *** Function has been replaced.

4. Run CUDF in Database

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

4.1. Data Preparation

The table CUDF can be executed in 2 modes, constant mode and variable mode,  depending on input arguments.

To run your variable mode table CUDF, you will need a database table with data.

Submit the following SQLs at the bteq prompt to create and populate the raw_cust table.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE SET TABLE raw_cust,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(region INTEGER,
pending_data VARCHAR(32000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( region );
-- the data in AMP 0
INSERT INTO raw_cust VALUES (1, '25,3:9005,3789,9004,4907,398,9004;36,2:738,9387,738,9550');
-- the data in AMP 0
INSERT INTO raw_cust VALUES (1, '25,2:9005,7896,9004,7839;36,1:737,9387');
-- the data in AMP 1
INSERT INTO raw_cust VALUES (9, '7,2:879,3788,879,4500,390,9004;08,1:500,9056');

Submit the following query to verify the contents of this table.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT * FROM raw_cust;

The result will be displayed when the query finishes.

4.2. Run CUDF

4.2.1. Constant Mode Table CUDF

The constant mode table CUDF is where data being passed to the table CUDF is a literal string.

Submit the following query to execute the extract_field UDF in constant mode.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT 
*
FROM
TABLE (extract_field('25,2:9005,7896,9004,7839;36,1:737,9387;',25)) AS T1;

The result will be displayed when the execution finishes.

 *** Query completed. 2 rows found. 3 columns returned.

4.2.2. Variable Mode Table CUDF

The variable mode table CUDF is where the data being passed to the CUDF is from a database table.

Submit the following query to execute extract_field UDF in variable mode.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT DISTINCT cust.Customer_ID, cust.Item_ID
FROM
raw_cust,
TABLE (extract_field(raw_cust.pending_data, 25)) AS cust
WHERE raw_cust.region = 1;

The result will be displayed when the execution finishes.

 *** Query completed. 5 rows found. 2 columns returned.

5. Debug CUDF in Eclipse

There are 2 debugging types for CUDF, depending on DBS location.

  • Local CUDF debugging - Eclipse and DBS are running on the same machine.
  • Remote CUDF debugging - Eclipse and DBS are running on separate machines.

There is no difference between debugging a local CUDF and debugging a remote CUDF, except for the setup of the debug session.

5.1. Setup Source Lookup Path

Select the pull-down menu Window -> Preferences.

The Preferences window comes up.

Select C/C++ -> Debug -> Source Lookup Path. The Source Lookup Path settings will be shown at the right.

Please select the following values and click the Remove button to remove them one by one if they are in the Default Source Lookup Path list.

  • Absolute File Path
  • Program Relative File Path
  • Project

Click the Add button to add Project-Path Relative to Source Folders into the list.

Click the OK button to save the setting.

5.2. Switch to Debug Perspective

Select the pull-down menu Window -> Open Perspective -> Debug. The Eclipse will switch to Debug perspective.

5.3. Set Breakpoint

For easy reference, we will show the line numbers in the text editor. (To do this, right click the left margin of the text editor, and select Show Line Number.)

Double click the line number where you want to set breakpoint. A blue point shows the breakpoint has been set at the source code. Also a line has been added into the Breakpoints view.

We set 2 breakpoints in the source code for constant mode table udf and variable mode table udf.

  • Line 191 for constant mode table udf
  • Line 249 for variable mode table udf

The first breakpoint for constant mode table udf.

The second breakpoint for variable mode table udf.

Also you can check the 2 breakpoint in Breakpoints view.

To remove the breakpoint, just double click the blue point in the left margin.

5.4. Debug Configuration

Click the triangle button in the right of Debug button, and selec Debug Configurations....

Debug Configurations setting window is displayed. Double click C/C++ Attach to Application in the left list. A debug configuration cudf_table Debug is created automatically.

A. Local DBS

Click the Debug button, the Eclipse debugger will try to show all the processes in the VM. A Select Process window comes up. 

Input our process name udfsectsk. The list will filter the items with input characters. In our case, there are 2 udfsectsk processes.

Note: If we have not executed any CUDF since database startup, the udfsectsk process is not started. So please make sure you have executed a CUDF once before debugging.

Select the first udfsectsk - 15243 process, and click the OK button to attach to the first udfsectsk process. The first udfsectsk process will be paused in debug mode. Click the Resume button or press F8 to continue.

Click the triangle button in the right of Debug button, and select cudf_table Debug which we created above. 

The Select Process window comes up. Enter our process name like above. Select another udfsectsk process and click the OK button to connect the udfsectsk - 15245 process.

The second udfsectsk process will be paused in debug mode. Click the Resume button or press F8 to continue.

We have connected 2 udfsectsk processes.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we ca not easily know which process will execute our CUDF.

B. Remote DBS

Click the Debugger tab, and change the Debugger from gdb to gdbserver.

Click the Connection tab in Debugger Options and entert Host name or IP address.

To open a remote debug connection port, please submit the following command in a console window in client for one connection.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10000

You will be prompted for the password of the root user on the remote DBS in order to connect.

In our case, ienter DBS IP address 192.168.85.128 and the port 10000 which we just created in the remote DBS.

Click the Debug button to connect to remote DBS debug port. 

Click the Connect to process button, a Select Process window comes up. Enter process name udfsectsk and click the OK button.

A window comes up. Please find your compiled binary file and click the OK button. In our case, the Location should be /usr/eclipse/workspace/cudf_table/Debug/libcudf_table.so.

The remote connection will be established, and the remote udfsectsk process attached. That process will be paused in debug mode.

Click the Resume button or press F8 to continue.

When the first connection is established, please open another console window and submit the following command to open another remote debug port.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10001

Change connection Port number to 10001 and click the Debug button.

Another debug session is created. Repeat the above procedure to connect to another remote udfsectsk process.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we cannot easily know which processes will execute our CUDF.

5.5. Breakpoint Hit

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

5.5.1. Constant Mode Table UDF

Enter the following query at bteq prompt:

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT
*
FROM
TABLE (extract_field('25,2:9005,7896,9004,7839;36,1:737,9387;',25)) AS T1;

The first breakpoint for constant mode table udf will be hit in both processes.

We hit the breakpoint at line 191. The backtrace shows the thread was paused at extract_field() at extract_field.c: 191 0x2aaaac458fbd.

5.5.2. Variable Mode Table UDF

Enter the following query at bteq prompt:

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT DISTINCT cust.Customer_ID, cust.Item_ID
FROM
raw_cust,
TABLE (extract_field(raw_cust.pending_data, 25)) AS cust
WHERE raw_cust.region = 1;

The second breakpoint for variable mode table udf will be hit in both processes.

Now we can see the variables in Variables view for both processes and check other information in corresponding views.

5.6. Debugging

There are several debug buttons used for debugging.

  1. Resume (short-cut key: F8)
  2. Suspend
  3. Teminate
  4. Disconnect
  5. Step Into (short-cut key: F5)
  6. Step Over (short-cut key: F6)
  7. Step Return (short-cut key: F7)

Select the first process udfsectsk[15243], and press Resume button or press F8 to get the next data. Select the second process udfsectsk[15245], and press Resume button or press F8 to get the next data.

Select the second process, thread execution now pauses at same line 249. The value of Phase is changed to 21 (TBL_INIT).

Click the Step Over button or press F6, the UDF breakpoint will be moved in line 257. 

Click the Step Over button or press F6 again, the cursor will be moved in line 258.

Then click the Step Into button or press F5, the cursor will be moved into Prescan function in line 80.

We can check the variables in variables view and debug the code in Prescan function.

Click the Step Return button or F7 to get back. The running cursor will be back at line 258.

5.7. Stop Debugging

To stop the debugging, click the Disconnect button.

Note: If there are several debug sessions in the Debug view, please select one of them and click the Disconnect button for each time.

The debugging session will be teminated, and the udf execution will be resumed.

Tips: Click the XX button in the Debug view, all the terminated processes will be removed.

For more tips and tricks, please refer to the following links.

Eclipse (Indigo) C/C++ Development User Guide

Tags (3)