How to Create and Debug an Aggregate CUDF using Eclipse

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Teradata Employee

How to Create and Debug an Aggregate CUDF using Eclipse

SQL provides a set of useful functions, but they might not satisfy all of the particular requirements you have to process your data.

User-defined functions (UDFs) allow you to extend SQL by writing your own functions in the C/C++ programming languages, installing them on the database, and then using them like standard SQL functions.

Teradata Database supports three types of CUDFs (C/C++ User-Defined Functions). They are scalar UDF, aggregate UDF and window aggregate UDF, and table UDF. 

This article will show how to create a simple aggregate CUDF, execute it in protected mode, and debug it locally or remotely using Eclipse CDT. The UDF runs under the default operating system user "tdatuser", and can access the system resources for which tdatuser has privileges.. Local Debugging means Eclipse and DBS are running on the same machine. Remote Debugging means Eclipse and DBS are running on separate machines.

In this example, the aggregate CUDF takes a list of prices from a Products database table and returns the sum of all prices from that list.

Prerequisite for this Article

If you have not worked through the guide How to Set up a UDF / XSP Debugging Environment, please do it now before you continue.

Article Outline

In this article, we will do the following things to show how to create and debug an aggregate CUDF using Eclipse CDT.

  1. Create C Project in Eclipse
  2. Create CUDF Source File in C Project
  3. Install DDL in Database
  4. Run CUDF in Database
  5. Debug CUDF in Eclipse

1. Create C Project in Eclipse

1.1. C/C++ Perspective

Select the pull-down menu Window -> Open Perspective -> Other ...

An Open Perspective window is displayed. Select C/C++, and click the OK button.

1.2. Create C Project

Go to the Project Explorer in Eclipse and right click New -> C Project.

C Project window comes up.

Enter Project name cudf_aggregate, select the Project type Shared Library, and select the Toolchains Linux GCC. Then click the Finish button.

A new C project cudf_aggregate will be created.

1.3. Change Build Settings

1.3.1. Show Build Settings

Right click the project, and select Properties.

The next window shows the default Properties for cudf_aggregate.

Click C/C++ Build -> Settings. The right of window shows default C/C++ build settings.



You can move the scrollbars at the right or bottom to see  the part that is out of view.

1.3.2. Change Compiler Settings

Select GCC C Compiler -> Includes. Click the + button. An Add directory path window shows up. Enter Directory /usr/tdbms/etc.

The /usr/tdbms/etc path will be added into the right list.

Select Miscellaneous in GCC C Compiler. Check the flag Position Independent Code (-fPIC).

1.3.3. Change Linker Settings

Select GCC C Linker -> Libraries. Click the + button as below to add Library search path.

An Add directory path window shows up. Enter /usr/tdbms/lib in the text box and click OK.

The /usr/tdbms/lib path will be shown in the lower list.

Click the OK button to save the settings.

2. Create CUDF Source File in C Project

2.1. Create Source Folder

Right click the project, and select New -> Source Folder.

A New Source Folder window comes up. Please enter the source folder name src and click the Finish button.

The src source folder will be added to the project.

2.2. Create New Source File

Right click the src folder, select New -> Source File.

New Source File window is displayed. Enter Source file udf_sumfloat.c. Click the Finish button.

A new source file udf_sumfloat.c will be created. 

2.3. Edit The Source File

Edit the source file as shown below.

/*
* udf_sumfloat.c
*
* Created on: Oct 10, 2012
* Author: root
*/
#define SQL_TEXT Latin_Text
#include "sqltypes_td.h"
typedef struct arg_storage {
FLOAT xSum;
} AgrStorage;

/* Computes SUM. */
void udf_sumfloat(FNC_Phase phase, FNC_Context_t *fctx, FLOAT *x, FLOAT *result,
int *x_i, int *result_i, char sqlstate[6], SQL_TEXT fncname[129],
SQL_TEXT sfncname[129], SQL_TEXT errorMsg[257]) {
AgrStorage *s1 = fctx->interim1;
AgrStorage *s2 = fctx->interim2;

switch (phase) {
case AGR_INIT:
if ((s1 = FNC_DefMem(sizeof(AgrStorage))) == NULL) {
strcpy(sqlstate, "U0001");
return;
}

s1->xSum = 0.0;

// No break, so execution falls through to AGR_DETAIL case
case AGR_DETAIL:
if (*x_i != -1) { // if not null accumulate summation
s1->xSum += *x; // s1 points to intermediate storage
}

break;

// AGR_COMBINE is encountered multiple times depending on the
// number of AMPs and caching effects.
case AGR_COMBINE:
s1->xSum += s2->xSum;
break;

case AGR_FINAL:
*result = s1->xSum;
break;
case AGR_NODATA:
*result_i = -1;
break;
default:
strcpy(sqlstate, "U0005");
break;
}
return;
}

3. Install DDL in Database

3.1. Logon Bteq

Open a console window, and execute the bteq logon command.

There are 2 cases, depending on whether the DBS is running on the local machine or not.

A. Local DBS

Logon to the local DBS using the IP address 127.0.0.1 for the local host.

TDExpress14.0_Sles10:~ #bteq .logon 127.0.0.1/debugger,debugger

B. Remote DBS

Logon to the DBS running in a remote machime. Its IP address is 192.168.85.128 in this example.

TDExpress14.0_Sles10:~ # bteq .logon 192.168.85.128/debugger,debugger

3.2 Create New CUDF

If this is to create a new CUDF udf, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE FUNCTION udf_sumfloat
(aFloat FLOAT)
RETURNS FLOAT
CLASS AGGREGATE (50)
LANGUAGE C
NO SQL
SPECIFIC udf_sumfloat
PARAMETER STYLE SQL
EXTERNAL NAME 'D!CS!udf_sumfloat!/usr/eclipse/workspace/cudf_aggregate/src/udf_sumfloat.c';

An output message will be displayed when the execution finishes.

 *** Function has been created.

3.3 Update Existing CUDF

If the udf_sumfloat UDF already exists, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

REPLACE FUNCTION udf_sumfloat
(aFloat FLOAT)
RETURNS FLOAT
CLASS AGGREGATE (50)
LANGUAGE C
NO SQL
SPECIFIC udf_sumfloat
PARAMETER STYLE SQL
EXTERNAL NAME 'D!CS!udf_sumfloat!/usr/eclipse/workspace/cudf_aggregate/src/udf_sumfloat.c';

An output message will be displayed when the execution finishes.

 *** Function has been replaced.

4. Run CUDF in Database

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

4.1. Data Preparation

To run your aggregate CUDF, you will need a database table with data.

Submit the following SQLs at the bteq prompt to create and populate the products table.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE TABLE products (
id INTEGER NOT NULL PRIMARY KEY,
description varchar(255),
price decimal(15,2)
);

-- the data in AMP 0
INSERT INTO products (id, description, price) values(1, 'Lamp', 25.0);
-- the data in AMP 0
INSERT INTO products (id, description, price) values(2, 'Table', 50.0);
-- the data in AMP 0
INSERT INTO products (id, description, price) values(3, 'Chair', 15.0);
-- the data in AMP 1
INSERT INTO products (id, description, price) values(9, 'Zhao', 33.4);

Note: If the table or data already exists, ignore the errors of above SQLs.

Submit the following query to verify the contents of this table.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT id, price, description FROM products;

The result will be displayed when the query finishes.

4.2. Run CUDF

Execute the following query to run the udf_sumfloat UDF.

 BTEQ -- Enter your SQL request or BTEQ command:

select udf_sumfloat(price) from products;

The result will be displayed when the execution finishes. In this case, udf_sumfloat correctly returns the sum of 123.4 for all the prices in products table.

 *** Query completed. One row found. One column returned.

5. Debug CUDF in Eclipse

There are 2 debugging types for CUDF, depending on DBS location.

  • Local CUDF debugging - Eclipse and DBS are running on the same machine.
  • Remote CUDF debugging - Eclipse and DBS are running on separate machines.

There is no difference between debugging a local CUDF and debugging a remote CUDF, except for the setup of the debug session.

5.1. Setup Source Lookup Path

Select the pull-down menu Window -> Preferences.

The Preferences window comes up.

Select C/C++ -> Debug -> Source Lookup Path. The Source Lookup Path settings will be shown at the right.

Please select the following values and click the Remove button to remove them one by one if they are in the Default Source Lookup Path list.

  • Absolute File Path
  • Program Relative File Path
  • Project

Click the Add button to add Project-Path Relative to Source Folders into the list.

Click the OK button to save the setting.

5.2. Switch to Debug Perspective

Select the pull-down menu Window -> Open Perspective -> Debug.

The Eclipse will switch to Debug perspective.

5.3. Set Breakpoint

For easy reference, we will show the line numbers in the text editor. (To do this, right click the left margin of the text editor, and select Show Line Number.)

Double click the line number where you want to set breakpoint. A blue point shows the breakpoint has been set at line 20 of the source code. Also a line has been added into the Breakpoints view.

To remove the breakpoint, just double click the blue point in the left margin.

5.4. Debug Configuration

Select the pull-down menu Run -> Debug Configurations....

Debug Configurations setting window is displayed. Double click C/C++ Attach to Application in the left list. A debug configuration cudf_aggregate Debug is created automatically.

A. Local DBS

Click the Debug button, the Eclipse debugger will try to show all the processes in the VM. A Select Process window comes up.

Enter process name udfsectsk. The list will filter the items with input characters. In our case, there are 2 udfsectsk processes.

Note: If we have not executed any CUDF since database startup, the udfsectsk process is not started. So please make sure you have executed a CUDF once before debugging.

Select the first udfsectsk - 15243 process, and click the OK button to attach to this process. The process will be paused in debug mode. Click the Resume button or press F8 to continue.

Click the triangle button in the right of Debug button, and select cudf_aggregate Debug which we just created above. 

The Select Process window comes up. Input our process name like above. Select the second udfsectsk process and click the OK button to connect the udfsectsk - 15245 process.

The second udfsectsk process will be paused in debug mode. Click the Resume button or press F8 to continue.

We have connected 2 udfsectsk processes.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we cannot easily know which processes will execute our CUDF.

B. Remote DBS

Click the Debugger tab, and change the Debugger from gdb to gdbserver.

Click the Connection tab in Debugger Options and enter Host name or IP address.

To open a remote debug connection port, please submit the following command in a console window in client for one connection.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10000

You will be prompted for the password of the root user on the remote DBS in order to connect.

In our case, enter DBS IP address 192.168.85.128 and port 10000 which we just created in the remote DBS.

Click the Debug button to connect to remote DBS debug port. 

Click the Connect to process button, a Select Process window comes up. Enter process name udfsectsk and click the OK button.

A window comes up. Please find your compiled binary file and click the OK button. In our case, the Location should be /usr/eclipse/workspace/cudf_aggregate/Debug/libcudf_aggregate.so.

The remote connection will be established, and remote udfsectsk process attached. That process will be paused in debug mode.

Click the Resume button or press F8 to continue.

When the first connection is established, please open another console window and submit the following command to open another remote debug port.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10001

Change connection Port number to 10001 and click the Debug button.

Another debug session is created.

 Repeat the above procedure to connect to another remote udfsectsk process.

In our case, there are 3 remote udfsectsk processes. So we connected all of them.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we can not easily know which processes will execute our CUDF.

5.5. Breakpoint Hit

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

Enter the following query at bteq prompt:

 BTEQ -- Enter your SQL request or BTEQ command:

select udf_sumfloat(price) from products;

The breakpoint will be hit in both processes.

We hit the breakpoint at line 20. The backtrace shows the thread was paused at udf_sumfloat() at udf_sumfloat.c: 20 0x2aaac2558b4.

Now we can see the variables in Variables view for both processes and check other information in corresponding view.

For comparison, we can check the data content as shown in section [4.1. Data Preparation]

Select the first process in Debug view, the Variables view will be refreshed with its context. Here we can see the value of x is 15.

Select the second process in Debug view, the Variables view will be refreshed with its context. Here we can see the value of x is 33.3999999999999 (33.34).

5.6. Debugging

There are several debug buttons used for debugging.

  1. Resume (short-cut key: F8)
  2. Suspend
  3. Teminate
  4. Disconnect
  5. Step Into (short-cut key: F5)
  6. Step Over (short-cut key: F6)
  7. Step Return (short-cut key: F7)

Select the first process udfsectsk[15243], and press Resume button or press F8 to continue. Thread execution pauses at same line 20 to process the next data row. The value of x is changed to 25.

Click the Step Over button or press F6 two times. The UDF breakpoint will be moved to line 32. 

Find variable s1 in Variable view.

Select the first process in Debug view, and click  the Step Over button or press  F6.

The value of s1 will be updated from 15 to 40, resulted from adding the current x value (25).

5.7. Stop Debugging

To stop the debugging, click the Disconnect button.

Note: If there are several debug sessions in the Debug view, please select and click the Disconnect button for each session.

The debugging sessions will be teminated, and the udf execution will be resumed.

Tips: Click the XX button in the Debug view, and all the terminated processes will be removed.

For more tips and tricks, please refer to the following link.

Eclipse (Indigo) C/C++ Development User Guide