How to Create and Debug a Window Aggregate CUDF using Eclipse

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Teradata Employee

How to Create and Debug a Window Aggregate CUDF using Eclipse

SQL provides a set of useful functions, but they might not satisfy all of the particular requirements you have to process your data.

User-defined functions (UDFs) allow you to extend SQL by writing your own functions in the C/C++ programming languages, installing them on the database, and then using them like standard SQL functions.

Teradata Database supports three types of CUDFs (C/C++ User-Defined Functions). They are scalar UDF, aggregate UDF and window aggregate UDF, and table UDF. 

This article will show how to create a simple window aggregate CUDF, execute it in protected mode, and debug it locally or remotely using Eclipse CDT. The UDF runs under the default operating system user "tdatuser", and can access the system resources for which tdatuser has privileges.. Local Debugging means Eclipse and DBS are running on the same machine. Remote Debugging means Eclipse and DBS are running on separate machines.

In this example, the window aggregate CUDF evaluates dense rank over the set of values passed as arguments to the UDF. With dense ranking, items that compare equal receive the same ranking number, and the next item(s) receive the immediately following ranking number.

Prerequisite for this Article

If you have not worked through the guide How to Set up a UDF / XSP Debugging Environment, please do it now before you continue.

Article Outline

In this article, we will do the following things to show how to create and debug a window aggregate CUDF using Eclipse CDT.

  1. Create C Project in Eclipse
  2. Create CUDF Source File in C Project
  3. Install DDL in Database
  4. Run CUDF in Database
  5. Debug CUDF in Eclipse

1. Create C Project in Eclipse

1.1. C/C++ Perspective

Select the pull-down menu Window -> Open Perspective -> Other ...

An Open Perspective window is displayed. Select C/C++, and click the OK button.

1.2. Create C Project

Go to the Project Explorer in Eclipse and right click New -> C Project.

C Project window comes up.

Enter Project name cudf_window_aggregate, select Project type Shared Library, and select Toolchains Linux GCC. Then click the Finish button.

A new C project cudf_window_aggregate will be created.

1.3. Change Build Settings

1.3.1. Show Build Settings

Right click the project, and select Properties.

The next window shows the default Properties for cudf_window_aggregate.

Click C/C++ Build -> Settings. The right of window shows default C/C++ build settings.

You can move the scrollbars at the right or bottom to see the part that is out of view.

1.3.2. Change Compiler Settings

Select GCC C Compiler -> Includes. Click the + button. An Add directory path window shows up. Enter Directory /usr/tdbms/etc.

The /usr/tdbms/etc path will be added into the right list.

Select Miscellaneous in GCC C Compiler. Check the flag Position Independent Code (-fPIC).

1.3.3. Change Linker Settings

Select GCC C Linker -> Libraries. Click the + button as below to add Library search path.

An Add directory path window shows up. Enter /usr/tdbms/lib in the text box and click OK.

The /usr/tdbms/lib path will be shown in the lower list.

Click the OK button to save the settings.

2. Create CUDF Source File in C Project

2.1. Create Source Folder

Right click the project, and select New -> Source Folder.

New Source Folder window comes up. Please enter the source folder name src and click the Finish button.

The src source folder will be added to the project.

2.2. Create New Source File

Right click the src folder, select New -> Source File.

New Source File window is displayed. Enter Source file dense_rank.c. Click the Finish button.

A new source file dense_rank.c will be created. 

2.3. Edit The Source File

Edit the source file as shown below.

/*
* dense_rank.c
*
* Created on: Oct 10, 2012
* Author: root
*/
#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include <string.h>
typedef struct agr_storage {
int cr; // current rank
int pv; // previous value
} AGR_Storage;

void dense_rank(FNC_Phase phase, FNC_Context_t *fctx, INTEGER *x,
INTEGER *result, int *x_i, int *result_i, char sqlstate[6],
SQL_TEXT fncname[129], SQL_TEXT sfncname[129],
SQL_TEXT error_message[257]) {
/* pointer to intermediate storage area */
AGR_Storage *s1 = fctx->interim1;
/* switch to determine the aggregation phase */
switch (phase) {
case AGR_INIT:
/* This UDF currently handles only the cumulative window */
/* type for illustrative purposes. */
if (fctx->window_size != -1) {
strcpy(error_message, "Only cumulative window type supported");
strcpy(sqlstate, "U0001"); /* see SQLSTATE table */
return;
}
if ((s1 = FNC_DefMem(sizeof(AGR_Storage))) == NULL) {
strcpy(sqlstate, "U0002");
return;
}
s1->cr = 1;
s1->pv = *x;
/***************************************************/
/* Fall through to the detail phase */
/***************************************************/
case AGR_DETAIL:
if (*x != s1->pv) {
s1->cr++;
s1->pv = *x;
}
break;
case AGR_FINAL:
*result = s1->cr;
break;
/* Add this to generate an error for any undefined phases */
case AGR_COMBINE:
case AGR_MOVINGTRAIL:
default:
sprintf(error_message, "phase is %d", phase);
strcpy(sqlstate, "U0005");
return;
}
}

3. Install DDL in Database

3.1. Logon Bteq

Open a console window, and execute the bteq logon command.

There are 2 cases, depending on whether the DBS is running on the local machine or not.

A. Local DBS

Logon to the local DBS using the IP address 127.0.0.1 for the local host.

TDExpress14.0_Sles10:~ # bteq .logon 127.0.0.1/debugger,debugger

B. Remote DBS

Logon to the DBS running in a remote machime. Its IP address is 192.168.85.128 in this example.

TDExpress14.0_Sles10:~ # bteq .logon 192.168.85.128/debugger,debugger

3.2 Create New CUDF

If this is to create a new CUDF dense_rank, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE FUNCTION dense_rank (x INTEGER)
RETURNS INTEGER
CLASS AGGREGATE (1000)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME 'D!CS!dense_rank!/usr/eclipse/workspace/cudf_window_aggregate/src/dense_rank.c';

An output message will be displayed when the execution finishes.

 *** Function has been created.

3.3 Update Existing CUDF

If the dense_rank UDF already exists, please execute the following command at the bteq prompt.

 BTEQ -- Enter your SQL request or BTEQ command:

REPLACE FUNCTION dense_rank (x INTEGER)
RETURNS INTEGER
CLASS AGGREGATE (1000)
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
DETERMINISTIC
CALLED ON NULL INPUT
EXTERNAL NAME 'D!CS!dense_rank!/usr/eclipse/workspace/cudf_window_aggregate/src/dense_rank.c';

An output message will be displayed when the execution finishes.

 *** Function has been replaced.

4. Run CUDF in Database

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

4.1. Data Preparation

To run your window aggregate CUDF, you will need a database table with data.

Submit the following SQLs at the bteq prompt to create and populate the rankdata table.

 BTEQ -- Enter your SQL request or BTEQ command:

CREATE MULTISET TABLE rankdata (
id INTEGER,
v INTEGER);
-- insert the data in AMP 0
INSERT INTO rankdata VALUES (1,1);
INSERT INTO rankdata VALUES (1,2);
INSERT INTO rankdata VALUES (1,2);
INSERT INTO rankdata VALUES (1,4);
INSERT INTO rankdata VALUES (1,5);
INSERT INTO rankdata VALUES (1,5);
INSERT INTO rankdata VALUES (1,5);
INSERT INTO rankdata VALUES (1,8);
INSERT INTO rankdata VALUES (1,);
-- insert the data in AMP 1
INSERT INTO rankdata VALUES (9,1);
INSERT INTO rankdata VALUES (9,1);
INSERT INTO rankdata VALUES (9,3);
INSERT INTO rankdata VALUES (9,4);

Submit the following query to verify the contents of this table.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT * FROM rankdata ORDER BY id, v;

The result will be displayed when the query finishes.

4.2. Run CUDF

Execute the following query to run the dense_rank UDF.

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT
id,
v,
dense_rank(v) OVER (PARTITION BY id ORDER BY v ROWS UNBOUNDED PRECEDING)
AS dense_rank,
rank() OVER (PARTITION BY id ORDER BY v) AS common_rank
FROM rankdata
ORDER BY id, dense_rank;

The result will be displayed when the execution finishes.

Note the difference in the dense rank value and rank value for id=1 and v=4 (the data in AMP 0), and for  id=9 and v=3 (the data in AMP 1).

 *** Query completed. 13 rows found. 4 columns returned.

5. Debug CUDF in Eclipse

There are 2 debugging types for CUDF, depending on DBS location.

  • Local CUDF debugging - Eclipse and DBS are running on the same machine.
  • Remote CUDF debugging - Eclipse and DBS are running on separate machines.

There is no difference between debugging a local CUDF and debugging a remote CUDF, except for the setup of the debug session.

5.1. Setup Source Lookup Path

Select the pull-down menu Window -> Preferences.

The Preferences window comes up.

Select C/C++ -> Debug -> Source Lookup Path. The Source Lookup Path settings will be shown at the right.

Please select the following values and click the Remove button to remove them one by one if they are in the Default Source Lookup Path list.

  • Absolute File Path
  • Program Relative File Path
  • Project

Click the Add button to add Project-Path Relative to Source Folders into the list.

Click the OK button to save the setting.

5.2. Switch to Debug Perspective

Select the pull-down menu Window -> Open Perspective -> Debug.

The Eclipse will switch to Debug perspective.

5.3. Set Breakpoint

For easy reference, we will show the line numbers in the text editor. (To do this, right click the left margin of the text editor, and select Show Line Number.)

Double click the line number where you want to set breakpoint. A blue point shows the breakpoint has been set at line 22 of the source code. Also a line has been added into the Breakpoints view.

To remove the breakpoint, just double click the blue point in the left margin.

5.4. Debug Configuration

Select the pull-down menu Run -> Debug Configurations....

Debug Configurations setting window is displayed. Double click C/C++ Attach to Application in the left list. A debug configuration cudf_window_aggregate Debug is created automatically.

A. Local DBS

Click the Debug button, the Eclipse debugger will show all the processes in the VM. A Select Process window comes up.

Enter process name udfsectsk. The list will filter the items with input characters. In our case, there are 2 udfsectsk processes.

Note: If we have not executed any CUDF since database startup, the udfsectsk process is not started. So please make sure you have executed a CUDF once before debugging.

Select the first udfsectsk - 15243 process, and click the OK button to attach to this process. The process will be paused in debug mode. Click the Resume button or press F8 to continue.

Click the triangle button in the right of Debug button, and select cudf_window_aggregate Debug which we just created above. 

The Select Process window comes up. Input our process name like above. Select the second udfsectsk process and click the OK button to connect the udfsectsk - 15245 process.

The second udfsectsk process will be paused in debug mode. Click the Resume button or press F8 to continue.

We have connected 2 udfsectsk processes.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we cannot easily know which process will execute our CUDF.

B. Remote DBS

Click the Debugger tab, and change the Debugger from gdb to gdbserver.

Click the Connection tab in Debugger Options and enter Host name or IP address.

To open a remote debug connection port, please submit the following command in a console window in client.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10000

You will be prompted for the password of the root user on the remote DBS in order to connect.

In our case, enter DBS IP address 192.168.85.128 and the port 10000 which we just created in the remote DBS.

Click the Debug button to connect to remote DBS debug port. 

Click the Connect to process button, a Select Process window comes up. Enter process name udfsectsk and click the OK button.

A window comes up. Please find your compiled binary file and click the OK button. In our case, the Location should be /usr/eclipse/workspace/cudf_window_aggregate/Debug/libcudf_window_aggregate.so.

The remote connection will be established, and the remote udfsectsk process attached. That process will be paused in debug mode.

Click the Resume button or press F8 to continue.

When the first connection is established, please open another console window and submit the following command to open another remote debug port.

TDExpress14.0_Sles10:~ # ssh root@192.168.85.128 gdbserver --multi :10001

Change connection Port number to 10001 and click the Debug button.

Another debug session is created. Repeat the above procedure to connect to another remote udfsectsk process.

Note: If there are more than 2 udfsectsk processes in the process list, please make sure you have connected all of them because we can not easily know which processes will execute our CUDF.

5.5. Breakpoint Hit

If there is no bteq prompt, please execute [3.1. Logon Bteq] section.

Enter the following query at bteq prompt:

 BTEQ -- Enter your SQL request or BTEQ command:

SELECT
id,
v,
dense_rank(v) OVER (PARTITION BY id ORDER BY v ROWS UNBOUNDED PRECEDING)
AS dense_rank,
rank() OVER (PARTITION BY id ORDER BY v) AS common_rank
FROM rankdata
ORDER BY id, dense_rank;

The breakpoint will be hit in both processes.

We hit the breakpoint at line 22. The backtrace shows the thread was paused at dense_rank() at dense_rank.c: 22 0x2aaaac356968.

Now we can see the variables in Variables view for both processes, and check other information in corresponding views.

Select the first process in Debug view, the Variables view will be refreshed with its context. Here we can see value of  phase is 1 (AGR_INIT), and the value of x is 0.

5.6. Debugging

There are several debug buttons used for debugging.

  1. Resume (short-cut key: F8)
  2. Suspend
  3. Teminate
  4. Disconnect
  5. Step Into (short-cut key: F5)
  6. Step Over (short-cut key: F6)
  7. Step Return (short-cut key: F7)

Select the first process udfsectsk[15243], and press Resume button or press F8 to continue. Thread execution now pauses at same line 22 to process the next data row. The value of phase is changed to 4 (AGR_FINAL).

Click the Step Over button or press F6 two times. The UDF breakpoint will be moved to line 48. 

Find variable result in Variables view. The second row data rank is 1.

Press Resume button or press F8 to get the next data.

5.7. Stop Debugging

To stop the debugging, click the Disconnect button.

Note: If there are several debug sessions in the Debug view, please select and click the Disconnect button for each session.

Disconnect all debugging sessions, and the udf execution will be resumed.

Tips: Click the XX button in the Debug view, and all the terminated processes will be removed.

For more tips and tricks, please refer to the following link.

Eclipse (Indigo) C/C++ Development User Guide