Java UDF on Teradata 13 demo is very slow. (Vista OS)


Test Case:
Table INDATA has two columns, X and Y. Create table OUTDATA with two columns, S and D, which are the sum and the difference of X and Y, respectively.
In SQL the problem is easily solved:

CREATE TABLE OUTDATA AS
( SELECT X + Y AS S, X - Y AS D FROM INDATA ) WITH DATA

Table INDATA is populated with 100K rows for this experiment.

On a dual-core Vista laptop (Dell XPS), the SQL statement above takes 0.285 seconds to execute.

A Java variable TABLE UDF can be used to do the same calculation. The Java class has a method with two inputs and two outputs; it is named "calculate" in the Java code and is invoked as "evaluate" from SQL (the complete code is attached):

class Example {
    public static void calculate(
            double x,
            double y,
            double[] sum,
            double[] diff) {
        ….
    }
}

The SQL query using the Java UDF is:

CREATE TABLE OUTDATA AS (
SELECT * FROM
TABLE (evaluate(INDATA.X, INDATA.Y)) AS t ) WITH DATA

On the same hardware the query above takes 42 seconds, i.e. roughly 150 times slower (42 / 0.285 ≈ 147) than the SQL query without the UDF.

I tried to investigate the performance bottleneck further. It is not possible to attach a Java profiler to the Teradata database, so I had to resort to crude System.nanoTime() calls for the performance measurements.
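The kind of crude timing described above can be sketched as follows. This is only an illustration: SectionTimer is a hypothetical helper, not part of the attached code or of any Teradata API, and how the totals are reported from inside a UDF (for example, written to a log) is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (an assumption, not from the original post):
// accumulates elapsed nanoseconds and a call count for one code section.
final class SectionTimer {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    // Run the body and add its elapsed time to the running total.
    void time(Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long calls() { return calls.get(); }

    long totalNanos() { return totalNanos.get(); }

    double totalSeconds() { return totalNanos.get() / 1e9; }

    public static void main(String[] args) {
        SectionTimer arithmetic = new SectionTimer();
        double[] sum = new double[1];
        double[] diff = new double[1];
        // Time only the arithmetic, 100K times, as in the experiment.
        for (int i = 0; i < 100_000; i++) {
            final double x = i, y = 2.0 * i;
            arithmetic.time(() -> {
                sum[0] = x + y;
                diff[0] = x - y;
            });
        }
        System.out.printf("%d calls, %.6f sec%n",
                arithmetic.calls(), arithmetic.totalSeconds());
    }
}
```

One timer per section (constructor, getPhase, context access, arithmetic) would give a per-section breakdown. Note that two nanoTime() calls per measurement add their own small cost, so very short sections are overestimated.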

Here is the summary of the results:

evaluate is called 400K + 4 times (two AMPs):
100K calls with phase=TBL_INIT
100K calls with phase=TBL_BUILD, in which the actual row is calculated
100K calls with phase=TBL_BUILD, in which SQL exception "02000" is thrown
100K calls with phase=TBL_FINI
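The call count above follows from four calls per input row plus a handful of extra calls. A minimal sketch of that arithmetic is below; the class and method names are made up for illustration, and because the post does not break down the "+4", it is passed in as a plain parameter rather than attributed to particular phases.

```java
// Hypothetical model of the observed call counts (names are assumptions).
final class CallCountModel {
    // Per input row, as measured in the post:
    // TBL_INIT, TBL_BUILD (row), TBL_BUILD ("02000"), TBL_FINI.
    static final int CALLS_PER_ROW = 4;

    static long totalCalls(long rows, long extraCalls) {
        return rows * CALLS_PER_ROW + extraCalls;
    }

    public static void main(String[] args) {
        // 100K rows and the 4 extra calls reported in the post.
        System.out.println(totalCalls(100_000, 4)); // prints 400004
    }
}
```

Under this model the number of UDF invocations grows linearly with the row count, so any fixed per-call cost is paid four times per row.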

Time spent within each function:

Calls   Function                                          Time
400K    constructor new Tbl()                             4 sec
400K    getPhase()                                        6 sec
200K    tbl.getCtxObject / tbl.setCtxObject               10 sec
100K    actual calculations                               <0.1 sec
400K    total time within evaluate                        22 sec
        time spent outside of Java, within Teradata       22 sec
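To put the figures above on a per-call basis, each total can be divided by its call count. The sketch below does only that arithmetic on the numbers from the table; the class and method names are made up for illustration, and charging the time outside Java to the same 400K calls is an assumption.

```java
// Hypothetical helper (names are assumptions): converts the measured
// totals from the post into microseconds per call.
final class PerCallCost {
    static double microsPerCall(double seconds, long calls) {
        return seconds * 1e6 / calls;
    }

    public static void main(String[] args) {
        System.out.printf("new Tbl():            %.0f us/call%n", microsPerCall(4, 400_000));
        System.out.printf("getPhase():           %.0f us/call%n", microsPerCall(6, 400_000));
        System.out.printf("get/setCtxObject:     %.0f us/call%n", microsPerCall(10, 200_000));
        System.out.printf("evaluate, total:      %.0f us/call%n", microsPerCall(22, 400_000));
        // Assumption: time outside Java is spread over the same 400K calls.
        System.out.printf("outside Java:         %.0f us/call%n", microsPerCall(22, 400_000));
    }
}
```

This works out to roughly 10, 15, 50, 55 and 55 microseconds per call respectively, against well under 1 microsecond for the arithmetic itself.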

As you can see, there is a huge overhead for EACH call. The actual application would have more inputs and outputs and 10 million rows. Even if I assume that the overhead does not increase with the number of parameters, I estimate that the overhead alone would take more than 1 hour on Vista.
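The one-hour estimate can be reproduced by scaling the measured times linearly with the row count. The sketch below assumes strictly linear scaling, which the post itself only assumes and has not measured; the class and method names are made up for illustration.

```java
// Hypothetical linear extrapolation (names and linearity are assumptions).
final class Extrapolation {
    static double scaledSeconds(double measuredSeconds, long measuredRows, long targetRows) {
        return measuredSeconds * targetRows / measuredRows;
    }

    public static void main(String[] args) {
        long measuredRows = 100_000;
        long targetRows = 10_000_000;
        double udfSeconds = scaledSeconds(42.0, measuredRows, targetRows);
        double sqlSeconds = scaledSeconds(0.285, measuredRows, targetRows);
        System.out.printf("Java UDF: %.0f sec (%.0f min)%n", udfSeconds, udfSeconds / 60);
        System.out.printf("Plain SQL: %.1f sec%n", sqlSeconds);
    }
}
```

With the measured 42 seconds per 100K rows this gives 4200 seconds, or 70 minutes, for 10 million rows, compared with about 28.5 seconds for the plain SQL statement.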

I have not yet run the experiments on an actual production Teradata system.

Here are my questions:
1. Should I expect the same overhead on a production Teradata system (i.e. Teradata hardware), perhaps divided by the number of AMPs?
2. Am I doing something wrong, and is there a better approach to accomplish the same thing?
3. Does Teradata have any plans to address this performance bottleneck? Any recommendations on how I can minimize the overhead?
4. I noticed in the documentation that C/C++ UDFs can be run INPROCESS. I did not find any such option for Java. Does one exist?
2 REPLIES

Re: Java UDF on Teradata 13 demo is very slow. (Vista OS)

Here is the full source code for the calculate method:

public static void calculate(double x,
                             double y,
                             double[] sum,
                             double[] diff)
        throws SQLException, IOException, ClassNotFoundException {

    // Ask Teradata which table-function phase this call belongs to.
    int[] phases = new int[1];
    Tbl tbl = new Tbl();

    int rc = tbl.getPhase(phases); // return code is not checked here

    switch (phases[0]) {
        case Tbl.TBL_PRE_INIT:
            // base.preinit is defined elsewhere in the attached code
            base.preinit(tbl);
            break;

        case Tbl.TBL_INIT:
            break;

        case Tbl.TBL_BUILD:
            // Return the single row, then throw the "no more data" exception
            // on the next call with the same arguments.
            // toggle() is defined elsewhere in the attached code.
            if (toggle()) {
                throw new SQLException("no more data", "02000");
            }
            sum[0] = x + y;
            diff[0] = x - y;
            break;

        case Tbl.TBL_FINI:
            break;

        case Tbl.TBL_END:
            break;

        default:
            throw new SQLException("unsupported phase: " + phases[0]);
    }
}

Re: Java UDF on Teradata 13 demo is very slow. (Vista OS)

I'm running into a similar issue - have you come up with a solution?