JDBC scalability

Connectivity
Senior Supporter

JDBC scalability

I have a script which should load data files into x tables in near real time via JDBC.

One row can end up in multiple tables, and the application takes care of the RI constraints - which is why JDBC is favoured. But the RI constraints also prevent the use of the batch feature of JDBC.

 

Let's assume a single process can load Y rows per second. This is not enough for the SLA. Therefore we are considering running multiple instances of this process, each loading a different set of files but all targeting the same x tables. The PIs of all tables are unique.

On a single-node dev machine (8 vprocs) with no other workload, we found that 4 concurrent streams gave the maximum throughput of about 2.5 Y rows per second.

What is the expected scalability with respect to an increasing number of nodes / vprocs?

 

Reason I am asking is that I ran some tests on a 32 vproc and a 128 vproc system (both 4 nodes) on AWS.

And the throughput only increased to 5 Y rows per second - in both cases. So no increase in performance from 32 to 128 vprocs.

Is there anything I can do to increase the scalability in this scenario?

Any comment is welcome!

Ulrich

5 REPLIES
Teradata Employee

Re: JDBC scalability

You are doing single row inserts?

Are you using PreparedStatements with question mark parameter markers for all data values that differ from one inserted row to the next (recommended), or are you specifying the inserted data values as literals (not recommended) ?

How are you managing transactions? Are you inserting rows with auto-commit turned on, so each row insert is its own transaction? Or do you have auto-commit turned off and you are executing multiple inserts per transaction?

Senior Supporter

Re: JDBC scalability

Hi Tom,

sorry for the delay in responding.

 

You are doing single row inserts?

Yes, no batch

 

Are you using PreparedStatements with question mark parameter markers for all data values that differ from one inserted row to the next (recommended), or are you specifying the inserted data values as literals (not recommended) ?

Yes, one prepared statement with ? and using .setBytes, .setNull etc. statements to set the specific values before executing the SQL.
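A minimal sketch of that pattern (table name, column count and column types here are made-up placeholders, not details from this load job):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

public class SingleRowLoader {
    // Build the INSERT text once; the '?' markers are bound per row.
    static String insertSql(String table, int cols) {
        StringBuilder sb = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < cols; i++) sb.append(i == 0 ? "?" : ", ?");
        return sb.append(")").toString();
    }

    // The prepared statement is created once per session and reused for
    // every row; with auto-commit on, each executeUpdate() is its own
    // transaction and its own network round trip.
    static void loadRow(PreparedStatement ps, int id, byte[] payload)
            throws SQLException {
        ps.setInt(1, id);
        if (payload != null) ps.setBytes(2, payload);   // bind a concrete value
        else ps.setNull(2, Types.VARBINARY);            // or bind SQL NULL
        ps.executeUpdate();
    }
}
```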

 

How are you managing transactions? Are you inserting rows with auto-commit turned on, so each row insert is its own transaction? Or do you have auto-commit turned off and you are executing multiple inserts per transaction?
Auto-commit is on. So from the DB side every SQL call is one transaction. The multi-statement transaction handling is done in the Java code (e.g. if, after 3 out of 4 SQL statements, an error occurs on the DB side, the two rows that were already added successfully need to be deleted).
This approach should minimise the risk of deadlocks compared to putting all 4 statements into a single DB transaction while running multiple streams of this process in parallel.
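That compensation logic could look roughly like this; the `Step` interface is just my way of sketching it generically (the real code would hold one prepared insert and one prepared delete per target table):

```java
import java.sql.SQLException;
import java.util.List;

public class CompensatingLoader {
    interface Step { void run() throws SQLException; }

    // Run the per-table inserts for one source row. Auto-commit is on,
    // so each successful insert is already committed; if insert k fails,
    // execute the compensating deletes for the k rows already inserted,
    // then rethrow so the caller can handle the failed source row.
    static void loadRowGroup(List<Step> inserts, List<Step> deletes)
            throws SQLException {
        int done = 0;
        try {
            for (Step ins : inserts) { ins.run(); done++; }
        } catch (SQLException e) {
            for (int i = 0; i < done; i++) deletes.get(i).run();
            throw e;
        }
    }
}
```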

 
 
 
Teradata Employee

Re: JDBC scalability

Batch inserts are faster than single-row inserts. Perhaps you should consider using a single process and a single session for loading the data, in order to avoid deadlocks, but use batch inserts containing many rows (thousands of rows per batch), use auto-commit=off, and commit periodically.
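A sketch of that batched pattern, assuming a single-column target table with a made-up name (the real statement would carry one marker per column):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchLoader {
    // Number of executeBatch/commit round trips for a given load size.
    static int commitCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Auto-commit off, thousands of rows per addBatch/executeBatch
    // cycle, commit after each batch.
    static void load(Connection con, List<byte[]> rows, int batchSize)
            throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement ps =
                con.prepareStatement("INSERT INTO target_table VALUES (?)")) {
            int pending = 0;
            for (byte[] row : rows) {
                ps.setBytes(1, row);
                ps.addBatch();                  // queue the row client-side
                if (++pending == batchSize) {
                    ps.executeBatch();          // one round trip for the batch
                    con.commit();               // periodic commit
                    pending = 0;
                }
            }
            if (pending > 0) { ps.executeBatch(); con.commit(); }
        }
    }
}
```

For example, 10,000 rows at 3,000 rows per batch would need only 4 executeBatch/commit round trips instead of 10,000 single-row transactions.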

Senior Supporter

Re: JDBC scalability

We tested the batch inserts and the performance is much better, but it has the RI issue, which I was not able to solve with the batch approach. Also, a single load stream would raise the question of what to do in case the throughput is not sufficient.

As mentioned, the original question was more about what the expected scaling would be if a JDBC process loads a table and X processes run in parallel. It looks like the scaling factor is not X and also doesn't increase much by adding vprocs.

Teradata Employee

Re: JDBC scalability

You mentioned RI constraints that the application manages, and you indicated that these RI constraints are somehow preventing you from using JDBC batch inserts.

 

Generally speaking, that should not be a problem. It should be possible to use JDBC batch inserts to insert into tables with RI constraints. You need to insert rows with the primary key values before you insert dependent rows with foreign key values.
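One way to sketch this: derive a flush order from the RI relationships, then execute one batch per table in that order inside a single transaction. The table names and the simple topological-sort helper below are illustrative assumptions, not details from this thread:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class RiBatchLoader {
    // parentOf maps child table -> referenced parent table (null = no
    // parent). Returns an order in which every parent precedes its
    // children; repeated passes are fine for a handful of tables.
    static List<String> insertOrder(Map<String, String> parentOf) {
        List<String> order = new ArrayList<>();
        List<String> remaining = new ArrayList<>(parentOf.keySet());
        while (!remaining.isEmpty()) {
            boolean progressed = false;
            for (Iterator<String> it = remaining.iterator(); it.hasNext(); ) {
                String t = it.next();
                String p = parentOf.get(t);
                if (p == null || order.contains(p)) {
                    order.add(t); it.remove(); progressed = true;
                }
            }
            if (!progressed) throw new IllegalStateException("cycle in RI graph");
        }
        return order;
    }

    // Flush one accumulated batch per table in that order: parent rows
    // are inserted before the dependent rows that reference them, so
    // the batches do not violate the RI constraints.
    static void flushInOrder(Connection con,
                             Map<String, PreparedStatement> batchPerTable,
                             List<String> order) throws SQLException {
        con.setAutoCommit(false);
        for (String table : order) batchPerTable.get(table).executeBatch();
        con.commit();   // one transaction covers all the batches
    }
}
```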