Is batch processing even possible with CLOB?


I want to copy CLOB values from one database to another using JDBC. Here is the code in Java:

 

 

public class TestClob {

    public void test() throws ClassNotFoundException, SQLException, IOException {

        Connection conn1, conn2;
        conn1 = DriverManager.getConnection(..., user, pass);
        conn2 = DriverManager.getConnection(..., user, pass);

        Statement select = conn1.createStatement();
        ResultSet rs = select.executeQuery("SELECT TOP 100 myClob FROM myTab");
        int batchSize = 20;
        int rowNumber = 0;
        int totalRowNumber = 0;

        PreparedStatement ps = null;
        Clob clob = null;
        Reader clobReader = null;

        while (rs.next()) {
            rowNumber++;
            totalRowNumber++;
            System.out.println(totalRowNumber);
            clob = rs.getClob(1);
            clobReader = clob.getCharacterStream();
            ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) VALUES (?)");
            ps.setCharacterStream(1, clobReader, clob.length());
            ps.addBatch();

            if (rowNumber % batchSize == 0) {
                ps.executeBatch();
                clob.free();
                clobReader.close();
                rowNumber = 0;
            }
        }

        if (rowNumber > 0) {
            ps.executeBatch();
            clob.free();
            clobReader.close();
        }

        conn2.commit();
        ps.close();
        select.close();
        rs.close();
    }
}

 

It compiles and runs, however it copies only one row per batch. I think the reason is that the Reader object gets overwritten, so the executeBatch method processes only the last row that was read.

On the other hand, I cannot have too many Reader objects open simultaneously, because Teradata limits open LOB objects to 16 per connection.
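
For reference, here is a minimal sketch (not part of the original post) that may help pin down the one-row-per-batch behavior. In the code above, prepareStatement is called inside the loop, so every iteration starts a fresh PreparedStatement whose batch holds a single row. Preparing the statement once, outside the loop, lets addBatch accumulate, but then each batched row keeps its Reader open until executeBatch, which collides with the 16-object limit just mentioned:

// Sketch: PreparedStatement hoisted out of the loop so the batch accumulates.
// Every batched row still holds an open Reader (an open response), so any
// batchSize above 15 fails with error 3130 "Response limit exceeded".
PreparedStatement ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) VALUES (?)");
List<Clob> clobs = new ArrayList<>();     // java.util imports assumed
List<Reader> readers = new ArrayList<>();
int rowNumber = 0;

while (rs.next()) {
    Clob clob = rs.getClob(1);
    Reader clobReader = clob.getCharacterStream();
    ps.setCharacterStream(1, clobReader, clob.length());
    ps.addBatch();
    clobs.add(clob);        // keep references so they can be freed after the batch
    readers.add(clobReader);
    rowNumber++;

    if (rowNumber % batchSize == 0) {
        ps.executeBatch();  // the driver reads all the batched Readers here
        for (Reader r : readers) r.close();
        for (Clob c : clobs) c.free();
        readers.clear();
        clobs.clear();
    }
}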

___________________________________________________________________________________
THIS IS ONE WORKAROUND: to make it work, use the execute method without batch processing:

 

 

while (rs.next()) {
    totalRowNumber++;
    System.out.println(totalRowNumber);
    clob = rs.getClob(1);
    clobReader = clob.getCharacterStream();
    ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) VALUES (?)");
    ps.setCharacterStream(1, clobReader, clob.length());
    ps.execute(); // HERE I just execute the current row and FREE the CLOB and READER objects
    clob.free();
    clobReader.close();
}
   

The above method works, but it's very slow. Given I have around 2 million rows in my dataset, this takes almost 15 hours.

 

Is CLOB batch processing possible with JDBC? Please help.







Teradata Employee

Re: Is batch processing even possible with CLOB?

The Teradata Database imposes a limit of 16 open responses per connection. When you exceed that number, you get Teradata Database Error 3130 "Response limit exceeded."

 

When you open an InputStream or a Reader to read data from a LOB, that counts as an open response for the connection.
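
As an illustration (this snippet is not from the reply; it assumes the rs ResultSet from the question), a loop that holds every CLOB Reader open would trip that limit:

// Hypothetical sketch: keep every LOB Reader open on one connection.
// Each open Reader counts as an open response, so once the connection's
// total passes 16, the driver raises a SQLException with Teradata error 3130.
List<Reader> openReaders = new ArrayList<>(); // java.util imports assumed
while (rs.next()) {
    Clob clob = rs.getClob(1);
    openReaders.add(clob.getCharacterStream()); // never closed: leaks a response
}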

 

We need to make a distinction between reading the LOB data from the source database versus inserting the LOB data into the destination database.

 

The "Response limit exceeded" error occurs when reading the LOB data from the source database, but there is no equivalent limit for inserting the LOB data into the destination database. In the context of JDBC, "batch processing" corresponds to a PreparedStatement batch insert, using the PreparedStatement.addBatch and PreparedStatement.executeBatch methods. The Teradata Database does impose a limit on the number of inserted rows in a batch, but it is a much higher limit (tens of thousands of rows).

 

What this all means is: yes, your application can use "batch processing" with LOB values, but if you want your application to insert more than 15 LOB values in a batch, then your application will need to materialize the LOB values on the client system before inserting them into the destination database.

 

What does it mean to "materialize" a LOB value? It means that your application must obtain an InputStream or Reader from the Blob/Clob object, fully read all the LOB data, store it in memory or on disk on the client system, then close the InputStream or Reader and free the Blob/Clob object.

 

Keep in mind that what you're trying to do was not an intended use case for database LOB values, or for the JDBC Blob/Clob objects. You are trying to copy LOB values from one database to another, but database LOB values and JDBC Blob/Clob objects are implemented with "LOB locators", which are designed to enable efficient copying of LOB values within a single database only.
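
To make that concrete, here is a minimal sketch of the materialize-then-batch pattern (it is not from the reply; it reuses rs, conn2, and batchSize from the question and assumes each CLOB fits comfortably in memory):

PreparedStatement ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) VALUES (?)");
int rowNumber = 0;

while (rs.next()) {
    Clob clob = rs.getClob(1);

    // Materialize: fully read the CLOB into a String on the client...
    StringBuilder sb = new StringBuilder();
    try (Reader clobReader = clob.getCharacterStream()) {
        char[] buffer = new char[8192];
        int n;
        while ((n = clobReader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
    }
    // ...and free the Clob, so the open response is released before the next row.
    clob.free();

    ps.setString(1, sb.toString()); // bind the materialized value, not a live Reader
    ps.addBatch();
    rowNumber++;

    if (rowNumber % batchSize == 0) {
        ps.executeBatch(); // no LOB Readers are open here, so no 3130 error
    }
}

if (rowNumber % batchSize != 0) {
    ps.executeBatch(); // flush the final partial batch
}
conn2.commit();

The memory cost is up to batchSize materialized values held by the driver per batch, which is why temporary files are the fallback when individual values are large.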

Re: Is batch processing even possible with CLOB?

Thank you very much. But I thought getCharacterStream() materializes the CLOB into a Reader object in RAM.

 

This is what I tried:

 

 

int batchSize = 50;
int rowNumber = 0;
int totalRowNumber = 0;
int c;

PreparedStatement ps = null;
Clob clob = null;
Reader clobReader = null;

while (rs.next()) {

    clob = rs.getClob(1);
    clobReader = clob.getCharacterStream(); // Materialized
    long clobLength = clob.length();

    System.out.print("Materialized: ");
    while ((c = clobReader.read()) > -1) {
        System.out.print(c);
    }

    clob.free(); // Releasing the source database LOB
    clob = null;
    System.out.println();

    ps = conn2.prepareStatement("INSERT INTO ... values (?) ");
    ps.setCharacterStream(1, clobReader, clobLength);
    ps.addBatch();

    rowNumber++;
    totalRowNumber++;
    System.out.println(totalRowNumber);

    if (rowNumber % batchSize == 0 && rowNumber != 0) {
        ps.executeBatch();
        conn2.commit();
        rowNumber = 0;
        System.out.println("executing");
    }
}
However, this still results in "Response limit exceeded", presumably because the Reader objects are left open. But isn't that how I'm supposed to materialize the CLOB?

 

Teradata Employee (Accepted Solution)

Re: Is batch processing even possible with CLOB?

>>> I thought getCharacterStream() materializes the CLOB into a Reader object in RAM

 

No, it does not. Clob.getCharacterStream returns a Reader that fetches a chunk of data at a time from the Teradata Database. The Reader does not read all the CLOB data from the Teradata Database at once. Keep in mind that a CLOB can be up to 2GB in length. It would be bad if the Reader fetched the entire CLOB value all at once.

 

 

>>> isn't that how I'm supposed to materialize the CLOB?

 

No, your application must fully read all the CLOB data from the Reader, and store the CLOB data in memory (such as in a String) or store the LOB data in a file on disk on the client system. Where your application stores the data would depend on the size of the CLOB values. If you have many large CLOB values, all of them 2GB in length, then you may not have enough RAM to store all of them in memory, and you would need to store them in temporary files on disk.
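
For values too large to hold in memory, a sketch of the temp-file variant might look like this (again not from the reply; it reuses rs, conn2, and batchSize from the question, and assumes java.io, java.nio.charset, and java.util imports):

PreparedStatement ps = conn2.prepareStatement("INSERT INTO myTab2 (myClob2) VALUES (?)");
List<Reader> pendingReaders = new ArrayList<>();
int rowNumber = 0;

while (rs.next()) {
    Clob clob = rs.getClob(1);
    long clobLength = clob.length();

    // Spool the CLOB to a temporary file on the client, then free the source
    // Clob so its open response is released before the next row is fetched.
    File tempFile = File.createTempFile("clob", ".tmp");
    tempFile.deleteOnExit();
    try (Reader in = clob.getCharacterStream();
         Writer out = new OutputStreamWriter(new FileOutputStream(tempFile), StandardCharsets.UTF_8)) {
        char[] buffer = new char[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
    clob.free();

    // A Reader over the local file is not a database response, so it does not
    // count against the 16-open-response limit.
    Reader fileReader = new InputStreamReader(new FileInputStream(tempFile), StandardCharsets.UTF_8);
    ps.setCharacterStream(1, fileReader, clobLength);
    ps.addBatch();
    pendingReaders.add(fileReader);
    rowNumber++;

    if (rowNumber % batchSize == 0) {
        ps.executeBatch();
        for (Reader r : pendingReaders) r.close();
        pendingReaders.clear();
    }
}

if (!pendingReaders.isEmpty()) {
    ps.executeBatch(); // flush the final partial batch
    for (Reader r : pendingReaders) r.close();
}
conn2.commit();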