JDBC fastload - OutOfMemory after 20 million addBatch() operations


Hi,

I am using Teradata JDBC driver 13.00.00.10 and trying to upload a flat file with 100 million rows to Teradata.

I start off with a clean table.

First I tried iterating through the entire file, calling addBatch() for each row, and only at the end doing a single executeBatch():

    while ((s = reader.readLine()) != null) {
        String[] columns = StringUtils.split(s, separator);
        for (int j = 0; j < columns.length; j++) {
            st.setString(j + 1, columns[j]);
        }
        st.addBatch();   // row is buffered in the driver, not sent yet
        i++;
        if (i % 10000 == 0) {
            ULogger.info(this, "imported " + i + " lines.");
        }
    }
    st.executeBatch();   // single executeBatch() for all rows

This quickly consumes all the memory for my application.

I set -Xmx to 9 GB and still got an OutOfMemoryError after roughly 40 million addBatch() calls.

Then I tried periodic executeBatch() calls: iterate through the file, and every 20 million addBatch() calls do an executeBatch():

    while ((s = reader.readLine()) != null) {
        String[] columns = StringUtils.split(s, separator);
        for (int j = 0; j < columns.length; j++) {
            st.setString(j + 1, columns[j]);
        }
        st.addBatch();
        i++;
        if (i % 20000000 == 0) {
            st.executeBatch();   // flush every 20 million rows
            st.clearWarnings();
        }
    }
    st.executeBatch();   // flush the remainder

In this case, the first executeBatch() succeeded.

However, the second executeBatch() failed with "error while beginning FastLoad of database table XXX".

Can anyone explain how I'm supposed to load 100 million rows?

Is there a configuration option I am missing (e.g. one that tells the driver to send rows to the database periodically instead of holding them all in memory)?

Thanks,

A.

1 REPLY
Teradata Employee

Re: JDBC fastload - OutOfMemory after 20 million addBatch() operations

You must have autocommit turned off in order to call executeBatch multiple times for the same JDBC FastLoad operation. With autocommit on (the default), each executeBatch commits and ends the FastLoad, so your second executeBatch attempted to begin a brand-new FastLoad against a table that was no longer empty, and FastLoad requires an empty target table.
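Here is a minimal sketch of that pattern. The URL, credentials, table, column count, separator, and the 50,000-row batch size are placeholder assumptions; TYPE=FASTLOAD is the connection parameter that selects JDBC FastLoad.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class FastLoadExample {
        public static void main(String[] args) throws Exception {
            // TYPE=FASTLOAD selects the FastLoad protocol for this connection.
            Connection con = DriverManager.getConnection(
                    "jdbc:teradata://dbshost/TYPE=FASTLOAD", "user", "password");
            con.setAutoCommit(false); // required so multiple executeBatch calls share one FastLoad

            PreparedStatement st =
                    con.prepareStatement("insert into mytable values (?, ?)");
            BufferedReader reader = new BufferedReader(new FileReader("data.txt"));

            String s;
            int i = 0;
            while ((s = reader.readLine()) != null) {
                String[] columns = s.split("\\|");
                for (int j = 0; j < columns.length; j++) {
                    st.setString(j + 1, columns[j]);
                }
                st.addBatch();
                if (++i % 50000 == 0) {
                    st.executeBatch(); // ships buffered rows to the database, freeing heap
                }
            }
            st.executeBatch(); // ship any remaining rows
            con.commit();      // ends the FastLoad and makes the rows visible

            reader.close();
            st.close();
            con.close();
        }
    }

The key point is that each executeBatch releases the buffered rows from driver memory, and the single commit at the end closes the FastLoad, so the batch size can stay in the tens of thousands instead of tens of millions.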

Please read the Teradata JDBC Driver FAQ:

http://developer.teradata.com/connectivity/faq

In particular, see the question and answer that begins with "I have questions about JDBC FastLoad scalability."

Also, if your data is in comma-separated value (CSV) format, then you should consider using the JDBC FastLoad CSV feature to load your data.

http://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABFGFA2

JDBC FastLoad CSV is faster and more memory-efficient than JDBC FastLoad.
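If you go that route, the shape of the code changes: you point the driver at a CSV stream instead of binding parameters row by row. A rough sketch, assuming the TYPE=FASTLOADCSV connection parameter and the stream-style insert described in the User Guide chapter linked above (host, credentials, table, and file names are placeholders; check the linked chapter for the exact form):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class FastLoadCsvExample {
        public static void main(String[] args) throws Exception {
            // TYPE=FASTLOADCSV selects the FastLoad CSV protocol (assumption: see the User Guide link above).
            Connection con = DriverManager.getConnection(
                    "jdbc:teradata://dbshost/TYPE=FASTLOADCSV", "user", "password");
            con.setAutoCommit(false);

            File file = new File("data.csv");
            InputStream csv = new FileInputStream(file);

            // The driver streams the CSV data itself, so rows are never
            // materialized as parameter objects on the JVM heap.
            PreparedStatement st = con.prepareStatement("insert into mytable");
            st.setAsciiStream(1, csv, (int) file.length());
            st.executeUpdate();

            con.commit();
            csv.close();
            st.close();
            con.close();
        }
    }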