Question Regarding Insert Data from CSV-File

General

Question Regarding Insert Data from CSV-File

Hello All

I've written a small JDBC application to insert data of a CSV-file into a table by the means of a file upload web formular.

The table definition of the table the data gets insert into is the following:

CREATE MULTISET TABLE Main_Number_Upl_MYUSER ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
Main_Num VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( Main_Num );

The JDBC connection string is: jdbc:teradata://dwhhost.teradata.ch/TMODE=BTET,CHARSET=UTF8

I am using the following piece of code to insert the data from the CSV-file into the above table:

String INSERTTABLESTMT = "INSERT INTO ${table_name} (${keys}) VALUES (${values})";

String query = INSERTTABLESTMT.replaceFirst("\$\{table_name\}, tableName);
query = query.replaceFirst("\$\{keys\}", columnName);
query = query.replaceFirst("\$\{values\}", questionmarks);

String[] nextLine;
PreparedStatement ps = null;
try {
ps = connection.prepareStatement(query);

int count = 0;
while ((nextLine = csvReader.readNext()) != null) {
int index = 1;

count++;
if (count <= batchSize) {

for (String string : nextLine) {
ps.setString(index++, string);
}
ps.addBatch();
}

if ((count % batchSize) == 0) {
ps.executeBatch();
count = 0;
}
}

if ((count % batchSize) != 0) {
ps.executeBatch();
}

There's a different result depending on the character encoding of the source CSV-file: If the file uses the Unicode encoding the result looks like this:

Main_Num

 5 9 2 6 7 9 2 0 3 3 9 1 

 5 9 5 9 1 2 3 4 8 8 9 0

 5 9 3 6 1 7 8 2 1 3 6 0

 5 9 1 2 5 6 1 1 3 8 9 7 

I.e., there are blanks in front and after each digit, there are even whole blank lines. However, if the file uses the ANSI encoding, the result looks as desired:

Main_Num

592679203391

595912348890

593617821360

591256113897

Do you have any ideas/recommendations how to get the desired result independent of the file encoding?

Best regards,

  Christoph

2 REPLIES
Teradata Employee

Re: Question Regarding Insert Data from CSV-File

The problem lies with the csvReader. Your code example doesn't show how the csvReader object is constructed and initialized with the filename.

Your program would need to specify the character encoding of the file when initializing the csvReader, so that the csvReader can correctly read the data from the file.

Re: Question Regarding Insert Data from CSV-File

Thank you for the reply.  The cvsReader object gets constructed as follows:

String SEPARATOR = ",";

...

try {
csvReader = new CSVReader(new FileReader(csvFile), SEPARATOR);
} catch (Exception e) {
e.printStackTrace();
throw new Exception("Error occured while opening file. " + e.getMessage());
}

According to various sources on the web, it's actually not possible to know/determine the character encoding of the file programmatically (http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-st...). There exist, however, various libraries/methods to guess the encoding with a certain precision.

This means that in order to make the application work properly in every situation the users have to use a specific file encoding.

Christoph