TPT Error loading CSV as Double Quotes

Hello, I'm loading a set of CSV files with a header row, some of which will have an occasional double-quoted text field. With the example below, I keep getting a TPT error: DataConnection error, Delimited Data Parsing error: Column Length overflow(s) in row 1.

In TPT it doesn't load anything and errors out with an exit code of 8 and the above-mentioned message. I cross-checked this by setting up a FASTLOAD job, and I was able to make it load the data file, but with RC=4: it rejects the 1st row, complaining about the 2nd column. Even if I remove the 1st row from the data set (in case of bad data), it does the same exact thing. Looking at the raw data, nothing appears afoul.

I have tried increasing the length of the table's columns to make the VARCHARs wide enough for the incoming data, but that didn't work for either TPT or FASTLOAD. I also tried just removing the 1st row, but it keeps failing the same way.

What else can I try to get TPT to load successfully?

Thank you kindly for help and direction.

-- Sample of data file:

ID,Login,TaskId,QueId,TaskStartTime,CNumber,Dur,FUp,Upddt,RunDur
62081217,rr0581,7850,0,3/3/2016 12:31:38 PM -06:00,62081217,127.761,False,,127.761
62126271,rr0581,7850,0,3/4/2016 10:57:34 AM -06:00,62126271,353.269,False,,353.269
62394805,pr4033,7850,0,3/10/2016 11:29:19 AM -06:00,62394805,4869.313,False,,4869.313
62396446,rr0581,7850,0,3/10/2016 11:43:54 AM -06:00,62396446,143.903,False,,143.903
62662785,rr0581,7850,0,3/16/2016 1:00:29 PM -05:00,62662785,60.157,False,,60.157
62664082,rr0581,7850,0,3/16/2016 1:14:01 PM -05:00,62664082,122.775,False,,122.775
62665044,rr0581,7850,0,3/16/2016 1:21:57 PM -05:00,62665044,502.476,False,,502.476

-- Table DDL

CREATE TABLE database.table1
,NO FALLBACK
,NO BEFORE JOURNAL
,NO AFTER JOURNAL
(
  ID VARCHAR(8)
, Login VARCHAR(6)
, TaskID VARCHAR(4)
, QueID VARCHAR(5)
, TaskStartTime VARCHAR(30)
, CNumber VARCHAR(8)
, Dur VARCHAR(10)
, FUp VARCHAR(5)
, UpdDt VARCHAR(30)
, RunDur VARCHAR(10)
)
UNIQUE PRIMARY INDEX (ID)
;

-- TPT Load Script

DEFINE JOB LOAD_TPT_TABLE1_RAW
DESCRIPTION 'LOAD TABLE1 WORK RAW TABLE FROM FLAT FILE'
(
  DEFINE SCHEMA FFILESCHEMA
  DESCRIPTION 'DATABASE.TABLE1'
  (
    ID VARCHAR(8)
  , Login VARCHAR(6)
  , Task_ID VARCHAR(4)
  , QueID VARCHAR(5)
  , TaskStartTime VARCHAR(30)
  , CNumber VARCHAR(8)
  , Dur VARCHAR(10)
  , FUp VARCHAR(5)
  , UpdDt VARCHAR(30)
  , RunDur VARCHAR(10)
  ) ;

  DEFINE OPERATOR DATACONNECTION
  DESCRIPTION 'TPT CONNECTIONS OPERATOR'
  TYPE DATACONNECTOR PRODUCER
  SCHEMA FFILESCHEMA
  ATTRIBUTES
  (
    VARCHAR PrivateLogName   = 'table1_work_hist_raw.log'
  , VARCHAR DirectoryPath    = '**dir path**\'
  , VARCHAR FileName         = '**file_name.csv*'
  , VARCHAR Format           = 'Delimited'
  , VARCHAR TextDelimiter    = ','
  , VARCHAR QuotedData       = 'optional'
  , VARCHAR OpenQuoteMark    = '"'
  , VARCHAR CloseQuoteMark   = '"'
  , VARCHAR OpenMode         = 'read'
  , VARCHAR SkipRowsEveryFile = 'y'
  , INTEGER SkipRows         = 1
  ) ;

  DEFINE OPERATOR INSERT_START_RAW
  DESCRIPTION 'START INSERT OPERATOR'
  TYPE INSERTER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR PrivateLogName = 'TPT_TABLE1_RAW.log'
  , VARCHAR TdpId          = '**sys dns name*'
  , VARCHAR UserName       = '**userid*'
  , VARCHAR UserPassword   = '**pw*'
  , VARCHAR TargetTable    = 'db.table1'
  , VARCHAR LogTable       = 'db.table1_log'
  , VARCHAR ErrorTable1    = 'db.table1_ERR1'
  , VARCHAR ErrorTable2    = 'db.table1_ERR2'
  ) ;

  APPLY
  ( 'INSERT INTO db.table1
    (
     :ID
    ,:Login
    ,:TaskID
    ,:QueID
    ,:TaskStartTime
    ,:CNumber
    ,:Dur
    ,:FUp
    ,:UpdDt
    ,:RunDur
    ) ;
  ' )
  TO OPERATOR (INSERT_START_RAW[8])
  SELECT
    ID
  , Login
  , TaskID
  , QueID
  , TaskStartTime
  , CNumber
  , Dur
  , FUp
  , UpdDt
  , RunDur
  FROM OPERATOR
    (DATACONNECTION[8]) ;
)
;

7 REPLIES
Teradata Employee

Re: TPT Error loading CSV as Double Quotes

You have a few options.

If you would like the header record to be read in and sent to the DBS, you will need to increase the size of the offending column.

In other words, if the failure is on the 3rd column and the 3rd column is defined as VARCHAR(4), the failure is occurring because the column name "TaskId" is longer than 4 characters. You can make the column size in the schema VARCHAR(6) and the header row will be read in successfully (assuming the schema sizes for the rest of the columns are correct).

There is no harm in making the VARCHAR sizes larger than they have to be.
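
For illustration only (this is not the poster's actual schema), widening that 3rd column so the header text also fits would look like this:

, TaskId VARCHAR(6)   /* was VARCHAR(4); the header text "TaskId" is 6 characters, so the header record now fits */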

If you do not want the header record to be sent to Teradata, you can use the SkipRows attribute of the DataConnector operator and tell the operator to skip one row (the header). Just set the value of that attribute to 1.
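
In the DataConnector operator's ATTRIBUTES that looks roughly like this (SkipRowsEveryFile is shown as well, assuming each input file carries its own header):

  VARCHAR SkipRowsEveryFile = 'Y'   /* apply the skip to every file read, not just the first */
, INTEGER SkipRows          = 1     /* skip the single header record */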

However, the title of this post talks about quoted fields ("some of which will have an occational double quoted text field"), but the rest of your description of the problem (and the sample data) has no fields that are quoted, so it is unclear what problem you are reporting.

-- SteveF

Re: TPT Error loading CSV as Double Quotes

Hi Steve,

I do not want the header loaded; I already have SkipRows = 1 in place.

Though this data file may not have double-quoted text fields, I have QuotedData set to 'Optional' and have tested it both ways, with and without that setting; it seems to be OK either way (though I still get an error unrelated to the double-quote setting). I plan to reuse this script to load other files in this data set, and those do have occasional fields with double-quoted text.

I did try increasing the VARCHAR lengths in the target table, and adjusted the TPT script to the same larger-than-needed lengths, but I kept getting the same error from TPT.

Does anything in the data set, the target table, or the load script look wrong? I'm still stumped as to why the load fails at the 1st row.

Thanks

Teradata Employee

Re: TPT Error loading CSV as Double Quotes

I will be able to look at this tomorrow (I will take your script and run it on my machine).

What version of TPT, and on which platform, are you running?

Unrelated, but an observation:

Why are you using the Inserter operator?

Why are you using 8 instances of the Inserter operator?

How many data files will you be reading?

How many records in each file?

With 8 instances of the DataConnector operator and 8 instances of the Inserter, you are creating 64 data streams of shared memory, which is most likely way more than you need.

-- SteveF

Re: TPT Error loading CSV as Double Quotes

TTU 15 and Teradata Database 14.

I'm using the Inserter operator because this is an insert-only load, like a FastLoad. If that is not the right operator, I'll change to something more appropriate.

The 8 instances of the Inserter operator were meant to increase loading bandwidth; I'm only loading one file at this time. The production file will run into the tens of thousands of records daily.

Same basic story on the DataConnector operator; I can dial it back if I'm only loading one large file.

Teradata Employee

Re: TPT Error loading CSV as Double Quotes

The Inserter operator is like loading with BTEQ.

The Inserter operator sends a row at a time with SQL Inserts.

If you want to emulate FastLoad you need to use the Load operator.

If you are reading from a single file, just use 1 instance of the DC operator.

For the Load operator you should only need 1 instance.

How many sessions are you planning on using?

Please refer to the documentation for the different operators available and what each one does.
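
A minimal sketch of how the job could be rewired for the Load operator (attribute values are placeholders carried over from the script above, and MaxSessions is only an example, not a recommendation):

DEFINE OPERATOR LOAD_TABLE1_RAW
DESCRIPTION 'FASTLOAD-STYLE LOAD OPERATOR'
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
  VARCHAR PrivateLogName = 'TPT_TABLE1_RAW.log'
, VARCHAR TdpId          = '**sys dns name*'
, VARCHAR UserName       = '**userid*'
, VARCHAR UserPassword   = '**pw*'
, VARCHAR TargetTable    = 'db.table1'
, VARCHAR LogTable       = 'db.table1_log'
, VARCHAR ErrorTable1    = 'db.table1_ERR1'
, VARCHAR ErrorTable2    = 'db.table1_ERR2'
, INTEGER MaxSessions    = 4      /* example value only; size sessions per your system and the docs */
) ;

/* and in the APPLY, send to one instance of each operator: */
TO OPERATOR (LOAD_TABLE1_RAW[1])
...
FROM OPERATOR (DATACONNECTION[1]) ;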

-- SteveF

Re: TPT Error loading CSV as Double Quotes

I have changed the number of instances from 8 to 1, changed the operator type from INSERTER to LOAD, eliminated the three lines about optional double quotes just in case, and expanded the target table's VARCHAR columns to length 255 (and the TPT script's column lengths accordingly). Now I get a similar error: Delimited Data Parsing error: Too few columns in row 2. The number of columns in the flat file is 10, as is the number of columns in the target table, and the script matches as well.
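
For reference, after those changes the DataConnector attributes look roughly like this (paraphrased from the description above, not a verbatim copy of the revised script):

  VARCHAR PrivateLogName    = 'table1_work_hist_raw.log'
, VARCHAR DirectoryPath     = '**dir path**\'
, VARCHAR FileName          = '**file_name.csv*'
, VARCHAR Format            = 'Delimited'
, VARCHAR TextDelimiter     = ','
, VARCHAR OpenMode          = 'read'
, VARCHAR SkipRowsEveryFile = 'y'
, INTEGER SkipRows          = 1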

Teradata Employee

Re: TPT Error loading CSV as Double Quotes

For posterity... Like the Inserter, BTEQ uses plain SQL sessions to talk to the database. Unlike the Inserter, BTEQ can send more than one import record for each insert request. The subject of this thread is TPT. Please do not reply to this comment with BTEQ-specific questions. Start a new thread instead.