handling an error during acquisition phase

Tools

handling an error during acquisition phase

good afternoon folk,

i am running into an issue with a tpt script i have written.  my use case is this:  i am ingesting data from sql server to hadoop to edw.  i have successfully gotten the data into hadoop via sqoop jobs (SOH is the separator).  i am now moving those files from hadoop into edw using tpt and bteq scripts.  the high level approach is as follows: hadoop -> edw stage table (which is an all varchar columns) via tpt -> edw core table via bteq which does the trivial ETL functionality of typing the columns to their end data types.  the data in sql server (and hence the hadoop files) is neither standardized, normalized, nor clean at all.  the bulk of the data does come in ok, but there are some rows that have unicode characters in some of the columns.  the char/varchar columns in the edw core tables are defined as "character set latin not case specific", and they are NO PI tables; and this has allowed me to ingest 99.99995% of the data without an issue.  we have received sign off from our qa group that the data is legit and has been turned over for general purpose use.

that last .00005% of the data is what is bothering me though.  i was trying to come up with a method to get it ingested also.  when i look through the _ET tables created and examine the DataParcel column contents, i find that EVERY column that did not get ingested properly dd not come in because of some unicode characters in the data stream.  i also validated this by doing the tpt script into a test column defined as "character set unicode not case specific".  when i rerun the ingestion from the hadoop files to the edw, all rows come in with no errors. so this got me to thinking about an alternative way of ingesting the data; and hence my question.  i create two sets of tables in my tpt script; one is "character set latin not case specific" and the other is "character set unicode not case specific".  the tpt script would do all the prep work as usual, and start to insert the data into the latin character set tables.  when the insert fails, i would look at the errorcode, and determine if it was non zero;  if so i then insert that row into the unicode table.  that way, i would get all of the data ingested; the bulk of it would be in a latin character set (which gives me better performance), but yet i still get the full data set ingested.  i then create a view that does a union between the two tables (Unicode first, then union it to the latin; otherwise i get character conversion errors due to how the set is formed internally within edw apparently) and all seems good.

unfortunately either reality or a lack of creativeness has hit; as i get a compilation error when trying to run that script.  the script is as follows:

 

USING CHARACTER SET UTF8
DEFINE JOB LOAD_TOM_STG
DESCRIPTION 'TPT Job to load EDW_DATABASES.TOM_STG from something'
(

DEFINE SCHEMA SCHEMA_TOM_STG
(
in_mychar VARCHAR(256),
in_myint VARCHAR(256)
);

DEFINE OPERATOR op_TOM_STG
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_TOM_STG
ATTRIBUTES
(
VARCHAR DirectoryPath = '/var/opt/teradata/edwScripting',
VARCHAR FileName = 'tom.dat',
VARCHAR Format = 'Delimited',
VARCHAR OpenMode = 'Read',
VARCHAR TextDelimiter = '',
VARCHAR RecordErrorVerbosity = 'High',
VARCHAR RowErrFileName = 'Tom-U-ErrorFile.txt'
);

DEFINE OPERATOR od_TOM_STG
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = '',
VARCHAR LogonMech = '',
VARCHAR TdpID = 'myserver',
VARCHAR UserName = 'user',
VARCHAR UserPassword = 'something',
VARCHAR ErrorList = ['3807','3803']
);

DEFINE OPERATOR ol_TOM_STG
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
VARCHAR LogonMech = '',
VARCHAR TdpID = 'myserver',
VARCHAR UserName = 'user',
VARCHAR UserPassword = 'something',
VARCHAR LogTable = 'EDW_DATABASES.TOM_STG_LG',
VARCHAR ErrorTable1 = 'EDW_DATABASES.TOM_STG_ET',
VARCHAR ErrorTable2 = 'EDW_DATABASES.TOM_STG_UV',
VARCHAR TargetTable = 'EDW_DATABASES.TOM_STG'
);

STEP stSetupTables
(
APPLY
('DROP TABLE EDW_DATABASES.TOM_STG_LG'),
('DROP TABLE EDW_DATABASES.TOM_STG_ET'),
('DROP TABLE EDW_DATABASES.TOM_STG_UV'),
('DROP TABLE EDW_DATABASES.TOM_STG'),
('DROP TABLE EDW_DATABASES.TOM_STG_U_LG'),
('DROP TABLE EDW_DATABASES.TOM_STG_U_ET'),
('DROP TABLE EDW_DATABASES.TOM_STG_U_UV'),
('CREATE MULTISET TABLE EDW_DATABASES.TOM_STG,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
mychar VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC,
myint VARCHAR(256) CHARACTER SET LATIN NOT CASESPECIFIC
) NO PRIMARY INDEX;'
),
('CREATE MULTISET TABLE EDW_DATABASES.TOM_STG_U,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
mychar VARCHAR(256) CHARACTER SET UNICODE NOT CASESPECIFIC,
myint VARCHAR(256) CHARACTER SET UNICODE NOT CASESPECIFIC
) NO PRIMARY INDEX;'
)
TO OPERATOR (od_TOM_STG);
);

STEP stLoadFile
(
APPLY
('INSERT INTO EDW_DATABASES.TOM_STG(
mychar,
myint
) VALUES (
:in_mychar,
:in_myint
);
SELECT CASE when errorcode <> 0 THEN (INSERT INTO EDW_DATABASES.TOM_STG(mychar,myint ) VALUES (:in_mychar, :in_myint)) end
')TO OPERATOR (ol_TOM_STG)
SELECT * FROM OPERATOR(op_TOM_STG);
);

);

 

this line:

SELECT CASE when errorcode <> 0 THEN INSERT INTO EDW_DATABASES.TOM_STG_U(mychar,myint ) VALUES (:in_mychar, :in_myint) end

 

gives me a:

TPT10508: RDBMS error 3706: syntax error, expected something between the 'THEN' and keyword and the 'INSERT' keyword.

 

i have tried various and sundry uses of parenthesis and single quotes in that CASE statement; all generate a similar, but failing operation.  does tpt provide this capability?  i am basically wanting to capture an error in the acquisition phase, and handle it in an alternative method.  i understand that the _ET table will still be written, but can i add any additional error handling?  i can provide the full log if needed, but hopefully the above provides the gist of the issue.  

 

lengthy post i know, but i wanted to provide as much detail as i could. 

1 REPLY
Teradata Employee

Re: handling an error during acquisition phase

TPT does not support what you are trying to do here.

TPT syntax does not allow a SELECT-INSERT.

And I am not sure where you are getting "errorcode" from.

TPT syntax does allow:

CASE

   WHEN abc <> 0 THEN

      APPLY . . . .

   ELSE

      APPLY

. . . . .

 

But the variable "abc" must be a known value, and usually based on one of the column values.

There is no method of determining whether one APPLY fails so that we can then insert the row into a different table.

 

-- SteveF