TPT load source files with regular expression

Teradata Employee

We are now using TPT and need to load multiple files at the same time. Initially we used the wildcard character (*) in the file name, but that has caused a problem. Is there any way to have TPT load data files whose names are given by a regular expression, such as ABC_AAA_[0-9].dat or ABC_AAA_BBB_[0-9][0-9].dat?

Here is the example case:

File layout - 1

ABC_AAA_01.dat

ABC_AAA_02.dat

ABC_AAA_03.dat

File layout - 2

ABC_AAA_BBB_01.dat

ABC_AAA_BBB_02.dat

These 2 file layouts are located in the same directory. If we define FileName = 'ABC_AAA_*.dat' in the TPT script, TPT will try to load all 5 source files, and the job will fail because the second layout differs from the first.
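To make the overlap concrete, here is a small Python sketch (not TPT code) that compares the wildcard with two regular expressions on the five file names above. The two-digit patterns are an assumption chosen to match the example names; Python's fnmatch module stands in for shell-style wildcard matching and is not a statement about how TPT expands wildcards internally.

```python
import fnmatch
import re

# The five example files from the post.
files = [
    "ABC_AAA_01.dat", "ABC_AAA_02.dat", "ABC_AAA_03.dat",
    "ABC_AAA_BBB_01.dat", "ABC_AAA_BBB_02.dat",
]

# Shell-style wildcard: '*' also matches "BBB_01", so both layouts are selected.
wildcard_hits = fnmatch.filter(files, "ABC_AAA_*.dat")

# Regular expressions (assumed two-digit suffix) keep the layouts apart.
layout1 = re.compile(r"ABC_AAA_[0-9][0-9]\.dat")
layout2 = re.compile(r"ABC_AAA_BBB_[0-9][0-9]\.dat")
layout1_hits = [f for f in files if layout1.fullmatch(f)]
layout2_hits = [f for f in files if layout2.fullmatch(f)]

print(len(wildcard_hits), len(layout1_hits), len(layout2_hits))
```

The wildcard selects all five names, while each regular expression selects only the files of its own layout.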

5 REPLIES
Enthusiast

Re: TPT load source files with regular expression

Hi,

Apparently this pattern for file names should work, because the * matches anything in the file name. Can you please paste the actual error you are facing?

Also consult the following post; I hope it will help you.

http://developer.teradata.com/tools/articles/teradata-parallel-transporter-active-and-batch-director...

Khurram
Teradata Employee

Re: TPT load source files with regular expression

TPT does not support "ABC_AAA_[0-9].dat" syntax.

If you want to use the wildcard syntax, all files must adhere to the same layout.

Thus, in this particular scenario, due to the way the files are named, you would need to separate out the files with the different layouts into separate directories.

-- SteveF
Teradata Employee

Re: TPT load source files with regular expression

Another thought: If you are trying to load the data from all 5 files, but you know you need to use 2 different load tasks to accomplish the job, you can use one step to load the files from ABC_AAA_BBB_*.dat, then use a subsequent TPT job step to move those files to an archive directory, then use yet another job step to load the data from ABC_AAA_*.dat.
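The middle step of that sequence (moving the already-loaded files out of the way) could look roughly like the Python sketch below. It is only an illustration of the file handling: the function name archive_matching and the directory names are hypothetical, and how such a step is invoked from a TPT job is not shown here.

```python
import fnmatch
import shutil
from pathlib import Path


def archive_matching(src_dir, wildcard, archive_dir):
    """Move files in src_dir whose names match the wildcard into archive_dir."""
    src = Path(src_dir)
    dest = Path(archive_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src.iterdir()):
        if path.is_file() and fnmatch.fnmatch(path.name, wildcard):
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

After archive_matching(data_dir, "ABC_AAA_BBB_*.dat", archive_dir) has run, the wildcard ABC_AAA_*.dat only sees the files of the first layout that remain in the directory.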

-- SteveF
Teradata Employee

Re: TPT load source files with regular expression

Thanks Feinholz, we found an alternative solution: we use the FileList attribute of the DataConnector operator, together with a shell script that generates the list of actual file names from the regular expression.
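The shell script itself is not shown in this thread. As a rough equivalent, a minimal Python sketch of the same idea might look like this; the function name write_file_list and the full_paths switch are assumptions made for illustration, not part of TPT.

```python
import re
from pathlib import Path


def write_file_list(data_dir, pattern, list_path, full_paths=True):
    """Write one matching file name per line to list_path and return the entries."""
    regex = re.compile(pattern)
    # fullmatch() requires the whole file name to match the regular expression.
    matches = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and regex.fullmatch(p.name)
    )
    # Either absolute paths or bare file names, depending on full_paths.
    entries = [str(p.resolve()) if full_paths else p.name for p in matches]
    Path(list_path).write_text("".join(e + "\n" for e in entries), encoding="utf-8")
    return entries
```

For example, write_file_list(r"E:\90_Temp\Test_Name", r"TPT_DATA_[0-9][0-9]\.dat", "TPT_DATA_FILE_LIST.txt") would produce a list file like the one shown further down, with one full path per line.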

But I ran into a problem when I include the full path together with the file name in the file list. TPT returns an error message saying the file was not found:

TPT_DATACONNECTOR_OPERATOR: TPT19404 pmOpen failed. Requested file not found (4)

If I remove the full path from the file names in the list and run TPT from the same directory as the data files, TPT runs successfully.

I do not understand what is wrong when I define the full path in the file list as shown below (I am testing in VMware, with TPT running on Windows). Could you please advise me how to correct it?

TPT_DATA_FILE_LIST.txt:

E:\90_Temp\Test_Name\TPT_DATA_01.dat
E:\90_Temp\Test_Name\TPT_DATA_02.dat

TPT - Data Connector Operator:

DEFINE OPERATOR TPT_DATACONNECTOR_OPERATOR
TYPE DATACONNECTOR PRODUCER
SCHEMA INPUTFILESCHEMA
ATTRIBUTES
(
    IndicatorMode = 'N',
    TextDelimiter = '|',
    Format = 'delimited',
    RowErrFileName = 'E:\90_Temp\BAD_DATA.dat',
    VARCHAR PRIVATELOGNAME = 'DATACONNECTOR_OPERATOR_LOG',
    VARCHAR DIRECTORYPATH = 'E:\90_Temp\Test_Name\',
    VARCHAR FILELIST = 'Y',
    VARCHAR FILENAME = 'TPT_DATA_FILE_LIST.txt'
);

Re: TPT load source files with regular expression

Hi there

Even though this is my first day on this forum, I have an idea concerning your question.

The Teradata Parallel Transporter 13.10 User Guide.pdf, page 156, reads:

"If the pathname that you specify with the FileName attribute (as filename) contains any embedded pathname syntax ("/" on a UNIX OS or "\" on Windows), the pathname is accepted as the entire pathname. However, if the DirectoryPath attribute is present, the FileName attribute is ignored, and a warning message is issued."

I cannot test it (this is my first time with TPT, please excuse me), but I would try:

First idea: use a UNC path starting with "\\<servername>" instead of "E:".

Second idea (if the first idea fails): drop the DIRECTORYPATH attribute and use VARCHAR FILENAME = '<path>\TPT_DATA_FILE_LIST.txt'

If this doesn't help, I am confused too, because "Teradata Parallel Transporter 13.10 Reference.pdf", page 93, reads:

"When used with the FileList attribute, filename is expected to contain a list of names of the files to be processed, each with a full path specification."
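Before changing the TPT script, it may be worth ruling out problems in the list file itself, such as stray whitespace or an entry that is not a full path. The Python sketch below is one possible pre-check; the function name check_file_list is hypothetical, and the check only reflects what the operating system sees, not how TPT resolves the entries.

```python
import os


def check_file_list(list_path):
    """Return (line number, entry, problem) tuples for suspicious list entries."""
    problems = []
    with open(list_path, encoding="utf-8") as handle:
        for lineno, raw in enumerate(handle, start=1):
            entry = raw.rstrip("\r\n")
            if not entry:
                continue  # skip blank lines
            if entry != entry.strip():
                problems.append((lineno, entry, "leading or trailing whitespace"))
            elif not os.path.isabs(entry):
                problems.append((lineno, entry, "not a full path"))
            elif not os.path.isfile(entry):
                problems.append((lineno, entry, "file not found"))
    return problems
```

An empty result means every entry is a full path to a file that exists on the machine where the check runs.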

Please excuse me if this doesn't help. At the moment I have no way to test it.

Best regards, Wolfgang.