TPT loads from multiple files in shell script

Tools & Utilities
Highlighted
Enthusiast

TPT loads from multiple files in shell script

Hi everyone,

 

I have designed a TPT script to load data from a set of specific .txt files to a table within a database in Teradata and it works fine. The general structure of this script involves the definition of the DDL, File Reader and SQL Inserter operators, in a way that the mentioned load can be easily implemented by the following statements:

 

"APPLY
   ('INSERT INTO ' || @table || ' (  
            :FIELD_1,
            :FIELD_2')
   TO OPERATOR ( SQL_INSERTER [1] )

SELECT * FROM OPERATOR ( FILE_READER_1 [1] )

UNION ALL
SELECT * FROM OPERATOR ( FILE_READER_2 [2] );"

 

where two File Reader operators (FILE_READER_1 and FILE_READER_2) have been previously defined in order to load the file_1.txt and the file_2.txt into the table by applying the statements above.

 

Nevertheless, as I understand it, such a precedure implies the definition of multiple File Reader operators (one for each file to load). Therefore, I would like to implement an alternative procedure provided by a unix shell script to execute a single TPT load job (i.e. with a unique FILE_READER operator) for each file existing in a specific directory (e.g. a variable set of .txt files, which can be named as DATA_YYYYMMDD.txt, within a specific directory).

 

Is there any sample unix script available to achieve such a procedure?

 

Thanks in advance.

 

Best regards,

George


Accepted Solutions
Teradata Employee

Re: TPT loads from multiple files in shell script

The Data Connector operator allows you to wild-card the source Filename, or to indicate that the supplied Filename is in fact a FileList containing the names of the actual data files to be read.

1 ACCEPTED SOLUTION
4 REPLIES 4
Teradata Employee

Re: TPT loads from multiple files in shell script

The Data Connector operator allows you to wild-card the source Filename, or to indicate that the supplied Filename is in fact a FileList containing the names of the actual data files to be read.

Enthusiast

Re: TPT loads from multiple files in shell script

Thank you for the remarks.

 

The specific definition of the Data Connector operator within my .tpt file is:

 

   DEFINE OPERATOR FILE_READER
   TYPE DATACONNECTOR PRODUCER
   SCHEMA SCHEMA_1
   ATTRIBUTES
   (
      VARCHAR DirectoryPath        = @DIR_SOURCE
      VARCHAR FileName             = @FILE_SOURCE,
      VARCHAR IndicatorMode        = 'N',
      VARCHAR DateForm       = 'AnsiDate',
      VARCHAR OpenMode             = 'Read',
      VARCHAR Format               = 'Delimited',
      VARCHAR TextDelimiter        = ';'
   );

 

where the variables "DIR_SOURCE" and "FILE_SOURCE" refer to the directory path and the file name of the specific file to load into the target table, respectively. Hence, for instance, if FILE_SOURCE='FILE_20190315.txt', then this .tpt job loads data into the target table from that specific file.

 

In this sense, following your lines, I have renamed the variable FILE_SOURCE='FILE_*' (instead of FILE_SOURCE='FILE_20190315.txt' alone) of my script_jobvar.txt file and I have easily loaded a list of 'FILE_YYYYMMDD.txt' files with a single .tpt script into a target table.

 

Thank you very much again.

Enthusiast

Re: TPT loads from multiple files in shell script

I am experimenting a new problem with the previous procedure, since I could load data from a large list of 'FILE_YYYYMMDD.txt' files, but it fails when I execute again the process for a different list of files.

 

For example, I have two 'FILE_20190314.txt' and 'FILE_20190315.txt' files within the directory 'C:\directory'. Then I execute my .tpt script to load data into a target table from these 'FILE_20190314.txt' and 'FILE_20190315.txt' files by setting VARCHAR FileName = 'FILE_*' (or, equivalently, FILE_SOURCE='FILE_*', where FILE_SOURCE is a defined variable related to the attribute FileName of the Data Connector operator) and it works completely fine.

 

However, when I delete the previous 'FILE_20190314.txt' and 'FILE_20190315.txt' files and I try to repeat this procedure from different files (e.g. 'FILE_20190316.txt' and 'FILE_20190317.txt' files), then it fails (the system requires the original 'FILE_20190314.txt' and 'FILE_20190315.txt' files to execute the job, apart from the new 'FILE_20190314.txt' and 'FILE_20190315.txt' files):

 

Therefore, the job "remembers" the original sources and it cannot load data from a distinct list of files. However, in the second job, I only want to load 'FILE_20190316.txt' and 'FILE_20190317.txt' files.

 

Any idea?

Enthusiast

Re: TPT loads from multiple files in shell script

I have just solved this issue by introducing the attribute "VARCHAR ArchiveDirectoryPath" within the Data Connector operator, so that the respective 'FILE_20190314.txt' and 'FILE_20190315.txt' files of the first process are moved to this Archive Directory Path and the new 'FILE_20190316.txt' and 'FILE_20190317.txt' files can be loaded appropriately in the second process.

 

Thank you very much again.