Load and transform data from a .txt file by TPT

Enthusiast

Load and transform data from a .txt file by TPT

Hi everyone,

 

I want to load data from a .txt file into a database with TPT and apply a simple transformation in the same job. For this task I am using the following step, involving the SQL Inserter and Data Connector operators, but it does not work:

 

STEP LOAD
(
   APPLY
      ('INSERT INTO TABLE ( 
   :FIELD_1,
   :FIELD_2);')
      TO OPERATOR ( SQL_INSERTER [1] )

   SELECT FIELD_1, CASE WHEN FIELD_2 = ''1'' THEN ''2'' ELSE ''3'' END FROM OPERATOR ( FILE_READER [1] );
);

 

where the Data Connector operator is used to read the .txt file. The job fails with the error "Operator SQL Inserter has explicit input schema. Restricted APPLY statement allows only deferred schema for consumer operators".

 

So I guess it is not possible to apply a transformation directly to the output of a file-reader operator. I would like to know the best way to accomplish this in a single job, if it is possible.

 

Any idea?

 

Thanks in advance.


Teradata Employee

Re: Load and transform data from a .txt file by TPT

Hi GeorgeTD,

 

Providing the whole .tpt script, some rows of your file, and the DDL of the target table would help.

 

Wild guess, try this:

STEP LOAD
(
   APPLY
      ('INSERT INTO TABLE VALUES (:FIELD_1, :F2);')
      TO OPERATOR ( SQL_INSERTER [1] )
      SELECT FIELD_1, CASE WHEN FIELD_2 = '1' THEN '2' ELSE '3' END as F2
      FROM OPERATOR ( FILE_READER [1] );
);

Your SQL_Inserter should be defined with "schema *" (a deferred schema).
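For illustration, a deferred-schema definition of the inserter might look like this (a sketch reusing the attribute names from this thread, not a tested script):

```sql
DEFINE OPERATOR SQL_INSERTER
TYPE INSERTER
SCHEMA *  /* deferred schema, instead of an explicit INPUT SCHEMA */
ATTRIBUTES
( VARCHAR TraceLevel     = 'None',
  VARCHAR PrivateLogName = 'inserter_log',
  VARCHAR Tdpid          = @IP,
  VARCHAR UserName       = @USER,
  VARCHAR UserPassword   = @PASSWD
);
```

With a deferred schema, the consumer picks up the shape of the rows produced by the APPLY's SELECT, which is what the restricted APPLY statement requires.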

 

And a question: why the Inserter and not the Load operator? Unless you have very few rows or complex/LOB datatypes, the latter is much faster.

 

Enthusiast

Re: Load and transform data from a .txt file by TPT

Hi Waldar,

 

I have tried your previous 'step load' but I get the error "TPT_INFRA: TPT03111: Rule: Restricted APPLY Statement".

 

The whole .tpt script takes the following form:

 

DEFINE JOB  LOAD_JOB
(
   DEFINE SCHEMA SCHEMA_1
   (
 "FIELD_1"    VARCHAR(5),
 "FIELD_2"    VARCHAR(5)
   );
DEFINE OPERATOR SQL_INSERTER
TYPE INSERTER
INPUT SCHEMA SCHEMA_1
ATTRIBUTES
( VARCHAR TraceLevel = 'None',
  VARCHAR PrivateLogName = 'inserter_log',
  VARCHAR Tdpid = @IP,
  VARCHAR UserName = @USER,
  VARCHAR UserPassword = @PASSWD
);
DEFINE OPERATOR FILE_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_1
ATTRIBUTES
( VARCHAR DirectoryPath = @DIR_PATH,
  VARCHAR FileName = @FILE,
  VARCHAR IndicatorMode = 'N',
  VARCHAR DateForm = 'AnsiDate',
  VARCHAR OpenMode = 'Read',
  VARCHAR Format = 'Delimited',
  VARCHAR TextDelimiter = ';'
);
STEP LOAD
(
   APPLY
      ('INSERT INTO TABLE ( :FIELD_1, :FIELD_2);')
      TO OPERATOR ( SQL_INSERTER [1] )
   SELECT FIELD_1, CASE WHEN FIELD_2 = ''1'' THEN ''2'' ELSE ''3'' END FROM OPERATOR ( FILE_READER [1] );
);
);

 

On the other hand, the .ddl of the target table and some rows of my .txt file are:

 

CREATE MULTISET TABLE DB.TABLE ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      FIELD_1 VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      FIELD_2 VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL
)
PRIMARY INDEX ( FIELD_1 );

 

1;1   
2;3   
1;5   
4;2   

 

Finally, regarding the SQL Inserter operator, thank you for your remark. I have indeed noticed that it takes too much time in these kinds of processes, but I have been using it in my .tpt scripts because it is compatible with referential integrity and foreign keys. (The target table in this particular case has no foreign keys, but I need a load operator that preserves these constraints.) Which is the fastest operator compatible with referential integrity and foreign keys? The Stream operator?

 

Thank you very much again.

Teradata Employee

Re: Load and transform data from a .txt file by TPT (Accepted Solution)

Thanks for providing everything :)

 

On my first try, with the same tpt as yours, I got this error message:

Teradata Parallel Transporter Version 16.20.00.04 64-Bit
TPT_INFRA: Syntax error at or near line 34 of Job Script File 'load_script.tpt':
TPT_INFRA: At "LOAD" missing { REGULAR_IDENTIFIER_ EXTENDED_IDENTIFIER_ EXTENDED_IDENTIFIER_NO_N_ } in Rule: Regular Identifier
Compilation failed due to errors. Execution Plan was not generated.
Job script compilation failed.

Job terminated with status 8.

I guess LOAD is a TPT reserved word.

So I changed STEP LOAD to STEP s_LOAD in the tpt file.

Then I got this message:

TPT_INFRA: Semantic error at or near line 41 of Job Script File 'load_script.tpt':
TPT_INFRA: TPT03182: Missing derived column name.
Derived column specification is rejected.
Compilation failed due to errors. Execution Plan was not generated.
Job script compilation failed.

Job terminated with status 8.

So I changed the select statement as follows:

SELECT FIELD_1, CASE WHEN FIELD_2 = '1' THEN '2' ELSE '3' END as FIELD_2 FROM OPERATOR ( FILE_READER [1] );

And it ran perfectly:

select * from TABLE;

 FIELD_1 FIELD_2 
 ------- ------- 
 1       2      
 4       3      
 2       3      
 1       3      
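Putting both fixes together (the renamed step and the aliased derived column), the corrected step would look like this, a sketch assembled from the messages above:

```sql
STEP s_LOAD
(
   APPLY
      ('INSERT INTO TABLE ( :FIELD_1, :FIELD_2);')
      TO OPERATOR ( SQL_INSERTER [1] )
   SELECT FIELD_1
        , CASE WHEN FIELD_2 = '1' THEN '2' ELSE '3' END AS FIELD_2
     FROM OPERATOR ( FILE_READER [1] );
);
```

Note that the literals use single quotes here because the SELECT sits outside the quoted APPLY string; only text inside the quoted INSERT needs doubled quotes.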

 

 

Teradata Employee

Re: Load and transform data from a .txt file by TPT


Finally, with respect to the SQL Inserter operator, thank you for your remark. Indeed, I have noted that it takes too much time in these kinds of processes, but I have been using this operator in my .tpt scripts because of its compatibility with referential integrity and foreign keys (although the target table related to this particular issue does not have foreign keys, but I need to use a load operator that preserves these constraints). In this sense, which is the fastest operator compatible with referential integrity and foreign keys? The "Stream Operator"?

TPT Stream is meant for near-real-time ingestion, so its goals aren't quite the same.

 

Depending on the volume, you can use either TPT Update or the batch process.

The latter consists of inserting into a staging NoPI table with TPT Load, then doing an insert-select into your final table using regular SQL in another step of your TPT job.

For larger volumes this is the fastest way to insert data into Teradata.
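Sketched in plain SQL, with hypothetical names modeled on this thread (DB.STG_TABLE_NOPI standing in for the staging table):

```sql
/* Step 1: TPT Load fills the empty NoPI staging table DB.STG_TABLE_NOPI. */
/* Step 2: plain SQL moves the data into the final table,
   applying the transformation on the way: */
INSERT INTO DB.TABLE (FIELD_1, FIELD_2)
SELECT FIELD_1
     , CASE WHEN FIELD_2 = '1' THEN '2' ELSE '3' END
  FROM DB.STG_TABLE_NOPI;
```

Because the second step is ordinary SQL, all constraints on the final table are enforced as usual.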

 

Enthusiast

Re: Load and transform data from a .txt file by TPT

Hi Waldar,

 

I have tried your solution and it works completely fine, thank you very much for it.

 

In addition, I have replaced the SQL Inserter operator with the Update operator, and it also works perfectly and much faster, so thank you again for the remark.

 

With respect to the batch process, I have not seen any .tpt example in the TPT user guide. Is there any example available to check how it works? Is it compatible with referential integrity and foreign keys?

 

Thank you again.

Teradata Employee

Re: Load and transform data from a .txt file by TPT

I would have written it this way.

I can't advise you on a good MaxSessions value, as it depends on your environment, file size, server load, and so on.

 

/* using character set utf8 */  /* uncomment if needed */
define job j_Load_Job
(
    set TableNameNopi = 'STG_' || @TableName || '_NOPI';
    
    define schema sch_from_table from table delimited @TableName;
    
    define operator o_ddl
        type ddl
        attributes
        ( varchar       UserName          = @UserName
        , varchar       UserPassword      = @PASSWD
        , varchar       TdpId             = @IP
        , varchar       WorkingDatabase   = @DatabaseName
        , varchar array ErrorList         = ['3803', '3807', '9247']
        )
    ;
    
    define operator o_load_operator
        type load
        schema *
        attributes
        ( varchar UserName        = @UserName
        , varchar UserPassword    = @PASSWD
        , varchar Tdpid           = @IP
        , integer MaxSessions     = @MaxSessions
        , varchar WorkingDatabase = @DatabaseName
        , varchar TargetTable     = @TableNameNopi
        , varchar WildCardInsert  = 'Y'
        )
    ;
   
    define operator o_file_reader
        type DataConnector Producer
        schema sch_from_table
        attributes
        ( varchar DirectoryPath = @DIR_PATH
        , varchar FileName      = @FILE
        , varchar Format        = 'Delimited'
        , varchar IndicatorMode = 'N'
        , varchar TextDelimiter = ';'
        , varchar OpenMode      = 'Read'
        , integer SkipRows      = 0
        )
    ;
    
    step s_Prep_NoPI
    (
        apply ( 'drop table ' || @DatabaseName || '.' || @TableNameNopi || '_ET;' )
            , ( 'drop table ' || @DatabaseName || '.' || @TableNameNopi || '_UV;' )
            , ( 'drop table ' || @DatabaseName || '.' || @TableNameNopi || '_RL;' )
            , ( 'drop table ' || @DatabaseName || '.' || @TableNameNopi || ';'    )
            , ( 'create multiset table ' || @DatabaseName || '.' || @TableNameNopi || ' as '
           ||   @DatabaseName || '.' || @TableName
           ||   ' with no data no primary index;')
        to operator(o_ddl);
    )
    ;
    
    step s_Load_NoPI_Table
    (
        apply ( 'insert into ' || @DatabaseName || '.' || @TableNameNopi || ';' )
        to operator(o_load_operator[1])
        select *
          from operator(o_file_reader[1]);
    )
    ;
    
    step s_Load_Final
    (
        apply ( ' insert into ' || @DatabaseName || '.' || @TableName
            ||  ' select FIELD_1'
            ||  '      , case FIELD_2 when ''1'' then ''2'' else ''3'' end'
            ||  '   from ' || @DatabaseName || '.' || @TableNameNopi || ';' )
        to operator(o_ddl);
    )
    ;
    
);

I like the staging table to match the file, then implement the logic during the final load.

Enthusiast

Re: Load and transform data from a .txt file by TPT

Thank you very much for the reformulation of the entire job for the batch process and for your remark about the MaxSessions parameter.

 

Just a couple of questions: is "o_load_operator" compatible with referential integrity and foreign keys? Does this operator require the target table to be empty? I need an operator that is compatible with referential integrity and foreign keys, and that can also load data quickly into an existing, non-empty table.

 

Is it possible?

 

Thank you very much again.

Teradata Employee

Re: Load and transform data from a .txt file by TPT

Indeed, TPT Load needs a target table that is empty, with no referential integrity and no indexes besides the primary index.

That's why this is a two-step load (not counting the preparation step).

 

The first step uses TPT Load into a staging table generated on the fly, which is both NoPI and empty.

The second step moves data from this staging table to the final table. This is plain SQL, so it handles everything.

You have to check data quality here, between the staging table and the final table.
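As an illustration of such a check, assuming a hypothetical parent table DB.PARENT referenced by FIELD_1 (not part of this thread's DDL):

```sql
/* Rows in staging that would violate the foreign key on FIELD_1 */
SELECT s.*
  FROM DB.STG_TABLE_NOPI s
  LEFT JOIN DB.PARENT p
    ON s.FIELD_1 = p.FIELD_1
 WHERE p.FIELD_1 IS NULL;
```

Any rows returned can be rejected or fixed before the final insert-select, so the load into the constrained table never fails mid-flight.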

 

So as a whole, this batch process is compatible with RI / FK / populated table.

 

Enthusiast

Re: Load and transform data from a .txt file by TPT

Now I understand the process, thank you for the explanation. My last question: can the DDL operator involved in the step "s_Load_Final" load as fast as the Update operator, or is there another operator available for this process (preserving RI/FK/populated tables, and fast)?