I am trying to generalize the loading from one table/delimited file to another table where the source table/file may have different datatypes than the target. For example, the source column col1 is VARCHAR(8) and stores a date in YYYYMMDD format, while the equivalent target column datatype is DATE. I will create metadata entries for the source and target table columns with their datatypes and then correlate them to drive the transformations (VARCHAR to DATE, etc.). Please note that we are not using the PERIOD or BLOB/CLOB datatypes. Based on the type of input (file or table), the job will automatically select the most appropriate producer and consumer operators. I am not sure how to handle the schemas when the source and target have different datatypes in a generalized solution, and thus I am seeking your expertise here.
Approach 1: Cast the source columns to the equivalent target column datatypes in the SELECT list of the producer operator definition, and declare the source schema to be equivalent to the target table definition.
As I am casting the source column datatypes to the target column datatypes, I may not be able to use DEFINE SCHEMA schema_name FROM TABLE DELIMITED. Instead I would need to use DEFINE SCHEMA schema_name FROM SELECT * FROM Target_table, and then cast the source columns to the target datatypes in the select_statement attribute of the producer operator definition.
Some rows might fail while casting from the source datatype to the target datatype; those failures will be recorded in the log.
When the source is a file, how can we align the source column datatypes to the target column datatypes in the producer operator definition?
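To make Approach 1 concrete, here is a rough TPT sketch for a table source using the EXPORT operator. All database, table, and job-variable names are placeholders I made up for illustration; the point is that the schema mirrors the target table and the cast happens in the producer's SELECT:

```sql
/* Approach 1 sketch (hypothetical names).
   The declared schema mirrors the TARGET table definition. */
DEFINE SCHEMA tgt_schema FROM TABLE 'tgt_db.target_table';

DEFINE OPERATOR export_op
TYPE EXPORT
SCHEMA tgt_schema
ATTRIBUTES
(
  VARCHAR TdpId        = @SrcTdpId,
  VARCHAR UserName     = @SrcUser,
  VARCHAR UserPassword = @SrcPwd,
  /* Cast the VARCHAR(8) YYYYMMDD column to DATE up front, so rows
     leaving the producer already match tgt_schema. Rows that fail
     the cast are lost to this job and surface only in the log. */
  VARCHAR SelectStmt   = 'SELECT CAST(col1 AS DATE FORMAT ''YYYYMMDD'')
                          FROM src_db.source_table;'
);
```

Note that this casting trick relies on the producer having a SELECT list, which is exactly why a file source is awkward here: the DataConnector producer reads the file records as declared in the schema and offers no equivalent place to cast.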
Approach 2: In the APPLY section, cast the source column datatypes to the target column datatypes, e.g. when the source is a VARCHAR field holding a date in YYYYMMDD format and the target is a DATE column:
:col1 (DATE, FORMAT ''YYYYMMDD'')
The APPLY section may look cluttered.
If the source is a file, this is still handy because we cast from the source to the target datatype in the APPLY phase. And since the source is read untouched, all valid source rows can be read, which may not be the case in the first approach, where some rows could fail during the cast.
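Here is a minimal sketch of Approach 2 with a delimited-file source. Again, all file paths and object names are hypothetical; the UPDATE operator is shown as the consumer because the LOAD operator reportedly does not accept this style of casting in the APPLY phase:

```sql
/* Approach 2 sketch (hypothetical names).
   The schema mirrors the SOURCE file layout, so rows are read as-is. */
DEFINE SCHEMA src_schema
(
  col1 VARCHAR(8)   /* date held as YYYYMMDD */
);

DEFINE OPERATOR file_reader
TYPE DATACONNECTOR PRODUCER
SCHEMA src_schema
ATTRIBUTES
(
  VARCHAR DirectoryPath = '/path/to/files',
  VARCHAR FileName      = 'source_file.txt',
  VARCHAR Format        = 'Delimited',
  VARCHAR TextDelimiter = '|'
);

DEFINE OPERATOR upd_op
TYPE UPDATE
SCHEMA *
ATTRIBUTES
(
  VARCHAR TargetTable = 'tgt_db.target_table'
  /* connection, error-table, etc. attributes omitted */
);

/* The cast lives entirely in the APPLY statement; note the doubled
   single quotes around the FORMAT string inside the quoted INSERT. */
APPLY
(
  'INSERT INTO tgt_db.target_table (col1)
   VALUES (:col1 (DATE, FORMAT ''YYYYMMDD''));'
)
TO OPERATOR (upd_op)
SELECT * FROM OPERATOR (file_reader);
```

With many columns, the quoted INSERT statement is where the clutter mentioned above accumulates, since every converted column carries its own (type, FORMAT) clause.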
I assume the second approach works well whether the source is a table or a file. Can you please share your invaluable thoughts on which might be the optimal solution, and on the pros/cons of the second approach? Are there any significant limitations that might cause issues when applying the cast in the APPLY phase? Thank you for your time.
Adding one more point found through this forum: "The CAST cannot be applied in the APPLY phase when using the LOAD operator." To overcome this, the UPDATE operator needs to be used instead of LOAD.