As we know, TDCH uses JDBC to connect Hadoop and Teradata and to move data between the two platforms.
My concern is that, over JDBC, a FastLoad operation that is interrupted for any reason must be restarted from the beginning. This happens quite frequently when a large record set is loaded, because the checkpoint feature is not available over JDBC. It seems that even the recent JDBC 15.x release has not introduced checkpoint support.
With this item still open, could someone advise on the following:
1. Can we recommend TDCH (Hadoop-to-Teradata) for moving data at regular intervals on a large IDW system?
2. Does QueryGrid (LOAD_FROM_HCATALOG) use JDBC connectivity or a different mechanism?
3. What internal latency is involved on the Teradata side when using TDCH/QueryGrid, specifically with regard to AWT usage?
You are correct that the Teradata JDBC Driver does not support FastLoad checkpoints when using a TYPE=FASTLOAD connection.
However, that is a separate topic from TDCH's internal.fastload feature supporting FastLoad checkpoints.
TDCH's internal.fastload feature could be enhanced to support FastLoad checkpoints independently of the Teradata JDBC Driver's TYPE=FASTLOAD connections.
If you are a customer, and you have a business need for TDCH's internal.fastload feature to support FastLoad checkpoints, then you should submit a feature request using "Teradata At Your Service".
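For readers who have not used it directly, here is a minimal sketch of what a TYPE=FASTLOAD connection looks like from plain JDBC. It is only an illustration, not how TDCH's internal.fastload is implemented. The host, database, table and column names (stage_table, id, payload) are made up for the example, and running load() requires the Teradata JDBC Driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcFastLoadSketch {

    // Builds a Teradata JDBC URL that requests a FastLoad connection.
    // host and database are placeholders supplied by the caller.
    static String fastLoadUrl(String host, String database) {
        return "jdbc:teradata://" + host + "/DATABASE=" + database + ",TYPE=FASTLOAD";
    }

    // Loads all rows in a single batch. stage_table(id, payload) is a
    // hypothetical empty staging table used only for this example.
    static void load(String url, String user, String password, List<String[]> rows)
            throws SQLException {
        try (Connection con = DriverManager.getConnection(url, user, password)) {
            con.setAutoCommit(false); // batch inserts are committed explicitly below
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO stage_table (id, payload) VALUES (?, ?)")) {
                for (String[] row : rows) {
                    ps.setInt(1, Integer.parseInt(row[0]));
                    ps.setString(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
                // There is no checkpoint to resume from: a failure anywhere
                // before commit() returns means rerunning from the first row.
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}
```

For example, fastLoadUrl("tdhost", "stage_db") gives jdbc:teradata://tdhost/DATABASE=stage_db,TYPE=FASTLOAD.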
It would be worthwhile to diagnose why the connections/jobs are failing.
An enhancement request is a great idea, but it is not as simple as turning on checkpointing. A whole mechanism is needed to reposition the read side of the process and to restart reliably without missing records or producing inconsistent results. That will take some work, so it is likely to be a while before the enhancement is available.
In the meantime, it would be good to diagnose and fix whatever is preventing the jobs from completing successfully, so that you have a reliable mechanism until a possible feature enhancement becomes available.
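To make the restart problem concrete, here is a toy in-memory sketch of a checkpointed load. It is purely illustrative and assumes a source that can be re-read in the same order on every attempt, which is exactly the part that is hard to guarantee with parallel readers on the Hadoop side. The Checkpoint class, the load method and the failAtRow parameter are all invented for this example and are not part of TDCH or the JDBC driver.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointRestartSketch {

    // Hypothetical durable checkpoint: how many source rows are known
    // to be safely committed on the target.
    static final class Checkpoint {
        long committedRows = 0;
    }

    // Copies source to target in batches, recording a checkpoint after each
    // batch. failAtRow simulates an interruption; pass -1 for no failure.
    static void load(List<String> source, List<String> target,
                     Checkpoint cp, int batchSize, long failAtRow) {
        // Discard rows written after the last checkpoint, so that a
        // restart does not produce duplicates.
        target.subList((int) cp.committedRows, target.size()).clear();

        // Reposition the read side to the first uncommitted row.
        int pos = (int) cp.committedRows;
        while (pos < source.size()) {
            int end = Math.min(pos + batchSize, source.size());
            List<String> batch = new ArrayList<>(source.subList(pos, end));
            for (int i = 0; i < batch.size(); i++) {
                if (pos + i == failAtRow) {
                    throw new IllegalStateException(
                            "simulated interruption at row " + failAtRow);
                }
                target.add(batch.get(i));
            }
            cp.committedRows = end; // checkpoint only after the whole batch landed
            pos = end;
        }
    }
}
```

Even in this toy version, a restart has to do two things before it can continue: throw away the partial batch on the write side and move the reader back to the checkpoint. Both steps depend on the source returning rows in a stable order.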
Thanks, Tom and Todd, for sharing your expertise.
Yes, we are a Teradata customer and will submit an enhancement request through "Teradata At Your Service" for TDCH internal.fastload to support FastLoad checkpoints.
As Todd highlighted, it may not be simple, and we can't expect the feature to be available soon.
I suspect that, most likely, the final handshake (END LOADING) is not completing properly over JDBC. In many cases the table size matches the exported table/data, but the FastLoad job is not released.
To conclude, should we limit TDCH to ad hoc data movement, and not use it for a stable extract-load process on large Integrated Data Warehouse (IDW) systems, until Teradata standardizes uninterrupted data loading?
Is the 1.4.2 release also known to have these issues? We are looking at using it for weekly unloads from Hadoop and loads into Teradata.