I was trying to export a large table from Teradata to HDFS as a text file using the TDCH-TPT interface. I was able to export around 3 million records without any problems, but when I try to export the entire table, i.e. 10 million records, it fails with "TPT_INFRA: TPT02270: Error: Cannot read from Data Stream, status = DataStream Error". I am using TPT 15.10 and TDCH 1.4.3, and I have been stuck on this for a long time.
I have seen the same question here http://teradataerror.com/TPT2270-Error-Cannot-read-from-Data-Stream-status--s.html but the remedy only says to contact global support.
Do you have any sample scripts that invoke TPT's HDFS interface? The reason for using the TDCH-TPT interface was better parallelism, since more mappers write simultaneously to HDFS. Is there any issue with the scalability of the TDCH-TPT interface?
Reading from or writing to Hadoop files via HDFS is no different from accessing files on the local file system.
You give us the file name, the directory, and the format.
The only mandatory difference is that you must provide the Hadoop host name in the DataConnector operator's HadoopHost attribute.
This is documented in the TPT Reference Manual.
There should also be examples in the TPT User Guide.
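For anyone following along, here is a minimal sketch of what such a job script might look like. Only the HadoopHost attribute is confirmed in this thread. The job name, schema, table, credentials, directory, and file name are placeholders, and the value 'default' for HadoopHost is an assumption, so check the attribute names and values against the TPT Reference Manual for your release.

```
/* Sketch only: every name and value below is a placeholder. */
DEFINE JOB EXPORT_TO_HDFS
DESCRIPTION 'Export a Teradata table to a delimited text file in HDFS'
(
  /* Delimited output requires an all-VARCHAR schema. */
  DEFINE SCHEMA SRC_SCHEMA
  (
    col1 VARCHAR(20),
    col2 VARCHAR(100)
  );

  /* Producer: reads rows from Teradata. */
  DEFINE OPERATOR EXPORT_OP
  TYPE EXPORT
  SCHEMA SRC_SCHEMA
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'my_tdpid',
    VARCHAR UserName     = 'my_user',
    VARCHAR UserPassword = 'my_password',
    VARCHAR SelectStmt   = 'SELECT CAST(col1 AS VARCHAR(20)), CAST(col2 AS VARCHAR(100)) FROM mydb.mytable;'
  );

  /* Consumer: same attributes as a local file write, plus HadoopHost. */
  DEFINE OPERATOR HDFS_WRITER
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR HadoopHost    = 'default',   /* assumption: or your namenode host */
    VARCHAR DirectoryPath = '/user/myuser/export',
    VARCHAR FileName      = 'mytable.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|',
    VARCHAR OpenMode      = 'Write'
  );

  APPLY TO OPERATOR (HDFS_WRITER)
  SELECT * FROM OPERATOR (EXPORT_OP);
);
```

A script like this would typically be submitted with tbuild, for example `tbuild -f export_to_hdfs.tpt`, where the script file name is again hypothetical.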
I tried the example in the TPT User Guide, but instead of writing to HDFS it wrote to the local file system, even though I had set HadoopHost for the DataConnector operator. Maybe I missed some other configuration there. The only way I was able to get the file into HDFS was via the TDCH-TPT interface.
Hey, I was able to make the TPT-HDFS API work; there was an issue with my classpath. But I somehow feel the TPT-TDCH interface is much faster than TPT-HDFS, because parallelism is better in TPT-TDCH since it runs multiple mappers. Our main goal is to reduce the load on Teradata while transferring data from Teradata to Hadoop. What would you suggest?
I am glad you were able to get the HDFS method working.
As to which method to use, that would be up to you.
We do not run performance comparison tests.
The method that performs the best for you and is the easiest to use would probably be your best solution.
The main goal is to reduce AMP CPU seconds while exporting the data. If you could point me to some TPT performance tuning parameters, that would be a great help.