Teradata Connector for Hadoop 1.0.7 now available

Teradata Employee

The Teradata Connector for Hadoop (TDCH) provides scalable, high-performance, bi-directional data movement between Teradata database systems and Hadoop systems.
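For readers new to the connector, a minimal import invocation looks roughly like the sketch below. The jar name, class name, connection URL, and table/path values are assumptions for illustration only; the README file in each download package gives the exact usage.

    # Hypothetical sketch: import a Teradata table into HDFS text files.
    # All names and values here are placeholders; consult the README.
    hadoop jar teradata-hadoop-connector.jar \
        com.teradata.hadoop.tool.TeradataImportTool \
        -url jdbc:teradata://tdhost/DATABASE=testdb \
        -username tduser \
        -password tdpass \
        -jobtype hdfs \
        -fileformat textfile \
        -sourcetable example_table \
        -targetpaths /user/hduser/example_table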

New features in this release include:

  1. Added an access lock option for importing data from Teradata, to improve concurrency. With lock-for-access, an import job is not blocked by other concurrent accesses against the same table.
  2. Added support for importing data into an existing Hive partitioned table.
  3. Allowed a Hive configuration file path to be specified with the -hiveconf parameter, so the connector can access it in either HDFS or the local file system. This enables users to run Hive import/export jobs on any node of a Hadoop cluster (see section 8.5 of the README file for more information).
  4. With Teradata Database Release 14.10, a new split.by.amp import method is supported (see section 7.1(d) of the README file for more information). A sketch combining these options appears after this list.
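As an illustration only, the new options might be combined in an import job roughly as follows. The access-lock flag spelling is my assumption (the README documents the actual option name); -hiveconf and split.by.amp are the parameters named above, and the jar name, class name, and connection values are placeholders.

    # Hypothetical sketch of a 1.0.7 import using the new options.
    # The access-lock flag spelling is assumed; verify it in the README.
    hadoop jar teradata-hadoop-connector.jar \
        com.teradata.hadoop.tool.TeradataImportTool \
        -url jdbc:teradata://tdhost/DATABASE=testdb \
        -username tduser \
        -password tdpass \
        -jobtype hive \
        -method split.by.amp \
        -accesslock true \
        -hiveconf hdfs:///user/hduser/conf/hive-site.xml \
        -sourcetable example_table \
        -targettable example_hive_table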

Problems fixed in this release include:

  1. Inappropriate exceptions reported from a query-based import job. Only the split.by.partition method supports a query as an import source; a proper exception is now thrown if a non-split.by.partition import job is issued with the "sourcequery" parameter (see the sketch after this list).
  2. An error when the user account used to start Templeton differs from the user account Templeton uses to run a connector job.
  3. A time-out issue for large data import jobs. For a large import, the Teradata database may need a long time to produce the results in a spool table before the subsequent data transfer. If this exceeded a mapper's time-out limit before the data transfer started, the mapper was killed. With this fix, the mapper is kept alive instead.
  4. A time-out issue for export jobs using internal.fastload. The internal.fastload export method requires synchronization of all mappers at the end of their execution. If one mapper finishes its data transfer earlier than the others, it has to wait for them to complete their work. If the wait exceeded the time-out for an idle task, the mapper was killed by its task tracker. With this fix, the mapper is kept alive instead.
  5. A limitation that required the user to have authorization to create a local directory when executing a Hive job on a node without a Hive configuration (hive-site.xml) file. Before this fix, TDCH had to copy the file from HDFS to the local file system.
  6. Case-sensitivity problems with the "-jobtype", "-fileformat", and "-method" parameters. With this fix, values of these parameters are no longer case-sensitive.
  7. Incorrect delimiters used by an export job for Hive tables in RCFile format.
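To illustrate fix 1: a query-based import must use the split.by.partition method, roughly as sketched below; supplying "sourcequery" with any other method now raises a clear exception. The jar name, class name, and all values are placeholders for illustration.

    # Hypothetical sketch: a query source requires -method split.by.partition.
    # Using -sourcequery with any other method now fails with a clear exception.
    hadoop jar teradata-hadoop-connector.jar \
        com.teradata.hadoop.tool.TeradataImportTool \
        -url jdbc:teradata://tdhost/DATABASE=testdb \
        -username tduser \
        -password tdpass \
        -jobtype hdfs \
        -method split.by.partition \
        -sourcequery "SELECT * FROM example_table WHERE col1 > 100" \
        -targetpaths /user/hduser/example_query_output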

Need Help? 

For more detailed information on the Teradata Connector for Hadoop, please see the Tutorial document in the Teradata Connector for Hadoop Now Available article, as well as the README file in the appropriate TDCH download packages. The Tutorial document mainly discusses TDCH (Command Line Edition). The download packages are for use on commodity hardware; for Teradata appliance hardware, TDCH is distributed with the appliance. TDCH is supported by Teradata CS in certain situations where the user is a Teradata customer.

For more information about Hadoop Product Management (PM), Teradata employees can go to Teradata Connections Hadoop PM.

3 REPLIES
Enthusiast

Re: Teradata Connector for Hadoop 1.0.7 now available

Thanks for the update, glad to see active work being done on TDCH.

If you are taking suggestions for the next release ...

1. Currently the name of the table being loaded cannot exceed 24 characters, because six characters are appended to it for the _ERR_1 and _ERR_2 error tables created by load jobs. (Teradata object names are limited to 30 characters, so the six-character suffix leaves 24.) This is a big constraint where table names longer than 24 characters already exist. Work with the JDBC team to provide an option to specify the error database name and error table names for fastload. (important to have)

2. Support for query banding for the entire process, and specifically for the load and export operators, since TASM regulates the number of sessions in most Teradata shops. (important to have)

3. Ability to provide a path where users can place pre- and post-load/export SQL. (nice to have)

Teradata Employee

Re: Teradata Connector for Hadoop 1.0.7 now available

I would like to know who is using TDCH, and what stage people are at with respect to their deployment.

Please send me an email (hau.nguyen@teradata.com) and let me know.  Please indicate customer name if you are a Teradata customer.

Thanks,

-Hau

Teradata Employee

Re: Teradata Connector for Hadoop 1.0.7 now available

A new README file with updates to Sections 2.2, 2.3, and 8.5 has been uploaded to the appropriate packages.