Smart Loader for Hadoop

Teradata Employee


Teradata Studio provides a Smart Loader for Hadoop feature that lets users transfer data from Teradata to Hadoop, Hadoop to Teradata, and Hadoop to Aster. When transferring between Teradata and Hadoop, the Hadoop Smart Loader uses the Teradata Connector for Hadoop (TDCH) MapReduce Java classes as the underlying technology for data movement. It requires the HCatalog metadata layer to browse Hadoop objects and uses an Oozie workflow to manage the data transfer. Currently, the Smart Loader for Hadoop feature in Teradata Studio is certified with TDCH version 1.3.4 and with the Hortonworks and Cloudera distributions of Hadoop. The Teradata Connector for Hadoop must be installed on the Hadoop system.
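To make the architecture concrete, the Oozie workflow ultimately submits a TDCH MapReduce job via a `hadoop jar` command. The sketch below assembles such a command line; the jar path, flag names (`-url`, `-username`, `-jobtype`, `-sourcetable`), and `CHARSET` setting are illustrative assumptions, not taken from Studio's internals — consult the TDCH README for the real options. The tool class names (`com.teradata.hadoop.tool.TeradataExportTool` / `TeradataImportTool`) are the ones TDCH provides.

```python
def build_tdch_export_cmd(td_host, td_database, user, source_table,
                          jar="/usr/lib/hadoop/lib/teradata-connector-1.3.4.jar"):
    """Sketch of the hadoop-jar command the Oozie workflow effectively submits.
    TeradataExportTool moves data Hadoop -> Teradata; TeradataImportTool goes
    the other direction. Flag names and jar path are illustrative assumptions."""
    return [
        "hadoop", "jar", jar,
        "com.teradata.hadoop.tool.TeradataExportTool",
        # CHARSET=UTF8 matters for non-ASCII data (see the replies below on CHARSET)
        "-url", f"jdbc:teradata://{td_host}/database={td_database},CHARSET=UTF8",
        "-username", user,
        "-jobtype", "hcat",   # HCatalog is the metadata layer Studio relies on
        "-sourcetable", source_table,
    ]

cmd = build_tdch_export_cmd("tdsys", "sales", "dbc", "web_clicks")
print(" ".join(cmd))
```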


NOTE: You must have the Teradata Connector for Hadoop (TDCH) installed on your Hadoop system. You can download TDCH version 1.3.4 from the Developer Exchange Download site. You must also download the Configure Oozie script and run it on your Hadoop system. Refer to the Readme on the Teradata Studio download page for instructions on running the Configure Oozie script.

For Hadoop to Aster data transfers, the Smart Loader for Hadoop uses the Aster MapReduce function load_from_hcatalog. The data transfer is initiated from the Aster Database, which remotely accesses the Hadoop system via SQL-H and pulls the data across.
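A Hadoop-to-Aster transfer therefore boils down to a SQL-H query over load_from_hcatalog. The helper below just assembles such a query string as a sketch; the clause names (ON mr_driver, SERVER, USERNAME, DBNAME, TABLENAME) follow the Aster load_from_hcatalog documentation, but treat the exact spelling and any defaults as assumptions rather than what Studio generates.

```python
def build_sqlh_query(hadoop_server, hcat_db, hcat_table, hcat_user="hive"):
    """Assemble a SQL-H query of the general shape used to pull a Hadoop
    table into Aster via load_from_hcatalog (a sketch, not Studio's code)."""
    return (
        "SELECT * FROM load_from_hcatalog("
        "ON mr_driver "
        f"SERVER('{hadoop_server}') "
        f"USERNAME('{hcat_user}') "
        f"DBNAME('{hcat_db}') "
        f"TABLENAME('{hcat_table}'))"
    )

print(build_sqlh_query("hadoop1", "default", "web_clicks"))
```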

With bi-directional data loading, users can easily perform ad hoc data transfers between their Teradata, Aster, and Hadoop systems. The Hadoop Smart Loader can be invoked by dragging and dropping a table between the Transfer View and the Data Source Explorer, or by selecting a table in the Data Source Explorer and choosing Data>Export Data... or Data>Load Data.... This invokes the Data Transfer Wizard, where you select the Source or Destination Type.


Create a Hadoop Connection Profile

You can create connections to your Hadoop system using the Connection Profile Wizard. The wizard is invoked from the Data Source Explorer by right-clicking the Database Connections folder or by clicking the 'New Connection Profile' button on the Data Source Explorer toolbar.


There are two options for creating Hadoop Connection Profiles:

  • Hadoop Generic System - The Hadoop Generic System profile supports migrating Hadoop connections from Studio releases prior to Studio 15.10. It is also used for Cloudera Hadoop connections. Hadoop Generic System connections use the WebHCat protocol to connect and discover database and table information. Enter the WebHDFS host, WebHDFS port number, and system username. This connection requires that the ConfigureOozie script has been run on the Hadoop system.


  • Hadoop Hortonworks - The Hadoop Hortonworks connection profile provides additional options for connecting to Hortonworks Hadoop systems, based on the desired functionality between Studio and your Hadoop system: Knox Gateway (secure connection), TDCH (Teradata data transfers), JDBC (creating and running SQL), or SQL-H (Hadoop to Aster data transfers). Note that the Knox Gateway option also supports JDBC connections. Click Next to enter the host name, port number, user name, and password, if required. The TDCH option is equivalent to the Hadoop Generic System connection profile described above and requires the ConfigureOozie script to be run on the Hadoop system.
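The WebHCat (Templeton) protocol used for browsing is a plain REST API, so the discovery step amounts to HTTP GETs against URLs like the ones built below. Port 50111 is WebHCat's default, and the ddl/database endpoints are part of the standard WebHCat API; whether Studio issues exactly these requests is an assumption.

```python
def webhcat_url(host, resource, user, port=50111):
    """Build a WebHCat (Templeton) REST URL of the kind used to discover
    databases and tables. Port 50111 is the WebHCat default."""
    return f"http://{host}:{port}/templeton/v1/{resource}?user.name={user}"

# List databases, then the tables in the 'default' database:
dbs_url = webhcat_url("hadoop1", "ddl/database", "hive")
tables_url = webhcat_url("hadoop1", "ddl/database/default/table", "hive")
print(dbs_url)
print(tables_url)
```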


Once you have your Hadoop connection profile, you can browse the Hadoop database and table objects in the Data Source Explorer.

You can also run HiveQL SQL commands against your Hadoop system if you have configured a Hadoop JDBC or Knox connection profile.

Transfer Tables between Teradata and Hadoop

Before invoking the Hadoop Transfer Wizard (also known as the Smart Loader for Hadoop), switch to the Data Transfer perspective and choose your Hadoop connection profile in the Transfer View.

There are two ways to invoke the Hadoop Transfer Wizard. One way to transfer a table from Teradata to Hadoop is to drag a Teradata table from the Data Source Explorer and drop it on a Hadoop database in the Transfer View. You can also transfer from Hadoop to Teradata by dragging the table from the Hadoop system in the Transfer View and dropping it on a Teradata database in the Data Source Explorer.


The other way is from the Data Transfer Wizard. Select the Teradata table in the Data Source Explorer, right-click, and choose Data>Export.... This invokes the Data Transfer Wizard. Choose Hadoop as the Destination Type and click Launch. NOTE: You can also choose Data>Load... with Hadoop as the Source Type to transfer data from Hadoop to Teradata.


This launches the Hadoop Transfer Wizard, where you choose the destination Hadoop system for the transfer. You can transfer the Teradata table as a 'New Table' or transfer its data into an existing table in Hadoop.


Whether started by drag and drop or from the Data Transfer Wizard, the Hadoop Transfer Wizard next prompts you to choose the file options and column mappings. The wizard attempts to choose Hadoop column types based on the source Teradata column data types. You can override the destination column type by selecting a new type from the drop-down list, and you can filter out columns you don't want in the destination Hadoop table by unchecking them. If you are transferring to an existing Hadoop table, you will need to map the source columns to the destination columns, and you are given the option to append or replace the table data.
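The type pre-selection step can be pictured as a lookup from Teradata types to Hive types. The mapping below is an illustrative sketch of the kind of defaults such a wizard might pre-select, not Studio's actual table; the wizard lets you override every column in any case.

```python
# Illustrative default mapping from Teradata column types to Hive column
# types. These pairings are an assumption, not Studio's documented behavior.
TD_TO_HIVE = {
    "BYTEINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "FLOAT": "DOUBLE",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "STRING",
    "TIMESTAMP": "STRING",
}

def default_hive_type(td_type):
    """Return a pre-selected Hive type for a Teradata type, falling back to
    STRING (a safe catch-all the user can override in the wizard)."""
    return TD_TO_HIVE.get(td_type.upper(), "STRING")
```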


Press Finish to complete the Hadoop data transfer and submit a Data Transfer job to perform the data copy. As with data loads, the status of the Data Transfer job is displayed in the Transfer Progress View. When the job has completed, an entry is placed in the Transfer History View.

Transfer Tables from Hadoop to Aster

There are two ways to invoke the Hadoop to Aster Transfer Wizard. One way is to drag a Hadoop table from the Data Source Explorer and drop it on an Aster database in the Transfer View. You can also invoke the wizard by selecting the Aster Tables folder in the Data Source Explorer, right-clicking, and choosing the Aster>Data Transfer... option. This invokes the Data Transfer Wizard, where you select Hadoop as the Source Type.


Press the Launch button to launch the Hadoop Table to Aster Wizard. Choose the Hadoop connection profile to locate the database and table to transfer. The wizard then displays the columns and column types of the Hadoop table. You can filter out columns and select whether each column can contain nulls and whether it is unique. The Hadoop Table to Aster Wizard will only create Aster fact tables.

Press Finish to complete the Hadoop data transfer and submit a Data Transfer job to perform the data copy. As with data loads, the status of the Data Transfer job is displayed in the Transfer Progress View. When the job has completed, an entry is placed in the Transfer History View.

Hadoop Data Transfer Job

A transfer job is created to move the data between Teradata and Hadoop. You can view the progress of the transfer job in the Transfer Progress View of the Data Transfer perspective. NOTE: With the Oozie workflow, the status of the job is not available until the job has finished. Once the job is complete, an entry is placed in the Transfer History and displayed in the Transfer History View.
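Because the transfer runs as an Oozie workflow, its state can also be polled directly through Oozie's REST API, which is one way to understand why status only becomes available when the job finishes. The sketch below builds such a polling URL; port 11000 and the v1/job endpoint are standard Oozie defaults, but whether Studio polls this exact endpoint is an assumption.

```python
def oozie_job_info_url(host, job_id, port=11000):
    """Build the Oozie REST URL for polling a workflow job's state.
    Port 11000 is Oozie's default; 'show=info' returns job details,
    including its status (RUNNING, SUCCEEDED, KILLED, ...)."""
    return f"http://{host}:{port}/oozie/v1/job/{job_id}?show=info"

print(oozie_job_info_url("hadoop1", "0000001-130618-oozie-W"))
```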

Select the entry in the Transfer History and click on the Show Job Output toolbar button to view the output from the Hadoop job transfer.


Help

Teradata Studio provides Help information. Click Help>Help Contents in the main toolbar. You can also get additional information on the Hadoop Transfer Wizard by clicking the question mark ('?') in the lower left-hand corner of the wizard page.

Conclusion

The Teradata Studio Hadoop Smart Loader provides an ad hoc data movement tool to transfer data between Teradata and Hadoop. It provides a point-and-click GUI; no scripting is required. You can download Teradata Studio and the Teradata Connector for Hadoop from the Teradata Download site. For more information about other Teradata Studio features, refer to the article Teradata Studio.

90 REPLIES
Teradata Employee

Re: Smart Loader for Hadoop

Excellent step by step instructions.  Easy to follow with great explanations.

Initially I could not locate Import from Hadoop because I clicked on the table name in the Teradata explorer. You will only see Export to Hadoop... from the Data tab. Make sure your mouse is on the Tables folder, not an individual table name, so that Import from Hadoop... appears on the Teradata tab.

Happy computing...

Teradata Employee

Re: Smart Loader for Hadoop

Currently the Hadoop transfer does not support Kanji-enabled Teradata databases. Most of my test data uses Unicode and/or multi-byte characters both in object names (tables, users) and in the data contained in the tables. I hope there will be a workaround or support for Unicode data and multi-byte object names.

I will try to use the Teradata Connector for Hadoop by calling the com.teradata.hadoop.tool.TeradataExportTool and com.teradata.hadoop.tool.TeradataImportTool to see if it can handle Multi-bytes object names and data.

Happy learning always...

Teradata Employee

Re: Smart Loader for Hadoop

I have been able to transfer data both ways between Teradata and Hadoop that contain non-ASCII characters. When Teradata Studio sets up its transfer jobs, it uses the CHARSET parameter defined in the Connection Profile in the JDBC URL that the Connector job on Hadoop uses. If the CHARSET is set to the default value of ASCII, the non-ASCII characters will not transfer.

I was not able to transfer a table whose name contains Kanji characters. The Connector's Limitations section says Kanji is not supported.

Teradata Employee

Re: Smart Loader for Hadoop

UTF8 is the CHARSET value that I used.

Teradata Employee

Re: Smart Loader for Hadoop

 Hi,

I am trying to import data using Teradata Studio into a TD 13.0 VM instance. I followed the install steps for the connector and was able to connect to the Hadoop default DB; however, I get the following error when I import data to TD 13.0. I'd appreciate any insight.

Exception in thread "main" java.lang.NoClassDefFoundError: com/teradata/hadoop/tool/TeradataExportTool

Caused by: java.lang.ClassNotFoundException: com.teradata.hadoop.tool.TeradataExportTool

 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Could not find the main class: com.teradata.hadoop.tool.TeradataExportTool.  Program will exit.

Thanks

Venkat




Teradata Employee

Re: Smart Loader for Hadoop

@VVenkat, You need to install the Teradata Connector for Hadoop (TDCH) (Studio Edition). You can download the connector on the Download Page (at http://downloads.teradata.com/download/connectivity/teradata-connector-for-hadoop-studio-edition). The connector is installed on your Hadoop system. Refer to the Readme inside the TDCH. Here are the main steps from the Readme for installing the TDCH on your Hadoop system:

4.2 Installation

 1. After unzipping the Teradata Connector for Hadoop tar.gz package, copy the following files into Hadoop's lib directory:

/usr/lib/hadoop/lib

  a) teradata-connector-1.0.6.jar

  b) tdgssconfig.jar

  c) terajdbc4.jar

2. Copy the following files into Hadoop's conf directory:

/etc/hadoop/conf

   a) teradata-export-properties.xml

   b) teradata-import-properties.xml

3. Modify the mapred.child.java.opts property value in mapred-site.xml (under /etc/hadoop/conf) to add:

   -Djava.security.egd=file:/dev/./urandom

4. Modify the following OS networking values on the name node (recommended, but optional):

   /proc/sys/net/core/somaxconn to 1024

   /proc/sys/net/ipv4/tcp_max_syn_backlog to 2048

Teradata Employee

Re: Smart Loader for Hadoop

Thank you very much, that did resolve the above issue (one of the jars was missing). My imports and exports now run without error; however, they hang at the MapReduce step as shown below. I have updated mapred-site.xml as mentioned above.

13/06/18 09:09:14 INFO mapreduce.TeradataInputProcessor: input batch size is 10000
13/06/18 09:09:14 INFO mapreduce.TeradataInputProcessor: input number of mappers are 2
13/06/18 09:09:14 INFO mapreduce.TeradataInputProcessor: job setup ends at 1371571754830
13/06/18 09:09:14 INFO mapreduce.TeradataInputProcessor: job setup time is 3s
13/06/18 09:09:28 INFO mapred.JobClient: Running job: job_201306180704_0003
13/06/18 09:09:29 INFO mapred.JobClient: map 0% reduce 0% ..

Please advise.

Thanks

V

Teradata Employee

Re: Smart Loader for Hadoop

@VVenkat, So exports to Hadoop work but import from Hadoop does not? What version of Teradata are you importing to?

Teradata Employee

Re: Smart Loader for Hadoop

I am trying to export and import to and from Teradata 13.0; both jobs hang at the above step and do not complete. Should I be running against a higher version of TD?