The following instructions should help you set up and configure the Teradata SQL H capability with your Hadoop instance. First, however, you need to download and start up a Teradata Express image with the SQL H Connector installed. On the Teradata Express for VMware Player, download on of the following two versions.
100 Resolve Hostname conflicts
Teradata SQL-H uses node hostname to resolve network addressing. However, typical systems configured by Teradata Staging will likely have conflicts/duplicate hostnames between Teradata nodes and Hadoop nodes. This conflict must be addressed, regardless if the connectivity is via customer/external LAN or via bynet/infiniband. This section describes the process to change the Hadoop node hostnames. This section may be skipped if there is no conflict in hostname between the Teradata and Hadoop nodes.
START OF SYSTEM OUTAGE. Note date and time: ___________________
100.1 Shutdown all Hadoop services. On the master node:
# hcli system stop
100.2 Change the hostname.
100.2.1 On the Hadoop master node, append the new hostname to each Hadoop node to the existing entry in the /etc/hosts file. Example:
Before: 18.104.22.168 byn002-8
After: 22.214.171.124 hdp002-8 byn002-8
Note it is mandatory that the new hostname must precede (to the left of) the original hostname.
100.2.2 Validate the new hostname is recognized. On the master node:
# hostname –f”
100.3 On the Hadoop Master node, run the following command:
# hcli support modifyHostname <oldname> <newname>
100.4 If Virtual IP addressing is used at the site, perform the necessary modifications for the new hostnames.
100.5 Make sure the Teradata system can resolve Hadoop node names (either through DNS or via /etc/hosts on the Teradata nodes).
100.6 Start Hadopp services. On the master node:
# hcli system start
END OF SYSTEM OUTAGE. Note date and time: ___________________
101 Add Hadoop IPAs to Teradata host file
Perform the following steps if Teradata hostnames and Hadoop hostnames are not conflicting and are not resolved through local DNS.
101.1 Save a copy of /etc/hosts file on all Teradata TPA nodes:
# cp /etc/hosts /etc/orig.hosts
101.2 Add Hadoop nodes to the /etc/hosts file of every Teradata TPA node. Example:
102 Hadoop Namenode configuration for Teradata proxy user setup
In order for Teradata SQL-H to work with aHadoop system, a Teradata super user must be configured on the hadoop namenode to be allowed to access HDFS from Teradata nodes on behalf of another Hadoop user in a secured way. The Teradata super user used for this setting is tdatuser. On the hadoop side, the following configurations are required in core-site.xml to add ‘tdatuser’ as trusted super user. The first property determines the file system groups that tdatuser is allowed to impersonate. The second property hadoop.proxyuser.tdatuser.hosts determines the hosts from where the tdatuser user is allowed to access the HDFS. If these configurations are not present, impersonation will not be allowed and Teradata queries will fail with a security error.
Add the following two properties to the hadoop namenode configuration file /etc/hadoop/conf/core-site.xml and change their values based on your Teradata/Hadoop environment setup, save the file and restart namenode..
Allow the superuser tdatuser to impersonate any members of HDFS group(s). For example, ‘users’ is used as HDFS group that tdatuser is allowed to impersonate users belonged to this group.
The superuser can connect only from host1 and host2 to impersonate a user. Here host1 and host2 represents Teradata nodes. All nodes of the Teradata system need to be listed here in order for SQL-H query to be processed. It is recommended to use the IP addresses of the Teradata nodes.
103 Grant user privileges
Certain privileges are required for the Teradata database user to run the setup successfully, and if needed, they can be granted by the admin user using the following statements:
GRANT EXECUTE PROCEDURE ON SQLJ TO [user] WITH GRANT OPTION;
GRANT CREATE EXTERNAL PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;
GRANT DROP PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;
GRANT DROP FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;
GRANT EXECUTE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;
GRANT CREATE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;
GRANT SELECT ON SYSLIB TO [user] WITH GRANT OPTION;
104 Run setup script
104.1 On a Teradata node, go to the /opt/teradata/sqlh/<version> directory and run the BTEQ setup script (tdsqlh_td.bteq) to do the setup:
# bteq .logon localhost/[user],[password] < tdsqlh_td.bteq > output.txt 2>&1
where [user] and [password] are to be filled in.
104.2 On a Teradata node, go to the /opt/teradata/tdsqlh_hdp/<version> directory and run the BTEQ setup script (tdsqlh_hdp.bteq) to do the setup:
# bteq .logon localhost/[user],[password] < tdsqlh_hdp.bteq > output.txt 2>&1
where [user] and [password] are to be filled in.
104.3 Check the output file (output.txt in the above example) for possible errors. Upon successful execution, the script will install the required JAR files and create the load_from_hcatalog table operator function.
You may run into the following error if there isn’t enough room in SYSLIB database. For installing Hadoop jars require about 19 megabytes of space. So, you need to increase the SYSLIB database size.
call sqlj.install_jar('cj!pig-withouthadoop.jar','pig', 0);
*** Failure 2644 No more room in database SYSLIB.
Running the setup script for the first time may return following SQL Failure messages. These errors are benign and can be ignored safely. Examples:
DROP FUNCTION SYSLIB.load_from_hcatalog;
*** Failure 5589 Function 'load_from_hcatalog' does not exist.
call sqlj.remove_jar('SQLH', 0);
*** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.
*** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.
105 Validate SQL-H install
Below steps will help to validate if the Teradata/Hadoop setup is indeed correct and ready for SQL-H queries.
105.1 Create hcatalog table with data:
105.1.1 Download files tdsqlh_example.hive and tdsql_data.csv from directory /opt/teradata/tdsqlh_hdp/<version> on a Teradata node, and copy them to the hadoop namenode under /tmp directrory.
105.1.2 Log into the hadoop namenode and go to /tmp directory.
105.1.3 Change files permissions on the copied files.
# chmod 777 tdsqlh_example.hive tdsql_data.csv
105.1.4 Change user to hive.
# su hive
105.2 Run sample hive script.
# hive < tdsqlh_example.hive
Make sure the script completes and returns row count as 805.
Total MapReduce CPU Time Spent: 4 seconds 580 msec
Time taken: 33.76 seconds
On successful completion, this will create a table tdsqlh_test with 14 columns and populate 805 rows.
105.3 Run SQL-H query to import rows from tdsqlh_test table
105.3.1 Using SQL Assistance or BTEQ, logon to the Teradata system as user dbc. If user dbc wasn’t present then you may choose a different user.
105.3.2 Run the following SQL query with proper hcatalog server address and port and see if it returns count(*) as 805.
server('126.96.36.199') -- Hcatalog server address
port('9083') -- hcatalog server port
templeton_port('50111') -- Web Hcatalog port
) as D1;
Results should come back with:
*** Query completed. One row found. One column returned.
*** Total elapsed time was 2 seconds.
If the above query returns an error instead of row count 805 then it’s possible that the TD/Hadoop setup could be wrong and requires manual trouble shooting to isolate the problem.
I'm trying to get SQL-H running on Teradata Express, however even after following the items above, i have run into a small snag.
When I try to run the SQL-H query above i am getting the following error message.
Failed [7583 : HY000] The secure mode processes had a set up error.
And from the /var/log/messages on the Express VM. So it appears that the UDF secure server is having problems and is preventing the load_from_hcatalog table function from getting executed. Any ideas on how to fix this?