Teradata Express Support for SQL H -- Install Instructions SQL H



The following instructions should help you set up and configure the Teradata SQL-H capability with your Hadoop instance. First, however, you need to download and start up a Teradata Express image with the SQL-H connector installed. For Teradata Express for VMware Player, download one of the following two versions:

TDE 14.10.03 SLES 10 40GB (w/SQLH)

TDE 14.10.03 SLES11 40GB (w/SQLH)



100   Resolve Hostname conflicts

Teradata SQL-H uses node hostnames to resolve network addressing. However, systems configured by Teradata Staging will typically have duplicate hostnames between Teradata nodes and Hadoop nodes. This conflict must be addressed regardless of whether connectivity is via the customer/external LAN or via BYNET/InfiniBand. This section describes the process for changing the Hadoop node hostnames; it may be skipped if there is no hostname conflict between the Teradata and Hadoop nodes.

START OF SYSTEM OUTAGE. Note date and time: ___________________

100.1           Shut down all Hadoop services. On the master node:

# hcli system stop

100.2           Change the hostname.

100.2.1    On the Hadoop master node, add the new hostname for each Hadoop node to that node's existing entry in the /etc/hosts file. Example:

Before: 39.64.8.2  byn002-8

After:  39.64.8.2  hdp002-8 byn002-8

Note that the new hostname must precede (appear to the left of) the original hostname.
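The edit in 100.2.1 can be scripted with sed; a minimal sketch that operates on a copy rather than the live /etc/hosts, using the example address and hostnames from above:

```shell
# Create a sample hosts entry, matching the "Before" example above.
printf '39.64.8.2  byn002-8\n' > /tmp/hosts.example

# Prepend the new hostname so it appears to the LEFT of the original.
sed -i 's/\(39\.64\.8\.2[[:space:]][[:space:]]*\)byn002-8/\1hdp002-8 byn002-8/' /tmp/hosts.example

cat /tmp/hosts.example
# -> 39.64.8.2  hdp002-8 byn002-8
```

On a real system you would run the same substitution against /etc/hosts after saving a backup copy.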

100.2.2    Validate the new hostname is recognized. On the master node:

# hostname -f

100.3           On the Hadoop Master node, run the following command:

# hcli support modifyHostname <oldname> <newname>

100.4           If Virtual IP addressing is used at the site, perform the necessary modifications for the new hostnames.

100.5           Make sure the Teradata system can resolve Hadoop node names (either through DNS or via /etc/hosts on the Teradata nodes).
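The check in 100.5 can be run as a small loop on a Teradata node; a minimal sketch, where the hdp* hostnames are illustrative and should be replaced with your Hadoop node names:

```shell
# Verify each Hadoop hostname resolves (via DNS or /etc/hosts).
# localhost is included only to show what successful resolution looks like.
for h in localhost hdp002-8 hdp002-9; do
    if getent hosts "$h" > /dev/null; then
        echo "$h: resolves"
    else
        echo "$h: DOES NOT RESOLVE"
    fi
done
```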

100.6           Start Hadoop services. On the master node:

# hcli system start

END OF SYSTEM OUTAGE. Note date and time: ___________________

101   Add Hadoop IP addresses to the Teradata hosts file

Perform the following steps if the Teradata and Hadoop hostnames do not conflict and are not resolved through local DNS.

101.1           Save a copy of /etc/hosts file on all Teradata TPA nodes:

# cp /etc/hosts /etc/orig.hosts

101.2           Add Hadoop nodes to the /etc/hosts file of every Teradata TPA node. Example:

192.168.135.100  hdp002-8

192.168.135.101  hdp002-9
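The additions can be staged in a file and then appended on each TPA node; a minimal sketch using the example addresses and names above:

```shell
# Hypothetical node list; adjust to your site.
HADOOP_NODES="192.168.135.100:hdp002-8 192.168.135.101:hdp002-9"

# Build the lines to append, writing to a staging file first.
: > /tmp/hadoop_hosts.add
for entry in $HADOOP_NODES; do
    printf '%s  %s\n' "${entry%%:*}" "${entry##*:}" >> /tmp/hadoop_hosts.add
done
cat /tmp/hadoop_hosts.add

# Then, on each Teradata TPA node:
#   cat /tmp/hadoop_hosts.add >> /etc/hosts
```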

102   Hadoop Namenode configuration for Teradata proxy user setup

For Teradata SQL-H to work with a Hadoop system, a Teradata superuser must be configured on the Hadoop namenode so that it is allowed to access HDFS from the Teradata nodes on behalf of other Hadoop users in a secure way. The Teradata superuser used for this setting is tdatuser. On the Hadoop side, the following configuration is required in core-site.xml to add 'tdatuser' as a trusted superuser. The first property, hadoop.proxyuser.tdatuser.groups, determines the HDFS groups whose members tdatuser is allowed to impersonate. The second property, hadoop.proxyuser.tdatuser.hosts, determines the hosts from which tdatuser is allowed to access HDFS. If these properties are not present, impersonation will not be allowed and Teradata queries will fail with a security error.

Add the following two properties to the Hadoop namenode configuration file /etc/hadoop/conf/core-site.xml, adjust their values to your Teradata/Hadoop environment, save the file, and restart the namenode.

<property>

  <name>hadoop.proxyuser.tdatuser.groups</name>

  <value>users</value>     

  <description>

          Allow the superuser tdatuser to impersonate members of the listed HDFS group(s). In this example, tdatuser may impersonate any user belonging to the 'users' group.

  </description>

</property>

<property>

  <name>hadoop.proxyuser.tdatuser.hosts</name>

  <value>host1,host2</value>

  <description>

          The superuser can connect only from host1 and host2 to impersonate a user. Here host1 and host2 represent Teradata nodes. All nodes of the Teradata system must be listed here for SQL-H queries to be processed. It is recommended to use the IP addresses of the Teradata nodes.

   </description>

</property>
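Before restarting the namenode, it is worth confirming that both properties made it into the file; a minimal sketch that writes a sample fragment to /tmp (on a real system the path would be /etc/hadoop/conf/core-site.xml) and greps for the two keys:

```shell
# Illustrative sample; on a real system point CONF at the live file instead.
CONF=/tmp/core-site.xml
cat > "$CONF" <<'EOF'
<configuration>
  <property><name>hadoop.proxyuser.tdatuser.groups</name><value>users</value></property>
  <property><name>hadoop.proxyuser.tdatuser.hosts</name><value>host1,host2</value></property>
</configuration>
EOF

# Report whether each required proxy-user property is present.
for key in hadoop.proxyuser.tdatuser.groups hadoop.proxyuser.tdatuser.hosts; do
    grep -q "<name>$key</name>" "$CONF" && echo "$key: present" || echo "$key: MISSING"
done
```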

103   Grant user privileges

Certain privileges are required for the Teradata database user to run the setup successfully. If needed, they can be granted by the admin user with the following statements:

GRANT EXECUTE PROCEDURE ON SQLJ TO [user] WITH GRANT OPTION;

GRANT CREATE EXTERNAL PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP PROCEDURE ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT DROP FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT EXECUTE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT CREATE FUNCTION ON SYSLIB TO [user] WITH GRANT OPTION;

GRANT SELECT ON SYSLIB TO [user] WITH GRANT OPTION;
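The statements above can be generated for a specific user and then piped to bteq by an admin; a minimal sketch, where the user name sqlhuser and the staging path are illustrative:

```shell
# Hypothetical target user; substitute your Teradata database user.
TD_USER=sqlhuser

# Generate the grant script with the user name filled in.
cat > /tmp/sqlh_grants.sql <<EOF
GRANT EXECUTE PROCEDURE ON SQLJ TO $TD_USER WITH GRANT OPTION;
GRANT CREATE EXTERNAL PROCEDURE ON SYSLIB TO $TD_USER WITH GRANT OPTION;
GRANT DROP PROCEDURE ON SYSLIB TO $TD_USER WITH GRANT OPTION;
GRANT DROP FUNCTION ON SYSLIB TO $TD_USER WITH GRANT OPTION;
GRANT EXECUTE FUNCTION ON SYSLIB TO $TD_USER WITH GRANT OPTION;
GRANT CREATE FUNCTION ON SYSLIB TO $TD_USER WITH GRANT OPTION;
GRANT SELECT ON SYSLIB TO $TD_USER WITH GRANT OPTION;
EOF
cat /tmp/sqlh_grants.sql
```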

104   Run setup script

104.1           On a Teradata node, go to the /opt/teradata/sqlh/<version> directory and run the BTEQ setup script (tdsqlh_td.bteq) to do the setup: 

# bteq .logon localhost/[user],[password] < tdsqlh_td.bteq > output.txt 2>&1

where [user] and [password] are to be filled in. 

104.2           On a Teradata node, go to the /opt/teradata/tdsqlh_hdp/<version> directory and run the BTEQ setup script (tdsqlh_hdp.bteq) to do the setup: 

# bteq .logon localhost/[user],[password] < tdsqlh_hdp.bteq > output.txt 2>&1

where [user] and [password] are to be filled in. 

104.3           Check the output file (output.txt in the above example) for possible errors.  Upon successful execution, the script will install the required JAR files and create the load_from_hcatalog table operator function.

Notes:

You may run into the following error if there is not enough room in the SYSLIB database. The Hadoop JARs require about 19 MB of space, so you may need to increase the SYSLIB database size.

call sqlj.install_jar('cj!pig-withouthadoop.jar','pig', 0);

 *** Failure 2644 No more room in database SYSLIB.
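Assuming the admin user has the necessary rights, SYSLIB's perm space can be raised with MODIFY DATABASE; the 20 MB figure below is an illustrative value sized for the roughly 19 MB of JARs mentioned above, so adjust it to your system:

```sql
MODIFY DATABASE SYSLIB AS PERM = 20000000;  -- ~20 MB; adjust as needed
```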

Running the setup script for the first time may return the following SQL failure messages. These errors are benign and can safely be ignored. Examples:

DROP FUNCTION SYSLIB.load_from_hcatalog;

 *** Failure 5589 Function 'load_from_hcatalog' does not exist.

call sqlj.remove_jar('SQLH', 0);

 *** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.

*** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.

105   Validate SQL-H install

The following steps help validate that the Teradata/Hadoop setup is correct and ready for SQL-H queries.

105.1           Create an HCatalog table with data:

105.1.1    Download the files tdsqlh_example.hive and tdsql_data.csv from the /opt/teradata/tdsqlh_hdp/<version> directory on a Teradata node, and copy them to the /tmp directory on the Hadoop namenode.

105.1.2    Log in to the Hadoop namenode and go to the /tmp directory.

105.1.3    Change permissions on the copied files:

# chmod 777 tdsqlh_example.hive  tdsql_data.csv

105.1.4    Change user to hive.

# su hive

105.2           Run sample hive script.

# hive < tdsqlh_example.hive 

Make sure the script completes and returns a row count of 805:

Total MapReduce CPU Time Spent: 4 seconds 580 msec

OK

805

Time taken: 33.76 seconds

On successful completion, this creates a table tdsqlh_test with 14 columns and populates it with 805 rows.

105.3           Run a SQL-H query to import rows from the tdsqlh_test table.

105.3.1    Using SQL Assistant or BTEQ, log on to the Teradata system as user dbc. If user dbc is not available, you may choose a different user.

105.3.2    Run the following SQL query with the proper HCatalog server address and port, and verify that it returns a count(*) of 805.

SELECT count(*)
FROM SYSLIB.load_from_hcatalog(USING
        server('153.64.29.98')          -- HCatalog server address
        port('9083')                    -- HCatalog server port
        username('hive')
        dbname('default')
        tablename('tdsqlh_test')
        columns('*')
        templeton_port('50111')         -- WebHCat port
) as D1;

Results should come back with:

 *** Query completed. One row found. One column returned.

 *** Total elapsed time was 2 seconds.

   Count(*)

-----------

        805

If the above query returns an error instead of a row count of 805, the Teradata/Hadoop setup may be incorrect and will require manual troubleshooting to isolate the problem.

1 REPLY

Re: Teradata Express Support for SQL H -- Install Instructions SQL H

I'm trying to get SQL-H running on Teradata Express; however, even after following the steps above, I have run into a snag.

When I try to run the SQL-H query above, I get the following error message.

Failed [7583 : HY000] The secure mode processes had a set up error.

The messages below are from /var/log/messages on the Express VM. It appears the UDF secure server is having problems, which prevents the load_from_hcatalog table function from being executed. Any ideas on how to fix this?

Sep 24 19:07:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: Race4HybridServer AppCat 5 NODE 30720 VPROC 16383 got the semaphore

Sep 24 19:07:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # udfipclib: SASreset semval is 1

Sep 24 19:07:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: Vproc 16383 AppCat 5 try to create secure server udfsectsk. user tdatuser passwd NULL path /usr/tdbms/bin/udfsectsk cmd udfsectsk 000000000006 000000262136

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # UDFCmdWait: Error returned for semid (9764966), errno=11 (Resource temporarily unavailable)

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: ErrorReturn 0 SrvIndex 0 SecSrvIndex 0 udfsectsk pid 20658  SemId 9764966 Stat -2 errno 0 (Success)

Sep 24 19:17:04 TDExpress15008 Teradata[19788]:  AppCat 5 VPROC 16383 OpCode UDF_OP_INIT_HYBRID Hybrid Server 20658 SEC_XCMD error occured

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: ErrorReturn 0 SrvIndex 0 SecSrvIndex 0 udfsectsk pid 20658  SemId 9764966 Stat -2 errno 0 (Success)

Sep 24 19:17:04 TDExpress15008 Teradata[19788]:  AppCat 5 VPROC 16383 UDF_OP_INIT_HYBRID failed. Signal to kill hybrid server 20658 cmd 1.

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # udfipclib: signal SAS shutdown

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: Vproc 16383 Hybrid Server 20658 is in bad condition.

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: AppCat 5 NODE 30720 VPROC 16383 Race4HybridServer failed to start the hybrid server

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: AppCat 5 NODE 30720 VPROC 16383 Race4HybridServer release the semaphore. Hybrid Server start up done.

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # jsvsrv: AppCat 5 VPROC 16383 SecSrvIndex 0 Hybrid Server creation failure.

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # udfsecsrvtsk: VPROC 16383 CreateNewProcess failed for server index 0.

Sep 24 19:17:04 TDExpress15008 Teradata[19788]: INFO: Teradata: 7820 # Try to delete memfileptr /var/opt/teradata/tdtemp/udfsrv/sec_016383_000.smf