QueryGrid Teradata to Hadoop Configuration Issues

Enthusiast

Hello:

I believe I've installed all the functions in the t2t and t2h directories on Teradata.

I'm using the Teradata Base+ instance running on Amazon AWS (single node).

I can query a remote Teradata system that is also running on AWS, but my issue is with the Hadoop system, which is "Hortonworks Sandbox with HDP 2.4" running on Microsoft Azure.

CREATE FOREIGN SERVER hadoop2
USING
hosttype('hadoop')
server('55.55.555.55')
port('9083')
hiveport ('10000')
username('hue')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG;

I can run...

HELP FOREIGN SERVER HADOOP2;

and this will return a list of databases on my Hadoop system, but I can't seem to run anything else.

select claim_id from sql_class.claims@hadoop2;

ERROR [HY000] [Teradata][ODBC Teradata Driver][Teradata Database] in UDF/XSP/UDM SYSLIB.HCATALOG_CONTRACT: SQLSTATE 38001: [TD-SQLH]:ip-172-30-1-111: ip-172-30-1-111: unknown error

SELECT count(*)
FROM SYSLIB.load_from_hcatalog(USING
server('55.55.555.55')
hosttype('hadoop')
port('9083')
username('hive')
dbname('sql_class')
tablename('claims')
columns('*')
) as D1;

ERROR [HY000] [Teradata][ODBC Teradata Driver][Teradata Database] [TblOp] Could not obtain block: BP-595454498-172.16.137.143-1456768655900:blk_1073743177_2361 file=/apps/hive/warehouse/sql_class.db/claims/000000_0.

The table exists and I can query it and return data connecting directly to Hadoop.

I've added the following entry in my hosts file on the Teradata system.

55.55.555.55      sandbox.hortonworks.com sandbox

I can ping sandbox.hortonworks.com from my Teradata system
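
A successful ping only shows that ICMP reaches the host; it does not show that the TCP ports QueryGrid uses are open. Below is a minimal sketch of a port check that could be run from the Teradata node, assuming bash and the coreutils `timeout` command are available there. The function names `port_open` and `check_ports` are made up for illustration.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device, so the probe runs in a bash subshell.
port_open() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Print one line per port: "<host>:<port> open" or "<host>:<port> unreachable".
check_ports() {
  local host="$1"
  shift
  local port
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} unreachable"
    fi
  done
}
```

For example, `check_ports sandbox.hortonworks.com 9083 10000` covers the metastore and HiveServer2 ports from the foreign server definition above. The DataNode transfer port (50010 by default in Hadoop 2.x, which is an assumption about this sandbox) is worth adding, since a "Could not obtain block" error typically means the client could not read from a DataNode.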

On the Hadoop side I've added the following configuration to core-site.xml.

<property>
<name>hadoop.proxyuser.tdatuser.groups</name>
<value>users</value>
</property>

<property>
<name>hadoop.proxyuser.tdatuser.hosts</name>
<value>*</value>
</property>

I don't actually have a user named tdatuser created on the Hadoop side. I have one user with the role of Owner, assigned to "Subscription admins".

Any thoughts what could be the issue?  Thanks.


7 REPLIES
Enthusiast

Re: QueryGrid Teradata to Hadoop Configuration Issues

A little farther...

We can get data using the following syntax.

SELECT *
FROM SYSLIB.load_from_hcatalog(USING
hosttype('hadoop')
server('sandbox.hortonworks.com')
port('9083')
username('hive')
dbname('sql_class')
tablename('addresses')
columns('*')
templeton_port('50111')
--hadoop_properties('dfs.client.use.datanode.hostname=true')
hadoop_properties('<dfs.client.use.datanode.hostname=true>,<dfs.datanode.usedatanode.hostname=true>')
)dt;

However, when we create a foreign server with the same settings and run the SELECT below, we get the following error:

CREATE FOREIGN SERVER hadoop3
USING
hosttype('hadoop')
server('sandbox.hortonworks.com')
port('9083')
hiveport ('10000')
username('hive')
templeton_port('50111')
hadoop_properties('<dfs.client.use.datanode.hostname=true>,<dfs.datanode.usedatanode.hostname=true>')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG;

select * from sql_class.claims@hadoop3;

ERROR [HY000] [Teradata][ODBC Teradata Driver][Teradata Database] in UDF/XSP/UDM SYSLIB.HCATALOG_CONTRACT: SQLSTATE 38001: [TD-SQLH]:ip-172-30-1-111: ip-172-30-1-111: unknown error

172.30.1.111 is the private IP of our Teradata Base+ instance running on AWS.

Thanks again. :/


Teradata Employee

Re: QueryGrid Teradata to Hadoop Configuration Issues

What versions of TDBMS, tdsqlh_td, and tdsqlh_hdp are running on the AWS instance? You can get this information by running the following commands on the AWS instance.

# pdepath -i

# rpm -qa | grep sqlh

Also, make sure to have the required ports opened.

Define network settings in the AWS VPC ACL as follows:

  • Add an outbound rule into the AWS VPC ACL to allow the TD AWS instance to connect to the on-premises Hadoop system, if no such rule exists.
  • Add an inbound rule into the AWS VPC ACL to allow the on-prem Hadoop system to connect to the TD AWS instance.

In addition, define network settings in the AWS security group as follows:

  • Add an outbound rule allowing the TD AWS instance to connect to the Hadoop system, if no such rule exists.
  • Add an inbound rule for the TD AWS instance's PTL listen port allowing the Hadoop system to connect and transfer data to the TD AWS instance. Default PTL listen port is 5002.

Use the public IP address of the TD AWS instance and the IP address of the on-prem Hadoop system when creating foreign server objects.
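
The inbound rules for the PTL port can also be added from the AWS CLI. The sketch below only prints the commands so they can be reviewed before running; the helper names are hypothetical, and the group ID, ACL ID, rule number, and CIDR in the usage line are placeholders. `aws ec2 authorize-security-group-ingress` and `aws ec2 create-network-acl-entry` are the CLI calls for these two rule types.

```shell
# Default PTL listen port mentioned above.
PTL_PORT=5002

# Print the command that adds an inbound security group rule for the PTL port.
# $1 = security group ID, $2 = Hadoop system address in CIDR form
sg_ingress_cmd() {
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port ${PTL_PORT} --cidr $2"
}

# Print the command that adds the matching inbound VPC network ACL entry.
# $1 = network ACL ID, $2 = rule number, $3 = Hadoop system address in CIDR form
acl_ingress_cmd() {
  echo "aws ec2 create-network-acl-entry --network-acl-id $1 --ingress --rule-number $2 --protocol tcp --port-range From=${PTL_PORT},To=${PTL_PORT} --cidr-block $3 --rule-action allow"
}
```

For example, `sg_ingress_cmd sg-0123456789abcdef0 203.0.113.10/32` prints the security group command for a Hadoop system at the placeholder address 203.0.113.10. The outbound rules would need their own entries.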

Enthusiast

Re: QueryGrid Teradata to Hadoop Configuration Issues

Thanks for the reply!

This is good advice for those who want to set this up the right way and it makes more sense in hindsight. :P

The whole "unknown error" thing threw us off, as did the fact that T2T was working.  We knew something was fishy because the IP addresses displayed in the error were private IPs.
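
Both errors earlier in the thread carry that hint. The host name ip-172-30-1-111 follows the EC2 private DNS pattern, and an HDFS block pool ID has the form BP-<random>-<NameNode IP>-<timestamp>, so BP-595454498-172.16.137.143-1456768655900 embeds a 172.16.x.x address. Here is a small sketch for pulling the addresses out of such messages and testing them against the RFC 1918 private ranges (the function names are hypothetical):

```shell
# Convert an EC2-style private host name (ip-a-b-c-d) to dotted form.
ec2_host_ip() {
  printf '%s\n' "$1" | sed -n 's/^ip-\([0-9]*\)-\([0-9]*\)-\([0-9]*\)-\([0-9]*\)$/\1.\2.\3.\4/p'
}

# Extract the IP address embedded in an HDFS block pool ID.
bp_ip() {
  printf '%s\n' "$1" | sed -n 's/^BP-[0-9]*-\([0-9.]*\)-[0-9]*.*$/\1/p'
}

# Succeed if the IPv4 address is in 10/8, 172.16/12, or 192.168/16.
is_private_ip() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the values from this thread, `is_private_ip "$(ec2_host_ip ip-172-30-1-111)"` and `is_private_ip "$(bp_ip BP-595454498-172.16.137.143-1456768655900)"` both succeed, which suggests each side was advertising an address the other cloud could not route to.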

We were working in EC2 within its Networking and Security section and didn't realize that there is a separate networking area in AWS for VPC, with ACL inbound and outbound rules to set there...

What we did was create our Teradata instances outside the VPC and we were able to communicate T2T and T2H with our Hadoop system running on Microsoft Azure.

This served our purposes which were non-critical.  Thanks again!

Teradata Employee

Re: QueryGrid Teradata to Hadoop Configuration Issues

Hi all, I have the same issue when I connect to UDA 2.0 and query a Cloudera table; the error is shown below. I am able to push the query down to Hadoop and it runs the job there, but TD is not able to read the MapReduce output on HDFS.

Executed as Single statement.  Failed [7825 : 38000] in UDF/XSP/UDM SYSLIB.HCATALOG_CONTRACT_CDH5_4_3: SQLSTATE 38001: [TD-SQLH]:Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:78 Table not found 'tdsqlh_test' 

Elapsed time = 00:00:01.286 

STATEMENT 1: Select Statement failed. 

Any inputs are appreciated.

Regards,

Pavan 

Teradata Employee

Re: QueryGrid Teradata to Hadoop Configuration Issues

Hi Pavan,

Is this error from executing the query on a Teradata AWS instance? Which instance type is it?

Teradata Employee

Re: QueryGrid Teradata to Hadoop Configuration Issues

No, this is from the TD 15.10 VM, which I use for the new UDA 2.0.

Teradata Employee

Re: QueryGrid Teradata to Hadoop Configuration Issues

Pavan,

I don't think you have the same issue as toadrw (original poster) as his problem is related to AWS network settings.

Can you give us more details about your environment? What versions of TDBMS, tdsqlh_td, and tdsqlh_hdp are you running? What version of Hadoop?