Access External Data - Part 1 The Table UDF

Connectivity
Connectivity covers the mechanisms for connecting to the Teradata Database, including driver connectivity via JDBC or ODBC.
Teradata Employee

Access External Data - Part 1 The Table UDF

Many databases provide the ability for accessing external data via a mechanism via a SELECT query. Examples include Linked Tables in MS-Access and External Tables in Oracle. A similar capability will become available in Teradata 15, it will be known as Query Grid. However users of Teradata prior to 15 are out of luck.

Teradata has provided the foundation for accessing external data via a SELECT query ever since Table UDF's were added in or around Teradata 12. There is even reference in the UDF programmers manuals to just such a facility (but no actual code to show how to do it).

In this article, you can download a working facility I've called DbLink that you can setup and use to access data from external data sources without having to write your own UDF.

If you are using Teradata 15 or later, you should explore using QueryGrid instead of this DbLink facility.

Overview

Currently my DbLink facility supports Teradata and Oracle. However, if you need to do so, it is relatively easy to add support for other technologies (e.g. SQL Server, DB/2, PostgreSQL, MySQL and many more). Adding support for other data source technologies is covered in the second article of this series. If you do add support for other systems and are willing to share, let me know and we can put it up on a "contributors" page.

The following diagram shows a high level view of the DbLink architecture. You should read the "DbLink Setup and User Guide" for more details of the architecture of the DbLink system and additional information describing what you will need before trying to set it up.

DbLink Architecture

The above diagram shows the Teradata communicating with the Foreign DBMS (the remote data source) via a DbLink Gateway. This Gateway is a system that runs a daemon known as the DbLinkServer. The DbLink Gateway / DbLinkServer is a system external to Teradata. It could be a non-TPA node or a Managed Server or some other system that has network connectivity to Teradata.

The DbLink Server must:

    • Have network connectivity to both Teradata and the Foreign DBMS
    • Have Java runtime 1.6 (JRE 1.6 / JRE 6.0) or later installed
    • Open Firewall ports so that the Teradata system can establish a connection to the DbLink Server.

The last point is critical. I have encountered some sites that allow external systems such as the DbLink Gateway to make connections to the Teradata server, but not the reverse.

For the DbLink system to work, the Teradata system needs to be able to establish a connection to the DbLink Gateway. When the user accesses the DbLink Table UDF, the DbLink Table UDF will attempt to communicate with the DbLinkServer (which is running on the DbLink Gateway). The DbLink Server has the responsibility to connect to the remote system and manage the DbLink request for external data. The DbLink server will only do this when the DbLink UDF is invoked by a user query. The only way that the DbLink Server can know that the DbLink UDF has been invoked is when the UDF attempts to communicate with the DBLink server. The default port that the DbLink Server listens to is port 5000, but this is configurable.

DbLink Versions - important please read this

In the downloads, you will notice 4 files. The current version of DbLinks has been split into 2 parts. You will need both parts:

    • The Dblink Server archive - release 1.03.02.03
    • The DbLink Client archive - release 1.03.02.03

Never try to mix and match different versions of DbLink code. Always use all of the software from the same release. That is, the DbLink Server, Table UDF, Client and common modules must all be from a single release of DbLink. If you start getting InvalidClassExceptions and SerialVerUID messages, chances are you've mixed and matched modules from different versions (which you can not do).

Each of the modules within DbLinks have individual version numbers. For example in release 1.03.02.03 of DbLinks, the server is version 1.03.03.02 (which is similar to, but not the same as the release number) and the TableUDF is version 1.01.02.02.

This can get confusing very quickly. To make it easier to diagnose any version incompatibility issues, I've created a matrix of releases and the versions of the individual modules that make up that release. This release matrix can be found in the "DbLink_ReleaseNotes/versionMatrix.html" file, which can be found in the "DbLink client archive". Other files in the "DbLink_ReleaseNotes" directory list features of DbLinks and a history of modifications.

The original DbLink code is still accessible by downloading the "DbLink Software Distribution v1.0 (deprecated)" archive. I recommend that you do not use this deprecated version. There have been numerous bug fixes in the newer version.

Some of the major changes in this new version include:

    • Introduction of log4J (version 2) for server messages.
    • Implementation of a server configuration file (operational parameters are read from a config file as an alternative to entering them on the command line).
    • Improved memory management.
    • Improved network utilisation.
    • New DbLink client functionality.
    • Miscellaneous bug fixes.

Installation

To install the DbLink Software, download the two archives with the same version number and unzip them in the desired location. If possible do this on a Linux/Unix system that has bteq installed as there are some shell scripts that will make it easier to install the UDF and create the DbLink Data model.

You must download both the client and server archives with the same version number. Do not try to mix and match versions, if you do at best it will not work - at worst you might break something.

To complete the setup edit the "envirables.sh" that can be found in the DbLinkDeployScript directory and run the "deploy.sh" script. The envirables.sh file contains detail such as the user ID and password used to create the DbLink Model and install the Table UDF.

There are many entries in the "envirables.sh", you should modify the following variables as indicated in the following table:

Variable Name Value
U

Teradata User ID with execute procedure permissions on SQLJ database, create table and create function privileges. You may also specify a TDPID if required (e.g. U=dbc/user)

P Password for the above user id.
DBNAME The name of the database into which the JAR Files will be transferred and the DBLink functions created.
PROJECT_ROOT The location of the unzipped dbLinkArchive.

Many users will want to put the JAR files, tables and functions in different databases. In this case refer to the setup instructions in the user guide and use the "deploy.sh" script as a starting point.

You will also need to obtain and install an oracle JDBC driver. You can obtain the Oracle JDBC driver here:

Oracle 10g: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-10201-088211.html

Oracle 11g R1: http://www.oracle.com/technetwork/apps-tech/jdbc-111060-084321.html

Oracle 11g R2: http://www.oracle.com/technetwork/apps-tech/jdbc-112010-090769.html

Once you have downloaded the Oracle JDBC driver, install it into the DbLinkSupport directory created when you decompress the DbLinkDistribution file. Typically all you will need to do is copy the ojdbcNN.jar file (where NN = a one or two digit number) into the DbLinkSupport directoy.

I have tested the DbLink with ojdbc14.jar (Oracle 10).

Note, you can not simply add support for other data sources by dropping the appropriate JDBC driver into the DbLinkSupport directory. For instructions on how to add support for additional data sources you will need to consult the second article in this series.

Testing

Once you have completed the setup, create a configuration file as per the example in the documetation. You will need to substitute appropriate values to allow the DbLink Server to connect to the Teradata system containing the DbLinkMeta table.

Run the DbLink Server interactively in a terminal session. Once the DbLinkServer has successfully connected create a DbLink Meta entry that connects to the same Teradata system and queries a small table. For example if the Db Link Server is connecting to a system named dbctest, you might create a DbLinkMeta record with the following values:

Column Value
Name test01
Description A DbLink test
JdbcDriverClassName com.teradata.jdbc.TeraDriver
ConnectionURL jdbc:teradata://dbctest/
UserId any user id that can query dbc.dbcinfo and the DbLink Table UDF's
pass The matching password for the above userid
HandlerClassName null
RemoteQuery select * from dbc.dbcinfo

You can then test the setup using a query such as the following:

select dt.*
from table(DbLink('dbcinfo:vmhost') returns (
     InfoKey varchar(30)
    ,InfoData varchar(16384)
    )
  ) as dt
;

In the above example, the name of the system running the DbLinkServer is "vmhost". You will need to substitute "vmhost" with your hostname or an IP address of the system running the DbLink Server.

If everything is setup correctly you should see the contents of the DBC.DbcInfo table. You should also be able to see messages showing actvity in the DbLink Server's terminal session.

Troubleshooting

Refer to the user documentation for potential problems and some troubleshooting information.

 

3 REPLIES
Teradata Employee

Re: Access External Data - Part 1 The Table UDF

Today I've posted some updates to the DbLink code. These updates include new functionality, improved functionality, miscellaneous bug fixes and some documentation updates.

The old version of the code is still available in the downloads. However, I would encourage any one interested in this tool to use the new version.

**Important note:** The new version is uploaded in two parts due to a file size constraint (DbLinks is now too big to upload as a single archive). You must download and use both parts. Do not try to mix and match bits from the old version with the new version. Refer to the section "DbLink Versions - important please read this" section above for details.

Enjoy and as per the standard disclaimer, use at your own risk.

Enthusiast

Re: Access External Data - Part 1 The Table UDF

Thanks Sir for update. I am going to use the new version of DbLink that is 1.3.0 for POC and once POC got successfull done then i will go ahead for productions. We had done the POC on DbLink 1.2 and now planning to move on production. But we are trying to do the POC with Version 1.3 and if everything will go with plan then we will go on production with version 1.3. 

Enthusiast

Re: Access External Data - Part 1 The Table UDF

We are trying to compile attached views on amex1 and its failing with error ‘REPLACE VIEW Failed. 9881: Function Dblink called with invalid number or type of parameters’

These views have more than 254 columns in them. Could that be a reason ? Do we have any column limitation for views using dblink function ?

When I reduce number of columns to 254 it works fine, if it’s 254+ columns it doesn’t compile.

Is there any limitations we have?