Access External Data - Part 2 Add support for other data sources

Connectivity
Connectivity covers the mechanisms for connecting to the Teradata Database, including driver connectivity via JDBC or ODBC.
Teradata Employee

Access External Data - Part 2 Add support for other data sources

The DbLink facility allows Teradata users to access data in external systems. However, as distributed, it only supports accessing data from external Oracle and Teradata systems.

In this article we will look at what is required to add support for other data sources. As it turns out, not much, just some basic Java programming abilities.

The DbLink Server is a critical part of the DbLink System. The DbLink Server has the responsibility of servicing requests from the Table UDF to retrieve data from an external data source. However, it doesn't know much about all the possible nuances of all the possible data sources that you might want to access. So the DbLink Server provides a mechanism in the form of pluggable translator classes that allow you to help it deal with the foreign data sources.

Overview

An important part of the process of transferring data from the external data source to Teradata is to convert the remote system's "things" to Teradata "things". For example, Oracle has a data type ROWID. There is no such data type in Teradata, however an Oracle ROWID can be represented as a CHAR(18) in Teradata. Other examples can be found in SQL Server. SQL Server has BIT and MONEY datatypes, these are not available in Teradata, but could be represented in Teradata using a BYTEINT and DECIMAL datatypes respectively. Other datatypes (e.g. INTEGER and VARCHAR) can typically be transferred without conversion.

While transferring data from the remote system to Teradata, the DbLink Server will ensure that these "unsupported" data types returned from the remote system are converted to a Teradata equivalent. The DbLink Server doesn't actually do the conversion, rather, it delegates the task to a plug in library.

The DbLink Distribution includes a "plug in" library that works for Oracle (I've tested it against Oracle 10g). The source of this plug in is available via one of th downloads attached to this article. We will examine the Oracle plug in in this article. If you need to support more Oracle data types you can use this code as a basis to add your additional data types. Further if you need to add for another data source (e.g. DB/2 or SQL Server or others) you can use this code as starting point to create a new plug in.

If you plan to create a "plug in" to support a new data source technology, you will also need to obtain a suitable JDBC driver for that technology.

Before you start, make sure you have a working DbLink environment. That is, you have installed the Table UDF, setup the DbLink Server, setup the DbLink Client and tested that it is all working.

The Plugin Architecture

As part of its startup, the DbLink Server will add all "jar" files that it finds in a library directory to its CLASSPATH. The DbLink Server does this to allow you to add functionality. As mentioned above, two types of resource are placed into this library directory:

    • JDBC drivers and
    • foreign DBMS translator plug ins.

A plugin is developed as follows:

    1. Create a "Java Class Library" project in your favourite IDE (i.e. a project that produces a JAR without a main method).
    1. Add the DbLinkCommon.jar and TDUtilityLibrary.jar libraries to your project (you will find these in the DbLinkServer/lib directory).
    1. Create a Java class that extends the com.teradata.dblink.common.DbmsHandler abstract class.
    1. Implement the required methods as described below.

Once you have created the plugin, copy the jar into the DbLink server's library directory, restart the DbLink Server and try out your plug in. To make development easier, you might simply specify the location where your IDE creates the JAR as the DBLink Servers library directory (note, you will need to ensure that the relevant JDBC driver is also in this directory). For example, you might specify the library directory as "YourProject/dist".

The DbmsHandler abstract class

The DbmsHandler abstract class requires you to implement four methods:

    • getVersion () - Return the version of your plugin.
    • getTeradataTypeName(int colNo) - Given a column number, return the datatype name that should be used in Teradata.
    • resultSetGetCurrentRow(ResultSet resultSet) - Given the result set from the remote system, generate a ResultRow that can be processed in Teradata (e.g. If the result set contains a ROWID convert it to CHAR(18) for processing in Teradata)

There may be instances where you need to override some of the other methods.

For example, the Oracle plug in must override the getQueryMetaData(Connection connection, String sqlText) method. The getQueryMetaData method determines the ResultSetMetadata without actually running the query. On Teradata and some other DBMS systems, this can be obtained from a PreparedStatement. However, Oracle does not support obtaining the ResultSetMetadata from a prepared statement, so another method must be used (you can see it in the attached sample code). The Oracle specific approach is implemented in the overriden getQueryMetaData method.

The Oracle Translator plugin in detail

Overview

The purpose of the translator plug in is to translate, where necessary, foreign DBMS concepts into Teradata concepts. This primarily involves the conversion of foreign data types not supported on Teradata into something that can be used on Teradata. But there are also some other concepts that may need translation, for example the way in which metadata is obtained for a select query without actually running the query, as is the case when working with some versions of Oracle.

Get Version method

The getVersion() method simply returns an object that may be used to report the version of this plugin. The version number is displayed by the DbLink server when it loads your translator class.

In the Oracle example, the code is as follows:

    @Override
    public Version getVersion() {
        return new Version(1, 1, 0, 1);
    }

The method simply creates and returns a Version object. The four parameters are the major, minor, revision and patch level.

Get Teradata Type Name method

The get Teradata type name method is responsible for returning an appropriate data type for the nominated column.

The DbLink Server will invoke this method when the user references the DbLinkGetMeta Table UDF. Prior to this method being called, the Server will have obtained the structure of the result set (i.e. the ResultSetMetadata) that the remote query will generate. This is achieved by calling the getQueryMetaData method. The getQueryMetaData method stores the metadata in the resultSetMetaData attribute of the DbmsHandler instance (i.e. the class you are extending).

The String returned by this method is displayed in the TDDataType column of the result set generated by the DbLinkGetMeta Table UDF.

An abridged version of the method from the attached code is shown below:

    @Override
    public String getTeradataTypeName(int colNo) throws SQLException {

        int dataType = getResultSetMetaData().getColumnType(colNo);
        String dataTypeName = getResultSetMetaData().getColumnTypeName(colNo);
        if (dataTypeName.equalsIgnoreCase("TIMESTAMPTZ")) {
            dataType = Types.TIMESTAMP;
        }

        int displaySize = getResultSetMetaData().getColumnDisplaySize(colNo);
        int precision = getResultSetMetaData().getPrecision(colNo);
        int scale = getResultSetMetaData().getScale(colNo);
        switch (dataType) {
            case Types.CHAR:
                return String.format("CHAR(%d)", displaySize);
            case Types.VARCHAR:
                return String.format("VARCHAR(%d)", displaySize);
...
            case Types.DATE:
                return "TIMESTAMP(0)";
            case Types.DECIMAL:
                if (scale > 0) {
                    return String.format("DECIMAL(%d)", precision);
                } else {
                    return String.format("DECIMAL(%d, %d)", precision, scale);
                }
            case Types.NUMERIC:
                if (scale == -127) {
                    return ("NUMBER");
                } else if (scale > 0) {
                    return String.format("DECIMAL(%d, %d)", precision, scale);
                } else {
                    return String.format("DECIMAL(%d)", precision);
                }
            case Types.ROWID:
                return "CHAR(18)";
            default:
                return dataTypeName;
        }

    }

In this method, the key metadata attributes of the column are obtained and stored in some local variables. These key attributes include the JDBC data type, the display size (used in character fields), precision and scale (both of which are used in numbers and dates/timestamps).

Following the initialisation, there is a switch statement which is the main part of the function. Starting at the bottom, the default clause simply returns the data type name as returned from the remote system. In most cases this is sufficient as the data type name on the remote system (or more precisely returned from the java.sql.ResultSetMetadata.getColumnTypeName() methods is the same as Teradata. For example on Oracle an INTEGER data type is also an INTEGER data type on Teradata.

However, sometimes there are minor variations. For example, the data type name returned from Oracle when the data type code is java.sql.Types.CHAR will often be CHAR2. The CHAR2 data type is not valid on Teradata, so the first case clause will match and return CHAR(n) as the Teradata equivalent.

Another example is the Oracle data type DATE. Teradata also has a DATE data type, but despite the common data type name, the two are not the same. On Teradata a DATE data type is simply a date (i.e. it only has a year, month and day). The Oracle DATE data type, not only has a date value, but it also has a time component. The resolution of the time component in the Oracle date is down to the second, therefore, the translator will return TIMESTAMP(0) when an Oracle data type of DATE is encountered.

The final example is the the ROWID - not that JDBC defines a datatype for a ROWID. However, Teradata has no such data type, so we will return this as a CHAR(18). We could have used another data type, for example, we might decide to return ROWID's as a BYTE(18). The choice will depend upon what works in Teradata for the foreign data type, how it might be used by the user when the values are returned from the DbLink table UDF and what conversions are possible (as implemented in the Result Set Get Current Row method examined in the next section).

When looking at the full code, the astute observer may notice that Timestamp, CLOB and some other data types are translated even though the data type name is the same on both Oracle and Teradata. This is required because the data type returned by the java.sql.ResultSetMetadata.getColumnTypeName method often does not return the size associated with the data type. That is, if the remote data type is CLOB(5000), the getColumnTypeName method simply returns CLOB. As such the case statement will intercept the CLOB data type and format it with the size of the CLOB for display to the user. This is important so that the user can correctly construct the query that actually retrieves the data from the remote system.

Another thing to be aware of is how the foreign database represents arbitrary precisions. For example on Oracle, there is a NUMBER data type. The NUMBER data type may optionally contain a precision and scale (e.g. amount NUMBER(10,2)). However it is also valid to define a NUMBER without a precision and scale (e.g. amount NUMBER) as well as just a precision (e.g. amount NUMBER(10)). It is important to understand how the foreign JDBC driver represents these variants. Sometimes the scale or precision or display size might be 0 to indicate that the scale or precision has not been specified. Others might return -1 to indicate that the scale or precision has been omitted. Still others might return a "high value" to indicate that it has been omitted. All of these values are returned via the DBLinkGetMeta Table UDF, so all you need to do is create a table on the remote system with all the variations, add an entry to the DbLinkMeta table then access the DbLinkGetMeta function to see what the foreign system will returns. If you leave the DbLinkMeta.HandlerClassName as null, you don't even need to write a translator to see what the foreign DBMS will return (however, you should ignore any TDDataType values returned  from the DbLinkGetMeta Table UDF when specifying null for the DbLinkMeta.HandlerClassName column).

Result Set Get Current Row method

The resultSetGetCurrentRow method is used by the DbLink Server when the user references the DbLink table UDF. It is called after the remote query has completed and data is being transferred from the remote system back to the Teradata.

The DbLink Server calls the resultSetGetCurrentRow method when it needs the next row to be retrieved from the remote system's Result Set. This remote system Result Set is the data that is produced by the remote query.

The value returned by this method is a com.teradata.dblink.common.ResultRow. A ResultRow is basically an array of Object. There is one Object per column returned from the remote query; this matches the number of columns returned via the DbLink Table UDF.

The primary purpose of the resultSetGetCurrentRow method is to convert any foreign data types that are not supported on Teradata into a data type that is supported on Teradata. Note that in this context, the correct data type is the Java data type that the Teradata JDBC data type will convert into the Terdata data type. For example, a Teradata INTEGER is represented by a Java int (or similar - e.g. a java.lang.Integer) and a Teradata VARCHAR is represented by a Java java.lang.String object. A full list of the mapping between Java data types and Teradata data types may be found in the Teradata JDBC driver manual.

In most cases, the data type returned by the foreign DBMS's JDBC driver can be passed directly back to the DbLink Server. For example, an INTEGER in an Oracle result set will be returned by the Oracle JDBC driver as an int (or INTEGER or similar depending upon the get method used). This is the same as Teradata and as such may be passed back unmodified to the DbLink Server.

However, when the foreign data type is not supported by Teradata (e.g. an Oracle ROWID) an appropriate conversion is made in this method. It is critical that the conversion that is applied in this method must align with the conversion proposed in the getTeradataTypeName method. If the conversion in this method, does not align with the conversion proposed in the getTeradataTypeName method, the user will not be able to correctly formulate a query to retrieve data via a DbLink.

Following is an abridged version of the code from the Oracle Translator plugin attached to this article.

    @Override
    public ResultRow resultSetGetCurrentRow(ResultSet resultSet) throws SQLException {
        ResultRow row = new ResultRow(getColumnCount());
        for (int i = 1; i <= getColumnCount(); i++) {
           
            Object o;
            int dataType = getResultSetMetaData().getColumnType(i);
            String dataTypeName = getResultSetMetaData().getColumnTypeName(i);
            System.out.println("DataType " + i + ": " + dataTypeName + " (" + dataType + ")");
            if (dataTypeName.equalsIgnoreCase("TIMESTAMPTZ")) {
                dataType = Types.TIMESTAMP;
            }
            if (dataType == Types.ROWID) {
                    // A ROWID must be exrtacted as String.
                    // Calling getRowID will resultin an abstractMethodError being thrown.
                    // Ref: https://community.oracle.com/thread/503918?start=0&tstart=0
               
                o = resultSet.getString(i);
            } else if (dataType == Types.TIMESTAMP ||
                        dataType == Types.DATE) {
                    // If the datatype is a timestamp, the value must be extracted
                    // with getTimeStamp. If timestamps fall through to the
                    // default getObject, a proprietary Oracle object (oracle.sql.TIMESTAMP)
                    // is returned. This proprietary object can not be
                    // instantiated at the client.
                o = resultSet.getTimestamp(i);
            } else {
                o = resultSet.getObject(i);
            }
            if (resultSet.wasNull()) {
                o = null;
            }

            row.rowData[i - 1] = o;
            if (o != null) {
                System.out.print(o.toString());
            } else {
                System.out.print("null");
            }
            System.out.print(" ");
        }
        System.out.println();
        return row;
    }

}

As can be seen in the above code, the first thing that happens is a new ResultRow is created. The constructor is provided with the column count so that it can correctly size the array. The getColumnCount method is obtained from the remote ResultSetMetadata which the DbLinkServer captures as part of the process that runs the query on the remote system.

The second thing to note is that the next() method is not called. The DbLink Server manages navigation within the result set. You should not call any of the cursor movement functions (e.g. next()) or you will likely lose data.

Finally, the main part of the method simply extracts the column values from the current row one by one and places them in the ResultRow's rowData array. Note that column 1 goes into array index 0, Column 2 goes into array index 1 and so on. In general terms:

    ResultRow.rowData[colNo - 1] = ResultSet.getObject(ColNo);

If you follow the code through, for the most part, the main part of the method simply follows the logic above. The basic variations include some "debug print statements" that help reveal what is going on and checking to see if the value retrieved was a null - if so, set the work object "o" to the null value. 

The only real conversions that are occuring here are the following:

    • Conversion of the ROWID to a String (CHAR/VARCHAR data type) as proposed in the getTeradataTypeName method.
    • Use of the getTimestamp method to retrieve the Oracle Timestamps.

The first conversion (ROWID to String) is, hopefully, obvious. Teradata doesn't support a java.sql.RowId Java data type which is what GetObject() will return. Therefore we must convert it using one of the other "getter" methods, in this case getString().

The second conversion (Timestamps) is less obvious. I bet you are thinking that the Java Timestamp object as returned by the Oracle JDBC should just be the same as the Timestamp that the Teradata JDBC driver can process. And if you are thinking that you would be correct. However, If you comment out the code block starting with "} else if (dataType == Types.TIMESTAMP || " and let the code fall through to the "else" you will find that the Oracle JDBC driver doesn't return a "java.sql.Timestamp". You will only get a "java.sql.Timestamp" (which is what the Teradata driver is expecting) if you call the getTimestamp method on the Oracle result set. If you simply call getObject method on the Oracle resuls set you will get an object of type "oracle.sql.Timestamp" which only Oracle will recognise. If this is passed back to the DbLink Server, it will be relayed back to the Table UDF which will generate a ClassNotFoundException, InstantiationException or something similar.

As for the rest of the data types (Strings, floating point numbers, BCD numbers and integer numbers), the getObject method when executed on the Oracle ResultSet returns an appropriate object type that may be passed directly to the Teradata JDBC driver. 

Other methods

You may notice that the sample also overrides the method getQueryMetadata. This is not an abstract method, so theoretically it does not need to be implemented. However in the case of Oracle it is required.

Here is the code from the overridden getQueryMetadata method.

    @Override
public ResultSetMetaData getQueryMetaData(Connection connection, String sqlText) throws SQLException {
String metaDataSqlText = "select * from (" + sqlText + ") dt where rownum < 1";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(metaDataSqlText);

ResultSetMetaData rsmd = rs.getMetaData();
setResultSetMetaData(rsmd);
return rsmd;

}

This method is called by the DbLinkServer when the user uses the DbLinkMeta Table UDF. You will note that the supplied query is encapsulated in a derived table that selects just the first row of the results. The default method as implemented in the DbLink Sever looks like this:

    public ResultSetMetaData getQueryMetaData(Connection connection, String sqlText)
                throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(sqlText);
       
        ResultSetMetaData rsmd = stmt.getMetaData();
        setResultSetMetaData(rsmd);
        return rsmd;
    }

Note that in the base version (which works for Teradata), the query is simply prepared and the meta data extracted from the prepared statement.

As it turns out, using a Prepared Statement to obtain result set metadata does not work on Oracle (at least it didn't in my Oracle environment). This is where search engines become your friend. A quick search using a popular well known search engine revealed that this "scenario" is well known within the Oracle community. Apparantly, Oracle only works out the result set structure when you run the query and not when you prepare a query. sad Most people suggested using the approach shown in the sample code which limits the results to just the first row (and implied that processing was therefore minimal) to obtain Oracle Result Set Metadata. I am not sure I agree with the assertion that the processing was minimal, but for our environment it was OK, because the data volumes being queried via DbLink were not terribly large and getting the metadata via this method is a "once only" operation per DbLink Query.

The big question is will you need to override this or other methods in the DbmsHandler abstract class? Maybe, maybe not. It will depend upon the nuances of the system (and its JDBC driver) that you are connecting to. My advice is to try the default methods first. If they work for you, then use them, if not override them with a version that does work.

Note that in both of the above examples, the Metadata returned is captured in the resultSetMetadata property (via the setResultSetMetadata method) as well as returned to the caller.

 Deploying your code

Once you have built your code, you will need to deploy it. This is as simple as:

    1. Dropping your JAR (and the necessary JDBC drivers) into the DbLink Servers library path (as specified by the -l option on the command line or the lib property in the configuration file)
    1. Restart the DbLink Server (using the DbLink Client utility).

When the DbLink Server starts, it will scan the directory specified as the library path. Any JAR files it finds will be automatically added to its classpath.

Developing your code

Deploying is all well and good, but first you need to develop it. Compiling your project, copying it to the lib directory, restarting the server et al will rapidly become a pain in the err head (i.e. a headache). You will still have to do some of this stuff, but here is what I did to make life a little easier for myself:

    1. First set up a shell script (or if you must, MS-DOS batch file) that runs bteq, submits a query that invokes the UDF you are testing, quits bteq, then uses the DBLinkClient to stop the DbLink server.
    1. Next setup your "Java Class Library" project so that the DbLinkServer.jar is a referenced library (i.e. include the DbLink Server.jar and all of the other jar's in its lib directory in your "class Library project".
    1. Finally in your project properties specify com.teradata.dblink.server.Server as the class to run when you run your library. Also set up the required command line parameters (e.g. -c dblink.properties) so that the DbLink Server can startup successfully and that the library path is the directory where your new library will be compiled into.

Once you have done all of the above, you can edit your code, then run the project from within your IDE. Once the server is running, use the shell script (or if you must, your MS-DOS batch file) to invoke the UDF and thus call your code.

Make sure that when you create the dblink meta data entry in the DbLinkMeta table, you specify the name of your translator class in the HandlerClassName attribute.

Troubleshooting.

Make sure you have a functioning DbLink environment (part 1 of these articles) before attempting to add a new functionality.

5 REPLIES
Enthusiast

Re: Access External Data - Part 2 Add support for other data sources

Hi Looking for help to get coonection from Terdasta with oracle for this dblink. Please suggest me

Fan

Re: Access External Data - Part 2 Add support for other data sources

Do you know of existing plug in for Microsoft SQL server (third party included)?

Teradata Employee

Re: Access External Data - Part 2 Add support for other data sources

Sorry, no. The only plugins that I am aware of is the oracle one that I wrote.

You have two choices:

1) Use the code attached here and modify it to suit your needs

2) Get someone to do it for you.

I would be happy to do it for you, but you would need to engage me through Teradata to do so *and* provide access to a suitable SQL*Server test environment - preferably a VMware image.

Fan

Re: Access External Data - Part 2 Add support for other data sources

Could you please send private message so we can discuss the details for #2?  Thank you.

Teradata Employee

Re: Access External Data - Part 2 Add support for other data sources

I am not sure how to send an "InMessage" here, but never the less, the discussion for option #2 (get someone else to do it), is simple.

Get them to read the article and follow the example.

If you want me to do it, you can contact me via the email in my profile (which hopefully you can access) or ask your friendly Teradata account manager to connect us (Glenn McCall).