Connecting to Teradata in R via the teradataR package-Teradata Express VM version

Analytics
Teradata Employee

Connecting to Teradata in R via the teradataR package-Teradata Express VM version

I am adding another article about connecting to Teradata through R by using the teradataR package. The following comes from my experience working on a 64-bit Teradata Express Virtual Machine (TEVM), and complements the existing forum topic at:

http://forums.teradata.com/forum/analytics/connecting-to-teradata-in-r-via-the-teradatar-package

My assumption is that you are working on a 64-bit Teradata Express Virtual Machine SLES11-SP1, which you can obtain from:

http://downloads.teradata.com/download/database/teradata-express/vmware

After you get the TEVM up and running, here is a sequence of steps to get teradataR to work. In the following, I assume you are working from the root user account /root in the TEVM.

1. Install R on the TEVM: I obtained R from the CRAN website (see in particular http://cran.cnr.berkeley.edu, where I downloaded R v.3.0.2). Its Linux version comes in the command-line form. For a GUI/IDE solution on the TEVM, you can pair R with RKWard (http://sourceforge.net/projects/rkward) or with Rcommander (http://socserv.mcmaster.ca/jfox/Misc/Rcmdr) .

2. You will also need the teradataR package that you can find at http://downloads.teradata.com/download/applications/teradata-r . At the time I made the installation, teradataR v.1.0.0 was incompatible with R v.3.x. Following the instructions on the teradataR webpage, I obtained a teradataR version that is compatible with R v.3.x from http://github.com/Teradata/teradataR/tree/master/build .

3. You will first need to set the Java path appropriately. The default Java path set for you in the TEVM probably points to JRE, whereas you need access to the JDK.

a. Open another terminal, go to your root folder /root and open the ".bashrc" file.

b. Comment out the default JAVA_HOME variable by adding the "#" character in the beginning of the line

export JAVA_HOME=/opt/teradata/jvm64/jre6/jre

If the above line is not in your .bashrc file, then proceed with the following step.

c. Double-check the location of the Java SDK on your TEVM: Try going to the folder

/opt/teradata/jvm64/jdk6

The above path should be valid, otherwise search for the SDK path under the teradata folder. Assuming the above path is valid, enter the following line in your .bashrc file:

export JAVA_HOME=/opt/teradata/jvm64/jdk6

d. Log out and log in for the change to take effect.

4. Next, proceed to make the Open Database Connectivity (ODBC) driver visible to R and any other applications that might need it. The TEVM already comes with the appropriate ODBC driver for the Teradata version it carries. Locate the following files (included in the RPM package "tdodbc" on the TEVM):

/opt/teradata/client/15.00/odbc_64/odbc.ini

/opt/teradata/client/15.00/odbc_64/odbcinst.ini

From the Teradata ODBC driver manual: "The odbc.ini file must be copied to each user's home directory and renamed .odbc.ini (hidden file). The user must customize this file to reflect appropriate data source configuration information." Simply copy the above files from their locations to your root home folder "/root". If the above paths are valid, then you would need to type the following:

cp /opt/teradata/client/15.00/odbc_64/odbc.ini ~/.odbc.ini

cp /opt/teradata/client/15.00/odbc_64/odbcinst.ini ~/.odbcinst.ini

5. Invoke the ODBC Data Source Administrator (DSA) utility (in RPM package "unixodbc-qt-gui" on the TEVM) to specify the ODBC Data Source Name (DSN) so that applications can find the drive.

a. Start the utility by typing the command "ODBCConfig" in a terminal. An external window should be launched.

b. Under the "Drivers" tab, push the "Add" button.

c. Specify the driver properties in the new popup window:

   i. Provide a name and/or description in the "Name" and "Description" boxes, respectively.

   ii. Specify the driver locations on the TEVM in the "Driver" and "Driver64" boxes. On my TEVM, these locations were, respectively, as follows:

      /opt/teradata/client/15.00/odbc_32/lib/tdata.so

      /opt/teradata/client/15.00/odbc_64/lib/tdata.so

   iii. I further left empty the "Setup" and "Setup64" boxes. I also specified the numbers 1 for "UsageCount", 15 for "CPTimeout", and 3 for "CPReuse".

   iv. Click on the "V" (ok) button on the top left hand corner of the popup window.

d. Under the "User DSN" tab, click the "Add" button to the driver you just specified.

e. Exit the ODBCConfig utility by clicking on the "OK" button.

Try launching the ODBCConfig utility again to verify that ODBC shows in the list under the "User DSN" tab.

6. Assume you are using the terminal version of R, and that you launch R from within a folder called  /root/rfolder. Bring the teradataR package into this folder and uncompress it (by using the command "tar -zxvf <filename>.tar.gz"). You should now have a "teradataR" folder in /root/rfolder.

7. At the R command line, first install some packages that are needed for teradataR. Issue the following statements, and follow up with any action you might need to take, such as selecting a nearest CRAN mirror to perform the installation.

> install.packages(c("devtools"))

> install.packages(c("DBI"))

> install.packages(c("rJava"))

> install.packages(c("RJDBC"))

> install.packages(c("RODBC"))

8. Now you are ready to install teradataR. Simply load the devtools package first and proceed to install teradataR from the folder. Type at the R command line:

> library(devtools)

> install("teradataR")

9. This is it. From now on, every time you launch R and you wish to connect to Teradata via teradataR, simply type at the R prompt the following sequence of statements:

> library(RODBC)

> library(teradataR)

> tdConnect("<DSN>", uid="<Your_UID>", pwd="<Your_Password>", database="<Database_Name>", dType="ODBC")

where DSN is the name you specified in 5.c.i above. Also pay attention to specify the last argument exactly as spelled. In my tests, omitting the argument (as can be done under Windows OS), or specifying it as lower-case "odbc" produced errors.

1 REPLY
Teradata Employee

Re: Connecting to Teradata in R via the teradataR package-Teradata Express VM version

The above approach appears to function erratically. At least under SLES11-SP1, the unixODBC packages do not play well, and when trying to connect in R with

> tdConnect("<DSN>", uid="<Your_UID>", pwd="<Your_Password>", database="<Database_Name>", dType="ODBC")

a segmentation fault happens, and the following appears:

    *** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: .Call(C_RODBCGetInfo, attr(stat, "handle_ptr"))
2: odbcDriverConnect(st)
3: tdConnect("testdsn", uid = "ak", pwd = "ak1", database = "ak", dType = "odbc")

I am proposing an alternative way to connect to Teradata in R via teradataR, by using JODBC.

I am assuming the package teradataR v.1.1.0 is installed in your Linux R installation by following the instruction steps 1, 2, 6, 7, and 8 of the main post. Here is how to proceed further:

1. Load the RJDBC library:

> library(RJDBC)
Loading required package: DBI
Loading required package: rJava

If the library loaded well, proceed to step 2. Otherwise, if R produces the following error when loading RJDBC, then continue with the present step 1.

   Error : .onLoad failed in loadNamespace() for 'rJava', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/usr/lib64/R/library/rJava/libs/rJava.so':
libjvm.so: cannot open shared object file: No such file or directory
Error: package ‘rJava’ could not be loaded

To resolve the above error, add the following line in your .bashrc file in your home directory:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/teradata/jvm64/jre7/jre/lib/amd64/server

Log out and log into your Linux account for the change to take effect.

Then start R and load anew the RJDBC library:

> library(RJDBC)
Loading required package: DBI
Loading required package: rJava

No errors should appear this time.

2. Load teradataR:

> library(teradataR)

3. teradataR provides the tdConnect() function to connect to a target system. Skip using tdConnect() and perform explicitly the connection steps as follows.

First, JDBC needs to know where the files "terajdbc4.jar" and "tdgssconfig.jar" are located on your system to connect. Assume we are on a Teradata VM v.15.00.01.01, where the full path to the containing directory of those files is: "/opt/teradata/tdat/tdbms/15.00.01.01/bin". Locate the files accordingly on your system, too. In our example, we issue the following statements at the R prompt:

> drv <- JDBC("com.teradata.jdbc.TeraDriver","/opt/teradata/tdat/tdbms/15.00.01.01/bin/terajdbc4.jar")
> drv <- JDBC("com.teradata.jdbc.TeraDriver","/opt/teradata/tdat/tdbms/15.00.01.01/bin/tdgssconfig.jar")

In our tests on our Teradata VM, the 2 files needed to be specified in the above order for Java errors to be avoided and the connection to work.

4. Followingly, connect to the desired Teradata system through the R function

dbConnect(<driver>, "jdbc:teradata://<SystemName>/database=<TargetDatabase>", "<UID>", "<Password>")

For example, assume <SystemName>=vm1500, <UID>=ak, and <Password>=ak. The "/database=<TargetDatabase>" part above is optional, but assume you would like to connect specifically to <TargetDatabase>=akDB. Then, the call at the R prompt would be as follows:

> tdConnection <- dbConnect(drv,"jdbc:teradata://vm1500/database=akDB", "ak", "ak")

The above statement establishes the connection.