R with Teradata ODBC

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

1 Introduction


This document is a high-level tutorial that describes how to connect to Teradata Database using ODBC Driver for Teradata using R.

Scripting languages are quickly becoming a common language for the implementation in many areas, especially, where the development time is more critical than the execution time. Moreover, in the areas where the execution time is important, languages and environment are adjusted and optimized to boost performance. Scripting languages make possible a variety of different scenarios and configurations. Increasingly, the languages themselves are used as a full basic instrumental platform.

1.1 Driver Manager

When an application tries to connect to a data source using the connection methods of the ODBC API, the Driver Manager (DM) determines which driver is required and loads it into memory. The Driver Manager then simply takes any incoming function call from the application and calls the function of the same name in the driver. It also performs other functions such as error checking to ensure that function are called in the right order, arguments contain valid values and unloading the driver from memory.

The ODBC driver manager acts as the mediator between the application and the database thereby creating a layer of abstraction. It manages the interactions between application programs and drivers. It has the capability to manage multiple applications and multiple drivers simultaneously. 


There are different Driver Managers which can be used. Some of the most commonly used Driver Managers are:

unixODBC DM: an open source Driver Manager for various platforms

iODBC DM: an open source Driver Manager shipped with Mac OS X

Microsoft ODBC DM: Driver Manager shipped with Microsoft Windows OS 

DataDirect DM: Driver Manager built by DataDirect 

The ODBC Driver for Teradata works with iODBC on Mac OS X and with the Microsoft ODBC Driver Manager on Windows OS. On all other supported platforms, it is shipped with DataDirect’s Driver Manager. 


1.2  References 

1.3  Prerequisite

The components used throughout this tutorial:

Component Version URL
Teradata ODBC Driver  15.10  www.teradata.com/DownloadCenter
R  3.2.0 https://www.r-project.org/ 
RODBC  1.3 - 12 https://cran.r-project.org/web/packages/RODBC/index.html

R programming console can be downloaded from https://www.r-project.org/ for Windows, Unix and Mac OS X platforms. This tutorial is based on the version 3.2.0. You can use a more recent version of R provided a compatible RODBC package is available too.

The most common way to connect to a database with R is by using RODBC package. This package can be easily installed on your machine. Here are the steps:

> install.packages("RODBC")

Once this is installed, you can verify the same by calling the installed.packages() function which will return you the list of packages currently installed.

> installed.packages()

2 Create Sample Data


General workflow of the interaction with the database is very similar to a generic ODBC application workflow, and can be described with the following steps:
  • List all available Data Sources using the odbcDataSources() function.
  • Connect to the database using the odbcConnect() or odbcDriverConnec() function.
  • Run a query using the sqlQuery() function.
  • Clean up the table using the sqlDrop() function.
  • Finalize the work with the database with the odbcClose() functions.

2.1  Connect

First, in order to connect to a database, the RODBC package needs to be loaded with the library() function.

> library(RODBC)

Then the connection has to be established. This can be done by either using the odbConnnect() function or the odbcDriverConnect() function. In the following examples, ch is the channel to which this connection corresponds. It can be thought of as a connection handle. odbcDriverConnect() can be used in case you want to use a DSN-less connection.

> ch <- odbcConnect(dsn="testDSN",uid="testUser",pwd="ter@d@t@")
> ch <- odbcDriverConnect("Driver=Teradata;DBCName=192.0.0.1;UID=testUser;PWD=ter@d@t@")

 

2.2  Execute SQL statement

In order to execute a SQL Query using R, we can simply use the sqlQuery() function. It can execute both types of SQL queries, ones which return a result set and the ones which do not return a result set. Here is an example of using the sqlQuery() function to drop an existing employee table

> sqlQuery(ch, paste('drop table  "employee" '))

Another way to drop a table is to use the sqlDrop() function

> sqlDrop(ch, "employee", errors = FALSE)

2.3  Disconnect

Finally, you can close the channel by using the odbcClose() or odbcCloseAll() functions

> odbcClose(ch)
> odbcCloseAll()

3 Complete Example

# Loading the RODBC Package
> library(RODBC)

# Connecting to the database using a DSN-Less Connection
> ch <- odbcDriverConnect("Driver=Teradata;DBCName=192.168.204.27;UID=odbc;PWD=odbc")

# Drop table
> sqlDrop(ch, "Artists")

# Create the Artists table and insert rows in it
> sqlQuery(ch, paste('Create table "Artists" ("Artist_ID" int, "Artist_Name" varchar(100))' ))
[1] "No Data"
> sqlQuery(ch, paste('Insert into "Artists" values (1, \'John Mayer\')' ))
character(0)
> sqlQuery(ch, paste('Insert into "Artists" values (2, \'Slash\')' ))
character(0)
> sqlQuery(ch, paste('Insert into "Artists" values (3, \'Kurt Cobain\')' ))
character(0)
> sqlQuery(ch, paste('Insert into "Artists" values (4, \'Axl Rose\')' ))
character(0)

# Close the connection/channel
> odbcClose(ch)

 

4  Retrieve Data


To retrieve data you can use the sqlGetResults() function.

4.1  Execute SQL statement

To execute SQL statements, there are two methods – either we can use the function sqlQuery() which behaves similarly to a standard SQLExecute() method, or we can use two different function calls odbcQuery() and sqlGetResults().

First, let’s see how we can use sqlQuery() function and see what it returns. By calling the sqlQuery() function, the query gets executed and the results are displayed on the R terminal

> sqlQuery(cd, paste('select * from "Artists" order by artist_id'))
artist_id artist_name
1 1 Kurt Cobain
2 2 Axl Rose
3 3 John Mayer

The second way is to separately make two function calls. The first call is made to odbcQuery() which will execute the query and return a result code. The call returns 1 for successes and -1 for errors. In case there is an error, you can call odbcGetErrMsg() function to retrieve the error.

# Execute a statement which will throw back an error
> odbcQuery(ch, paste('select * from "Art"'))
[1] -1

# Retrieve the error message
> odbcGetErrMsg(ch)
[1] "42S02 -3807 [Teradata][ODBC Teradata Driver][Teradata Database] Object 'Art' does not exist. " "[RODBC] ERROR: Could not SQLExecDirect 'select * from \"Art\"'"

4.2  Retrieve Data

Once the odbcQuery() functions returns success, you can retrieve the results using the sqlGetResults() function.

# Execute a select statement
> odbcQuery(ch, paste('select * from "Artists" order by "Artist_id"'))
[1] 1

# Retrieve the results
> sqlGetResults(ch)
artist_id artist_name
1 1 Kurt Cobain
2 2 Axl Rose
3 3 John Mayer

4.3  Disconnect

Finally, you can close the channel by using the odbcClose() or odbcCloseAll() functions

> odbcClose(ch)
> odbcCloseAll()

4.4  Complete Example

# Loading the RODBC Package
> library(RODBC)

# Connecting to the database using a DSN-Less Connection
> ch <- odbcDriverConnect("Driver=Teradata;DBCName=127.0.0.1;UID=testUser;PWD=ter@d@t@")

# Execute the statement
> odbcQuery(ch, paste('Select * from "Artists" order by "Artist_ID"'))
[1] 1

# Retrieve results from the channel
> sqlGetResults(ch)
Artist_ID Artist_Name
1 1 John Mayer
2 2 Slash
3 3 Kurt Cobain
4 4 Axl Rose

# Execute the statement and retrieve the results with one function call
> sqlQuery(ch, paste('Select * from "Artists" order by "Artist_ID"'))

Artist_ID Artist_Name
1 1 John Mayer
2 2 Slash
3 3 Kurt Cobain
4 4 Axl Rose

# Close the channel
> odbcClose(ch)