In-database analytics with TeradataR

Teradata Employee

Please note: teradataR is no longer supported. Teradata has decided to focus on its partnership with Revolution for R integration with Teradata.

R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. This free package is designed to allow users of R to interact with a Teradata database. Users can run many statistical functions directly against the Teradata system without having to extract the data into memory.

You can download the latest (version 1.0.1) teradataR package here.

Update: The teradataR source has been approved for public distribution. The source, along with an updated teradataR package that works with R 3.0, is available from https://github.com/Teradata/teradataR.

What is R?

R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. 

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

R is a flexible programmable language that allows users to add functionality by defining new functions. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and over 1,200 available through CRAN (Comprehensive R Archive Network) mirror sites.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

The Teradata add-on package for R

teradataR is a package (library) that lets R users connect to Teradata, create data frames (R's data format) that reference Teradata tables, and call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. The package provides 44 analytical functions and an additional 20 data connection and R infrastructure functions. In addition, we’ve added functions that list the stored procedures within Teradata and call them from R.

  • 20 functions that enable the R infrastructure to operate with Teradata, for example:
    • tdConnect - Connect to Teradata via ODBC
    • td.data.frame - Establish data frame connections to a Teradata table
  • 44 in-database analytical functions callable from R. A sample of these functions:
    • Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
    • Reorganization functions: join, merge and samples
    • Transformations: bincode, recode, rescale, sigmoid, zscore and null replacement
    • K-Means clustering and Score K-Means
    • Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
  • R language features: nrow, ncol, min, max, summary, as.data.frame, and dim
  • Tools and R functions that allow users to create their own custom analytic functions that are callable from R.
  • Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure
    • The analytic process that creates new derived predictive variables can be captured as a stored procedure.
    • The entire process that creates or updates an analytical data set can be captured as a stored procedure.
    • An R function can list all the stored procedures within Teradata.
    • An R function can call a stored procedure that runs in-database.
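
As a rough illustration of the workflow described above, the sketch below connects over ODBC, points a td.data.frame at a table, and inspects it with the overloaded R functions (dim, summary). The wrapper name explore_table, the DSN, the credentials, and the table name are all placeholders, and tdClose() is assumed to be the package's disconnect call. Check help(teradataR) for the exact arguments in your version.

```r
# Sketch only: needs the teradataR package and a configured ODBC DSN.
# explore_table() is a hypothetical wrapper, not part of teradataR.
explore_table <- function(dsn, uid, pwd, table) {
  library(teradataR)
  tdConnect(dsn, uid, pwd)         # connect to Teradata via ODBC
  on.exit(tdClose(), add = TRUE)   # assumed disconnect call
  tdf <- td.data.frame(table)      # data frame connection to the table
  shape <- dim(tdf)                # rows and columns of the table
  print(summary(tdf))              # per-column summary
  shape
}

# Example call (placeholder values):
# explore_table("myDSN", "user", "password", "sales")
```

The wrapper returns the table's dimensions so the result can be reused without holding the table's rows in an ordinary R data frame.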

teradataR allows R users to leverage all the benefits of in-database processing with Teradata:

  • Eliminate data movement from Teradata to the R framework for key data-intensive tasks.
  • Leverage the speed of the Teradata database’s parallel processing to run analytics against big data.
  • Operate within the R console environment.
  • Embed your frequently performed tasks to run in-database.
  • R and teradataR are free downloads.
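
To make the first point concrete, a transformation such as zscore can be expressed as a single SQL statement, so the raw rows never have to be loaded into R just to compute a mean and standard deviation. The helper below is a hypothetical base R illustration. It is not teradataR's code, and the SQL that teradataR actually generates may differ.

```r
# Hypothetical helper (not part of teradataR): build SQL that computes a
# z-score for one column entirely inside the database.
zscore_sql <- function(table, column) {
  sprintf(
    paste(
      "SELECT (%s - s.mu) / s.sigma AS z_%s",
      "FROM %s CROSS JOIN",
      "(SELECT AVG(%s) AS mu, STDDEV_POP(%s) AS sigma FROM %s) s"
    ),
    column, column, table, column, column, table
  )
}

# Only this query text would be sent to the database.
cat(zscore_sql("sales", "amount"), "\n")
```

The table name sales and column name amount are made up for the example.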

Getting Started with R and the teradataR package

Please refer to the teradataR 1.0.1 User Guide, included with the latest download, for more information about Getting Started with R and the teradataR Package.

Note that teradataR is a free package.  For community support, please visit the Analytics forum.

20 REPLIES
Fan

Re: In-database analytics with TeradataR

This is very exciting, however I don't see a lot of documentation surrounding the package. Am I missing a link to the reference manual? Thanks!

Re: In-database analytics with TeradataR

Though the html files for the docs are not in their usual place, ?commandName still works.
Enthusiast

Re: In-database analytics with TeradataR

I welcome the R interface to Teradata too! Whereas Teradata Miner is a helpful tool for standard tasks and SAS can do a few tricks with fastload/exp too, I expect R to provide an elegant environment for programming around Teradata stored data.
Teradata Employee

Re: In-database analytics with TeradataR

The teradataR package uses standard R documentation. To access the help files from your R console, load the teradataR library and then run help(teradataR). This displays the help index, from which you can browse the help for the whole package.
Teradata Employee

Re: In-database analytics with TeradataR

I am wondering whether we can move statistical analysis into Teradata as built-in UDFs.

Re: In-database analytics with TeradataR

On a fresh install on a new Linux server, we had problems getting RODBC to connect to Teradata.

Resolved with these steps:

1) Download the source to RODBC and unpack

2) Build the package

R CMD build RODBC

3) Install the package

R CMD INSTALL ./RODBC_1.3-2.tar.gz --configure-args="--with-odbc-include=/opt/teradata/client/ODBC_64/include --with-odbc-lib=/opt/teradata/client/ODBC_64/lib/"
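
Once RODBC is installed, a quick check like the one below can confirm that it reaches Teradata. This is only a sketch: check_rodbc is a made-up name, and the DSN and credentials are placeholders for whatever is defined in your ODBC configuration.

```r
# Sketch: verify that RODBC can open a connection and run a trivial query.
# check_rodbc() is a hypothetical helper; it needs RODBC and a working DSN.
check_rodbc <- function(dsn, uid, pwd) {
  library(RODBC)
  ch <- odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect() returns -1 (not an RODBC object) when the connection fails
  if (!inherits(ch, "RODBC")) {
    stop("connection failed; check the DSN and the ODBC driver paths")
  }
  on.exit(odbcClose(ch), add = TRUE)
  sqlQuery(ch, "SELECT CURRENT_DATE")
}

# Example call (placeholder values):
# check_rodbc("myDSN", "user", "password")
```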

Teradata Employee

Re: In-database analytics with TeradataR

About as.td.data.frame (Coerce to a td data frame):
I would like to understand the td data frame better.
- Does it create some definition within the Teradata database, or not?
- Or does it create a temporary table or view?

The help for as.td.data.frame shows only the following:
Coerce to a td data frame
Description
Coerce a data frame into a td data frame

Usage
as.td.data.frame(x)

Arguments
x data frame.
-Best Regards
Fan

Re: In-database analytics with TeradataR

Does as.td.data.frame() do a row-by-row transfer, or does it invoke bulkload/fastload? For a small dataset (120MB) it takes a really long time. I am wondering if it is doing a row-by-row insert. That was a major hassle with the RODBC package.
Teradata Employee

Re: In-database analytics with TeradataR

The current teradataR package does not use bulkload/fastload capabilities. It currently relies on the RODBC package to do the work, so rows are transferred one at a time. A future capability will be extracting and loading through the Teradata utilities or an API.
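
To see why this gets slow, consider what a row-by-row load amounts to: one INSERT per row, each a separate round trip to the database. The base R sketch below only builds the statements so the count is visible. The helpers sql_literal and row_inserts are hypothetical and are not what RODBC or teradataR literally execute.

```r
# Hypothetical helpers (not from RODBC or teradataR): show that a
# row-by-row load needs one INSERT statement per row of the data frame.
sql_literal <- function(v) {
  if (is.numeric(v)) {
    as.character(v)
  } else {
    # quote strings and double any embedded single quotes
    sprintf("'%s'", gsub("'", "''", as.character(v), fixed = TRUE))
  }
}

row_inserts <- function(df, table) {
  cols <- paste(names(df), collapse = ", ")
  vapply(seq_len(nrow(df)), function(i) {
    vals <- vapply(df[i, , drop = FALSE], sql_literal, character(1))
    sprintf("INSERT INTO %s (%s) VALUES (%s)",
            table, cols, paste(vals, collapse = ", "))
  }, character(1))
}

df <- data.frame(id = 1:2, name = c("a", "b"))
stmts <- row_inserts(df, "demo")
length(stmts)  # one statement per row
```

The number of statements grows with the number of rows, which is why a bulk loading path would help for larger tables.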