R-in-Aster: Loading Helper Files into R During Stream Process

Teradata Employee

The primary path for loading data into R while it runs in-database within Aster is STDIN, read in a data frame format.  STDIN is the path to use for processing large volumes of data at high speed.  However, there are many cases where helper files are also needed.  A randomForest-type algorithm, for instance, will use a model file, and a text analysis algorithm may use a dictionary file.

Secondary files of this kind are brought into the R-in-Aster execution process by packing the required R object into a file that can be installed into the Aster database along with the R function script.  R objects of any type (list, matrix, vector, data frame, etc.) can be "serialized" for installation and later "unserialized" during the stream processing of the R script.  Serializing is also called pickling, marshalling, or flattening.  (In Python, for example, the pickle module carries out this process.)
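To make the round trip concrete, here is a minimal sketch that serializes a list holding several R types and restores it, all in memory. The object `helper` and its contents are made up for illustration. Passing `connection = NULL` makes serialize() return a raw vector instead of writing to a file.

```r
# Hypothetical helper object mixing several R types (illustration only).
helper <- list(
  stopwords = c("a", "the", "of"),
  weights   = matrix(1:4, nrow = 2),
  lookup    = data.frame(id = 1:2, label = c("x", "y"),
                         stringsAsFactors = FALSE)
)

# With connection = NULL, serialize() returns the bytes as a raw vector.
bytes <- serialize(helper, connection = NULL)

# unserialize() accepts that raw vector and rebuilds the original object.
restored <- unserialize(bytes)

# The restored object should match the original exactly.
same <- identical(helper, restored)
```

The same calls work with a file connection in place of `NULL` and the raw vector, which is the form needed for a helper file.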

The most commonly used R functions for this process are serialize() and unserialize(), which by default write and read a binary format.  (If the connection is opened in text mode, serialize() writes an ASCII representation instead.)
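The difference between the two representations can be checked directly. The sketch below serializes the same vector in binary form (the default) and in ASCII form (via the `ascii = TRUE` argument) and confirms that both decode to the same object. The variable names are illustrative.

```r
x <- c(1, 2)

# Default representation: binary.
bin_bytes <- serialize(x, connection = NULL)

# ASCII representation, requested explicitly.
txt_bytes <- serialize(x, connection = NULL, ascii = TRUE)

# unserialize() detects the format from the stream header,
# so both forms decode to the same object.
from_bin <- unserialize(bin_bytes)
from_txt <- unserialize(txt_bytes)

# The first byte of the header identifies the format.
bin_marker <- rawToChar(bin_bytes[1])
txt_marker <- rawToChar(txt_bytes[1])
```

The binary form is usually the more compact of the two, which matters when a helper file has a size limit.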

A single helper file should not exceed 32 MB.  If more is needed, multiple helper files can be used.
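A quick size check before installing a helper file can catch this early. The sketch below is only an illustration: the function `helper_fits()` is hypothetical, and it assumes that "32 MB" means 32 * 1024^2 bytes, so adjust the limit if the decimal definition applies.

```r
# Assumption: the 32 MB limit is interpreted as 32 * 1024^2 bytes.
max_bytes <- 32 * 1024^2

# Hypothetical helper: TRUE if the file exists and is within the limit.
helper_fits <- function(path, limit = max_bytes) {
  size <- file.size(path)  # NA if the file does not exist
  !is.na(size) && size <= limit
}

# Demonstration with a small serialized vector in a temporary file.
tmp <- tempfile(fileext = ".bin")
con <- file(tmp, "wb")
serialize(c(1, 2), con)
close(con)

fits <- helper_fits(tmp)
missing_fits <- helper_fits(file.path(tempdir(), "no-such-helper-file.bin"))
```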

A separate blog post and video will explore how to pass parameters, such as the names of helper files, into the R script.

R Script Example of Using Serialize Process

#####################################################################

## Exporting from regular R and Importing into R-in-Aster (not stdin/stdout)

#####################################################################

### Export an R object to a file that can be installed into Aster database as a supporting file

### This is performed in a process prior to and separate from running the R stream script within Aster

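x <- c(1, 2)  # create dummy data for example

outFile <- file('C:/text.txt', 'wb')  # 'wb' = binary write mode, so serialize() writes its default binary format

serialize(x, outFile)

close(outFile)

### Import the serialized file that has been installed into Aster database using the unserialize() function

### (the local path below is for illustration only)

inCon <- file('C:/text.txt', 'rb')  # 'rb' = binary read mode, matching how the file was written

recoveredObject <- unserialize(inCon)

close(inCon)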
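
How the recovered object might be used alongside the STDIN data can be sketched as follows. This is not taken from the Aster documentation: the function `process_stream()`, the two-column layout, the tab-delimited input, and the file name `helper.bin` are all assumptions made for illustration. The demonstration reads from a text connection so that it can be run locally.

```r
# Hypothetical stream step: score tab-delimited rows with a lookup vector
# loaded from a serialized helper file. Column layout is an assumption.
process_stream <- function(in_con, helper_path, out_con = stdout()) {
  h <- file(helper_path, "rb")
  lookup <- unserialize(h)   # named numeric vector: word -> score
  close(h)

  rows <- read.table(in_con, sep = "\t", header = FALSE,
                     stringsAsFactors = FALSE,
                     col.names = c("id", "word"))
  rows$score <- unname(lookup[rows$word])

  # Emit the result as tab-delimited rows on the output connection.
  write.table(rows, out_con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(rows)
}

# Local demonstration: build a helper file, then feed two rows through.
helper_path <- tempfile(fileext = ".bin")
h <- file(helper_path, "wb")
serialize(c(good = 1, bad = -1), h)
close(h)

demo_in <- textConnection(c("1\tgood", "2\tbad"))
result <- process_stream(demo_in, helper_path)
close(demo_in)

# In a stream script the call might instead look like:
# process_stream(file("stdin"), "helper.bin")
```

Keeping the logic in a function that takes connections as arguments makes it possible to test the script outside the database before installing it.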