When R runs in-database within Aster, the primary way to load data into it is through STDIN, which the R script reads as a data frame. STDIN is the path to use for processing large volumes of data at high speed. However, many scripts also need helper files: a randomForest-type algorithm, for instance, needs a model file, and a text analysis algorithm may need a dictionary file.
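As a minimal sketch of the STDIN path, the script below reads incoming rows into a data frame and writes results back to STDOUT. It assumes that Aster hands rows to the script as tab-delimited text with no header line; check the delimiter your stream invocation actually uses. The function names `read_stream_input()` and `write_stream_output()` are hypothetical.

```r
# Hypothetical helpers for an R script run by Aster's stream processing.
# Assumption: input rows are tab-delimited text with no header line.
read_stream_input <- function(con = file("stdin")) {
  # quote = "" and comment.char = "" stop quotes or '#' inside
  # text fields from being treated as special characters.
  read.table(con, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
             quote = "", comment.char = "")
}

write_stream_output <- function(df, con = stdout()) {
  # Emit plain tab-delimited rows with no header or row names.
  write.table(df, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# In the stream script itself:
#   input <- read_stream_input()
#   ... process `input` as an ordinary data frame ...
#   write_stream_output(result)
```

Note that read.table() raises an error when there are no input lines, so a script that may receive an empty partition should guard for that case.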
To bring such secondary files into R-in-Aster execution, pack the necessary R object into a file that can be installed in the Aster database along with the R function script. R objects of any type (list, matrix, vector, data frame, etc.) can be "serialized" for installation and later "unserialized" while the R script processes the stream. Serializing is also called pickling, marshalling, or flattening (in Python, the pickle module does this).
The most commonly used R functions for this are serialize() and unserialize(), which by default write and read a binary format.
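The round trip can be sketched as follows: serialize the object to a file before installation, then unserialize it inside the stream script. The helper object, the file name, and the function names `save_helper()` and `load_helper()` are hypothetical, and the sketch assumes the installed file can be opened by name from the script's working directory inside Aster, which you should confirm for your installation.

```r
# Hypothetical helper object standing in for a model or dictionary.
helper <- list(stopwords = c("a", "an", "the"),
               weights = c(term1 = 0.4, term2 = 0.6))

# Run before installation: write the object in R's binary serialization format.
save_helper <- function(obj, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  serialize(obj, con)
  invisible(path)
}

# Run inside the stream script: rebuild the original R object from the file.
load_helper <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  unserialize(con)
}

# Example (hypothetical file name):
#   save_helper(helper, "helper.bin")      # on the client, before installing
#   model <- load_helper("helper.bin")     # inside the installed R script
```

Because unserialize() restores the object exactly, the script gets back the same list, matrix, or data frame that was saved, with no parsing code of its own.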
A single helper file should not exceed 32 MB; larger objects can be spread across multiple helper files.
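Before installing, it can help to check an object's serialized size against the limit and, if needed, split it. This is a minimal sketch under two assumptions: "32MB" is read as 32 MiB, and the object is a data frame that can be split by rows. The names `max_helper_bytes`, `helper_size()`, and `split_for_helpers()` are hypothetical.

```r
# Assumption: the 32MB limit is read as 32 MiB (32 * 1024^2 bytes).
max_helper_bytes <- 32 * 1024^2

# Size in bytes of the object's default (binary) serialization.
helper_size <- function(obj) as.numeric(length(serialize(obj, connection = NULL)))

# Split a data frame row-wise into roughly equal pieces, adding pieces
# until every piece serializes to no more than `limit` bytes.
split_for_helpers <- function(df, limit = max_helper_bytes) {
  n <- nrow(df)
  n_parts <- max(1, ceiling(helper_size(df) / limit))
  repeat {
    groups <- ceiling(seq_len(n) * n_parts / n)
    parts <- unname(split(df, groups))
    if (all(vapply(parts, helper_size, numeric(1)) <= limit)) return(parts)
    if (n_parts >= n) stop("a single row exceeds the size limit")
    n_parts <- n_parts + 1
  }
}
```

Each piece would then be serialized to its own helper file, and inside the script the unserialized pieces can be reassembled with `do.call(rbind, pieces)`. If the limit applies to the installed file rather than the in-memory serialization, saveRDS() and readRDS(), which gzip-compress by default, may also bring a file under the limit.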
A separate blog post and video will cover how to pass parameters, such as the names of helper files, into the R script.