In-Lining of LOBs in .NET Data Provider for Teradata 15.0

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

This will be part 1 of a multi-part blog about how the .NET Data Provider for Teradata 15.0 can now In-Line Large Objects (LOB)  that are sent to a Teradata Database when executing an INSERT or UPDATE statement.  This first blog will introduce In-Lining of LOB.   Blogs will also be written that discuss how to take advantage of this feature, and performance characteristics.  All these blogs will be more technically oriented than my other blogs.

Overview

The first item that needs to be mentioned is that In-Lining of LOBs is only supported when the Data Provider is connected to a Teradata Database 14.0 or later.  If you install the .NET Data Provider for Teradata 15.0 and connect to a Teradata Database release earlier than 14.0 all LOBs will be sent using Deferred mode.

The data for all LOBS (BLOB, CLOB, XML, JSON, Geospatial) were sent to the Teradata Database using Deferred mode by the .NET Data Provider for Teradata in releases prior to 15.0.  It is now possible for a LOB to be In-Lined  by the 15.0 release of the Data Provider. 

This feature occurs automatically.  If the Data Provider determines that the data of a LOB can be In-Lined it will automatically In-Line the data.

What does it mean to In-Line a LOB or send a LOB as Deferred ?

Before answering this question a Message Buffer must be defined.  A Message Buffer is used by the Data Provider to write out all the data of a request that will be sent to the Teradata Database.  This includes the SQL statement, data of all the parameters, and associated overhead.  After the Message Buffer has been filled with all this information, it is sent to the Teradata Database for processing.

A LOB is In-Lined when all of its data is written to the Message Buffer.  Since the Message Buffer is limited to 1mb, the largest LOB that can be In-Lined will be approximately 1mb.

When a LOB is sent to the Teradata Database using Deferred mode a unique identifier is written in place of the LOB data in the Message Buffer.  This identifier takes up 4 bytes in the buffer.  When the Teradata Database receives the data in the buffer it sends a request to the Data Provider for the LOB data.  The Data Provider will fill the Message Buffer with the LOB data, then send the contents of the buffer to the Teradata Database.  The amount of data that can be sent to the Teradata Database is 1mb.  This back and forth communication between the Data Provider and Teradata Database continues until all the data of the LOB has been sent.

There is one less round of communication between the Data Provider and Teradata Database when a "small" LOB can be In-Lined.

When is a LOB In-Lined?

A LOB will not be In-Lined under the following conditions:

  • When the TdParameters.Size property is set to 0 and the data type that represents the LOB is a Stream, TextReader, or XmlReader.
  • When the total number of bytes of Non-Lob parameters and the total number of bytes of all  LOB parameters  exceed 1mb.  This is clarified in the discussion that follows.

There are two scenarios that need to be discussed when the Data Provider determines when the data of a LOB can be In-Lined:

  • Executing a command using one of the TdCommand's execution methods
  • Using TdDataAdapter to process a batch.

How In-Lining is Determined when TdCommand's Execution Methods are Called

When one of the TdCommand's execution methods (i.e. TdCommand.ExecuteNonQuery) is called, the Data Provider determines whether a LOB can be In-Lined by performing the following:

  • Subtracting the total number of bytes of the non-LOB parameters, the SQL statement,  and the number of bytes of the overhead from 1mb. 
  • Starting from the smallest LOB the Data Provider will check whether it can fit within the remaining space.  The LOB will be In-Lined if it can fit.  The size of the LOB is subtracted from the remaining space.  The size of the next smallest LOB is checked.  This will continue until a LOB does not fit in the space that remains.  Any LOB that cannot be In-Lined will be sent using Deferred mode.

Example using TdCommand.ExecuteNonQuery

In this example, a parameter row contains 3 BLOBS.

static void Example1(TdCommand cmd)
{
 
    FileStream blob1 = new FileStream("blob1.mp3", FileMode.Open);
    FileStream blob2 = new FileStream("blob2.mp3", FileMode.Open);
    FileStream blob3 = new FileStream("blob3.mp3", FileMode.Open);
 
    cmd.CommandText = "insert into exTable (int1, blob1, blob2, blob3) values (?, ?, ?, ?)";
 
    cmd.Parameters.Add(null, TdType.Integer);
    cmd.Parameters[0].Value = 1;
 
    // blob1 will be sent as Deferred because Size=0 and blob1 is a Stream
    cmd.Parameters.Add(null, TdType.Blob);
    cmd.Parameters[1].Size = 0;
    cmd.Parameters[1].Value = blob1;
 
    // blob2 will be sent In-Line because the Size will fit within the Message Buffer
    cmd.Parameters.Add(null, TdType.Blob);
    cmd.Parameters[2].Size = 1000;
    cmd.Parameters[2].Value = blob2;
 
    // blob3 will be sent as Deferred because the Size is too large to fit 
    // within the Message buffer
    cmd.Parameters.Add(null, TdType.Blob);
    cmd.Parameters[3].Size = 1000000000;      // 1,000,000,000
    cmd.Parameters[3].Value = blob3;
 
    cmd.ExecuteNonQuery();
}

In this example the BLOBs are represented by a FileStream. The Data Provider will perform the following actions on each of the BLOBs:

  • blob1 -- TdParameter.Size = 0 and the base type is a Stream. The Data Provider cannot determine the size of the BLOB. This BLOB will be sent deferred.
  • blob2 -- TdParameter.Size = 1000. When the overhead and the size of all the other parameters are accounted for, this LOB will fit within the Message Buffer. It will be In-Lined.
  • blob3 -- TdParameter.Size=1,000,000,000. The size of the BLOB is to large to fit within the Message Buffer.

How In-Lining is Determined when Processing a Batch

It gets a little more complicated when the Data Provider is processing a Batch.

  • Message Buffer Space allocated for each parameter row in the batch is calculated.
    • 1mb / TdDataAdapter.BatchSize
  • The Teradata Database requires that the same LOB parameter in each row to be In-Lined or Deferred. The calculation to determine which LOB can be In-Lined only has to be performed on one row.
  • The same calculation described in the TdCommand's execution method is used except the space allocation for a parameter row is used instead of 1mb.

Example of Using a Batch to Update a Table

static void Example2(TdConnection cn)
{
    DataTable dt = new DataTable("example2");

    dt.Columns.Add("int1", typeof(Int32));
    dt.Columns.Add("blob1", typeof(FileStream));
    dt.Columns.Add("blob2", typeof(FileStream));
    dt.Columns.Add("blob3", typeof(FileStream));
 
    // ***********
    // The dt is filled with rows 
    // ***********
 
    TdCommand cmd = cn.CreateCommand();
     cmd.Parameters.Add("int1", TdType.Integer, 0, "int1");
    cmd.Parameters[0].SourceVersion = DataRowVersion.Proposed;

    // blob1 will be sent In-Line because the Size will fit within the Message Buffer
    cmd.Parameters.Add("blob1", TdType.Blob, 40000, "blob1");
    cmd.Parameters[1].SourceVersion = DataRowVersion.Proposed;
 
    // blob2 will be sent as Deferred because Size=0 and blob2 is a Stream
    cmd.Parameters.Add("blob2", TdType.Blob, 0, "blob2");
    cmd.Parameters[2].SourceVersion = DataRowVersion.Proposed;
 
    // blob3 will be In-Lined because the space that remains in the Message
    // Buffer will fit the 30,000 bytes.
    cmd.Parameters.Add("blob3", TdType.Blob, 30000, "blob3");
    cmd.Parameters[3].SourceVersion = DataRowVersion.Proposed;
 
    cmd.CommandText = "insert into exTable (int1, blob1, blob2, blob3) values (?, ?, ?, ?)";
 
    TdDataAdapter da = new TdDataAdapter();
    da.InsertCommand = cmd;
 
    // iterated request (parameter arrays) will be used
    da.UpdateBatchSize = 10;
    da.ContinueUpdateOnError = true;
    da.KeepCommandBatchSequence = false;
 
    // Sending batch to Teradata Database.
    da.Update(dt);
 
    da.Dispose();
    cmd.Dispose();
}

In this example each of the BLOBs are represented as a FileStream. To determine which BLOBs will be In-Lined the Data Provider performs the following actions:

  • Calculates the space in the Message Buffer that will be assigned to each Parameter Row.
    •  1mb / TdDataAdapter.UpdateBatchSize  = 1mb / 10 = 100kb
  • Starting with the smallest LOB, the Data Provider will determine whether each LOB can fit within the 100kb.
    • blob2 will be sent Deferred because the TdParameter.Size=0 and the base type is a Stream.
    • The first BLOB that is considered for In-Lining is blob3 because it is the smallest LOB.  It has a Size of 30,000.  After subtracting the overhead from 100kb, blob3 can fit within the allocated space.  The Size of blob3 is subtracting from the remaining bytes.
    • The second BLOB that is considered for In-Lining is blob1.  It has a size of 40,000.  The Data Provider will check whether it can fit within the remaining space.  "blob1" will fit so it will also be In-Lined.

blob1 and blob3 are small BLOBs.  You may think that they can be represented as a Byte Array and the TdParameter.TdType for each parameter can be set to TdType.VarByte.  However, the Teradata Database has a limit on the number of bytes that can be set in a parameter row.  The limit is 64kb. 

This limit changes if the parameter row contains a LOB data type whose value can be In-Lined.  The limit increases to 1mb.