From Flat Logic to Set Logic - a basic approach

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

Perhaps the easiest way to transition from a flat logic mindset to a set logic mindset is to convert some Flat SQL to Set SQL, then stare at both of them until the logical equivalence of the two approaches becomes clear.

The first step in converting is to look through the code for a loop containing one of these SQL commands: insert, update or delete.  One of these commands will begin the Set SQL statement.

The next step is to identify the sources of the values used in the SQL command.  There are three sources of these values: tables, constants and external files.  There are often intermediate computations involved, but tables are the ultimate source of most values.  Constant values can be run-time parameters or simple functions like CURRENT_DATE.  When data comes from an external file, the file must be loaded as a table before the Set SQL can use it.

The Set SQL statement can reference constants directly as literals or host variables.  Source tables are identified in the FROM clause of a Set SQL statement.  Some intermediate computations can be done in the Set SQL column list by the use of functions, expressions and CASE operations.  Other levels of computation in Set SQL require the definition of derived tables in the FROM clause.

Finally, the SQL command is often executed only under certain conditions, which are expressed in Flat SQL as IF-statements or CASE-statements.  These conditions belong in the predicate or WHERE clause of the Set SQL statement.

Since the SQL command, which belongs at the beginning of a Set SQL statement, is usually found in the middle of Flat SQL code, and the predicate, which belongs at the end of a Set SQL statement, is usually found near the beginning of a block of Flat SQL code, converting from Flat SQL to Set SQL feels like turning the logic inside-out.

Consider this procedure:

REPLACE PROCEDURE PCurs1 (
OUT Updates INTEGER
,OUT Inserts INTEGER
,OUT Insert2 INTEGER
)
BEGIN
Declare myKey1 INT;
Declare myEffDate date;
Declare myAmt dec(38,16);
Declare myNr INT;
Declare myChar1 CHAR(100);
Declare myVarchar1 VARCHAR(200);
Declare QuitNow INT Default 0;

Declare CONTINUE HANDLER FOR NOT FOUND
Set QuitNow = 1;

Set Updates = 0;
Set Inserts = 0;
Set Insert2 = 0;

MainLoop:
FOR Table1_cursor as Table1_cursor cursor for
SELECT Key1, EffDate, Amt, Nr, Char1, Varchar1
FROM Table1
ORDER BY Key1
DO
If QuitNow = 1 then
LEAVE MainLoop;
End If;
Set myKey1 = Table1_cursor.Key1;
Set myEffDate = Table1_cursor.EffDate;
Set myAmt = Table1_cursor.Amt;
Set myNr = Table1_cursor.Nr;
Set myChar1 = Table1_cursor.Char1;
Set myVarchar1 = Table1_cursor.Varchar1;

CASE myKey1 mod 3
WHEN 1 THEN
insert into Table2 values(:myKey1, :myEffDate, :myAmt, :myNr, :myChar1, :myVarchar1);
Set Inserts = Inserts + 1;
WHEN 2 THEN
if myKey1 mod 2 = 0 then
set myChar1 = 'This Even Row has been updated';
else
set myChar1 = 'This Odd Row has been updated';
End If;
update Table1
set Char1 = :myChar1
where Key1 = :myKey1
and EffDate = :myEffdate;
Set Updates = Updates + 1;
ELSE
insert into Table3 values(:myKey1, cast(:myEffDate as timestamp(0)), :myAmt, :myChar1);
Set Insert2 = Insert2 + 1;
END CASE;
END FOR;
END;

This procedure contains one loop.  The first SQL command of interest is an Insert into Table2, so the first Set SQL command will be "Insert into Table2 ...."  What will it insert?  It will insert :myKey1, :myEffDate, :myAmt, :myNr, :myChar1 and :myVarchar1.

insert into Table2 values(:myKey1, :myEffDate, :myAmt, :myNr, :myChar1, :myVarchar1);

Where do these come from? Tracing these variables back through the code shows they are set by assignment statements to values that come from Table1_cursor.  Table1_cursor is a select from Table1.

Since the values inserted come from Table1, the Set SQL Insert statement must include a Select from Table1, just like the Select in the cursor.  So far, it looks like this:

insert into Table2
select Key1, EffDate, Amt, Nr, Char1, Varchar1
FROM Table1;

There is another question to ask, though: under what conditions is this Insert executed?  Directly preceding the insert statement in the procedure is a "Case When" myKey1 mod 3 = 1.

    CASE myKey1 mod 3
WHEN 1 THEN
insert into Table2 values(:myKey1, :myEffDate, :myAmt, :myNr, :myChar1, :myVarchar1);

The Set SQL statement must include a predicate that specifies this condition.   Conveniently, myKey1 comes from Table1.  If it came from another table, Table1 would have to be joined to that table in the Set SQL FROM clause.

insert into Table2
select Key1, EffDate, Amt, Nr, Char1, Varchar1
FROM Table1
where Key1 mod 3 = 1;

There is one other interesting step in the Flat SQL following the insert: a count of the number of inserts is incremented.  This number is one of the values returned by the procedure.  How can this counter be implemented in Set SQL?  A special Teradata variable called ACTIVITY_COUNT is set after each SQL statement to indicate the number of rows affected by the operation.  (For embedded SQL applications, this variable is in the SQLERRD array of the SQLCA.)  The Set SQL procedure will contain a line that sets the counter to ACTIVITY_COUNT.

insert into Table2
select Key1, EffDate, Amt, Nr, Char1, Varchar1
FROM Table1
where Key1 mod 3 = 1;
Set Inserts = ACTIVITY_COUNT;

That takes care of the first SQL command in this loop.

The next one is a little more complex.  The Update in the Flat SQL procedure uses a value that comes from the :myChar1 variable, but there is a little bit of extra logic involving the :myKey1 variable to compute this value:

     WHEN 2 THEN
if myKey1 mod 2 = 0 then
set myChar1 = 'This Even Row has been updated';
else
set myChar1 = 'This Odd Row has been updated';
End If;
update Table1
set Char1 = :myChar1
where Key1 = :myKey1
and EffDate = :myEffdate;

This can be implemented in a Set SQL Update statement as a CASE operation.

update Table1
set Char1 =
(Case
when Key1 mod 2 = 0 then 'This Even Row has been updated'
else 'This Odd Row has been updated'
End)
where Key1 mod 3 = 2;

Note the predicate here, "where Key1 mod 3 = 2" reflects the "WHEN 2" condition in the Flat SQL procedure.

What about the where clause in the Flat SQL update?  We don't need this in the Set SQL update.  Remember, the Flat SQL is updating one row per update statement - the row just retrieved via the cursor.  The only purpose of the where clause in this Flat SQL update statement is to identify that row.   If the where clause referenced a table other than the one being updated, or if it identified more than one row, that condition would have to be reflected in the Set SQL predicate.

The values come from constants chosen using data from Table1, the table being updated, so no FROM clause is required in this Set SQL Update statement.  If the flat logic used values from a table or tables other than Table1, the Set SQL would need to join to that table(s).  (A join is implemented in an Update statement as "Update ... From <table-name>.")

The final SQL statement in this loop is an insert similar to the first.  It inserts slightly different values into another table.

    ELSE
insert into Table3 values(:myKey1, cast(:myEffDate as timestamp(0)), :myAmt, :myChar1);

The translation to Set SQL now seems fairly straightforward.  Notice that the cast() function operates directly on the table column.

insert into Table3
select Key1, cast(EffDate as timestamp(0)), Amt, Char1
from Table1
where Key1 mod 3 = 0;

The predicate here is based on the ELSE condition of the flat logic's CASE statement.  In a literal translation of an ELSE, the predicate is the negation of the first two WHEN conditions, such as, "where NOT (Key1 mod 3 = 1 or Key1 mod 3 = 2)."   Sometimes the predicate has to take that form, but in this case "where Key1 mod 3 = 0" means the same thing and is less irritating.

Putting it all together produces a procedure that looks like this:

REPLACE PROCEDURE PSet (
OUT Updates INTEGER
,OUT Inserts INTEGER
,OUT Insert2 INTEGER
)
BEGIN
Set Inserts = 0;
Set Updates = 0;
Set Insert2 = 0;

insert into Table2
select Key1, EffDate, Amt, Nr, Char1, Varchar1
FROM Table1
where Key1 mod 3 = 1;
Set Inserts = ACTIVITY_COUNT;

update Table1
set Char1 =
(Case
when Key1 mod 2 = 0 then 'This Even Row has been updated'
else 'This Odd Row has been updated'
End)
where Key1 mod 3 = 2;
Set Updates = ACTIVITY_COUNT;

insert into Table3
select Key1, cast(EffDate as timestamp(0)), Amt, Char1
from Table1
where Key1 mod 3 = 0;
Set Insert2 = ACTIVITY_COUNT;

END;

A performance-conscious programmer will notice that the Set SQL version of the procedure reads Table1 three times.  No doubt the Flat SQL procedure was designed to avoid that - the flat logic loops through Table1 just once.  However, in Teradata the Set SQL-based procedure is many times faster than Flat SQL even with three table scans.

A colleague and I designed and ran some tests on procedures almost identical to these.  On a 40-AMP system, the Set SQL version ran 47 times faster (98% less elapsed time), required 79% fewer I/O's and used 86% less CPU time.

What if the Flat SQL seems too large and too complex to convert?  Can a procedure based on flat logic be modified so that multiple instances can run concurrently, each operating on just a subset of the data?  We tried that, experimenting with different ways of dividing up the workload, such as having each Flat SQL procedure process just one percent of the data, or having each one process one AMP's worth of data.  These approaches all ran faster, but they were still about ten times slower than the Set SQL procedure.

To review, these are the steps required to convert Flat SQL to Set SQL:

  • Locate each SQL insert, update and delete statement.
  • For each statement, identify the sources of the values used.  If the sources are multiple tables, then construct joins in the Set SQL FROM clause.  Implement conditional logic as SQL CASE operations.  Derived tables may be needed to implement some logic.
  • Identify the conditions under which the update statement is to be executed and express this in the Set SQL predicate.

You can practice this approach on procedures and programs that have a similar structure.   When you are comfortable with this, think about how you would handle more intricate flat logic such as nested loops and multiple source tables.   Future blog entries will offer guidelines for handling these kinds of complexities.  Sometimes the only way to devise a solution is to go into what might be called a deep logic trance.  It is difficult to offer advice on trances, but maybe a few more examples will provide some inspiration.   Stay tuned!

2 Comments
Thanks a lot for posting the code. I've had several errors when I tried that the last time. I will try this out soon, am sure this is working fine right?
Teradata Employee
Yes, this code worked pretty much as shown here a couple of years ago. (There could be some typos in this text.) Of course, you would have to create tables that have the names and columns used here. Thanks.