Using Derived Tables as a Shortcut to Set SQL

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

The most difficult task when converting Flat SQL (cursor-based procedural code) to Set SQL involves translating complex logic and accounting for multiple data sources.  The Derived Table feature of Set SQL simplifies this process.

The Basic Approach to converting Flat SQL to Set SQL was described in the last post.  Briefly, the approach is:

  • Locate each SQL insert, update and delete statement.
  • For each statement, identify the sources of the values used.  If the sources are multiple tables, then construct joins in the Set SQL FROM clause.  Implement conditional logic as SQL CASE operations.
  • Identify the conditions under which the update statement is to be executed and express this in the Set SQL predicate.

The second step is where derived tables come in handy, because they help to visualize all the data sources and conditional logic in Set SQL expressions.

Derived tables are also sometimes known as "in-line views."  A derived table is like a temporary table or view: it is a temporary result set that is treated as a table in an SQL statement.  Before there were derived tables in SQL, dealing with complex logic in Set SQL often required creating a temporary table, populating it for use in a subsequent SQL statement, then dropping it.  Derived tables eliminate that effort.  Moreover, Set SQL that uses derived tables can be easier to understand: a SQL script with derived tables uses fewer words and is more readable than a script that contains create and insert statements that populate temporary tables.

Sometimes the construction of derived tables is just an intermediate step toward the final form of production-ready Set SQL.  Derived table syntax makes it easier to convert the logic into Set SQL expressions right away, but then the Set SQL can often be simplified further, and some derived tables can be eliminated.

A derived table is listed in a SQL From clause like any other table.  It is defined by a select statement enclosed in parentheses, followed by a table alias name and, optionally, column alias names.  For example, here D1 is a derived table with columns named C1, C2 and C3:

Select MIN(D1.C1)
From
( SELECT SUM(X1), X2, X3 From TableX group by 2,3 ) D1 ( C1, C2, C3 )

Here is an example of some Flat SQL that contains a nested loop and multiple data sources:

Begin
Declare LookupCode CHAR(3);
    For C1 as C1 cursor for
        select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1
    Do  -- OUTER LOOP ============================================
        Begin
            Set LookupCode = C1.A2;
            For C2 as C2 cursor for
                select R.R2 from Table3 R where R.R1 = :LookupCode
            Do  -- INNER LOOP ----------------------------
                Begin
                    If C2.R2 is not null Then
                        Insert into Table4
                        Select C1.A1 ,C1.A2 ,C1.B3 ,C2.R2;
                    End If;
                End;
            End For; -------------------------------------
        End;
    End For; --===================================================
End;

Following the basic approach referenced above, the first thing to do is look for an insert, update or delete statement. There is only one of those here, in the inner loop:

                        Insert into Table4
Select C1.A1 ,C1.A2 ,C1.B3 ,C2.R2;

The Set SQL statement begins with the same Insert command:

insert into Table4
select C1.A1, C1.A2, C1.B3, C2.R2

The next step is to trace the sources of data values back through the code and construct the From clause of the select statement.  When data values come from more than one cursor, as they do in this case, the easiest way to build the From clause is to express the cursors as derived tables, that is, simply copy and paste those cursor expressions as derived tables into the From clause.  To maintain consistency and to make it easier to compare the Set SQL to the Flat SQL, use the cursor names as aliases for the derived tables.

FROM
(
select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1
) C1 ( A1, A2, B3 )

,(
select R.R1, R.R2 from Table3 R where R.R1 = :LookupCode
) C2 ( R1, R2 )

Now there are two derived tables in the Set SQL.  Their alias names are C1 and C2, corresponding to the cursors C1 and C2 in the Flat SQL.  Notice that each select statement from the cursors is enclosed in parentheses and is followed by the derived table alias name.  Following the alias name is a parenthetical list of column aliases.  The column aliases do not have to be defined in this manner; they can be defined within the select statement, such as:

,(
select R.R1 as R1, R.R2 as R2 from Table3 R where R.R1 = :LookupCode
) C2

In fact, column aliases are optional if the selected column names can be used.  Therefore, the alias names used above are redundant, but it is a good habit to use column aliases for the sake of clarity and readability.

There is more than one derived table in the Set SQL From clause, so they need to be joined.  Look for the key variables in the flat logic that indicate on which columns the tables will be joined.  In effect, the nested loop in the Flat SQL is doing its own join of the two cursors, C1 and C2.  There is one expression in the second cursor, C2, that tells how the sources can be joined: "where R.R1 = :LookupCode."  LookupCode is the value that connects the two cursors.

LookupCode is a host variable in the Flat SQL; it will not be meaningful in the Set SQL.  The inner loop in the Flat SQL is executed for each different value of LookupCode.  The outer and inner cursor loops are "joined" by the LookupCode, so this is the key to the join predicate that the Set SQL will need.  Where does the value for LookupCode come from?  In the Flat SQL, LookupCode is set from the value A2 in the outer cursor, C1.  LookupCode - that is, C1.A2 - is the variable that defines the way these two cursors (or derived tables) are joined.  Therefore, this Where clause needs to be moved outside the derived table definition to the predicate that defines the join between C1 and C2.

The expression "R.R1" from the cursor C2 translates to "C2.R1" in the join predicate, and ":LookupCode" becomes "C1.A2."  This is the Set SQL statement after adding the join predicate:

insert into Table4
select C1.A1, C1.A2, C1.B3, C2.R2
FROM
(
select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1
) C1 ( A1, A2, B3 )
,(
select R.R1, R.R2 from Table3 R
) C2 ( R1, R2 )
where C2.R1 = C1.A2

The last thing to do is look for the conditions under which this insert is executed.  Immediately preceding the insert is an If-statement: "If C2.R2 is not null Then ...."  This condition needs to be reflected in the predicate.

insert into Table4
select C1.A1, C1.A2, C1.B3, C2.R2
FROM
(
select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1
) C1 ( A1, A2, B3 )
,(
select R.R1, R.R2 from Table3 R
) C2 ( R1, R2 )
where C2.R1 = C1.A2
and C2.R2 is not null

Is there anything else in the Flat SQL that is needed but has not been addressed in the Set SQL?  No, every bit of logic has been incorporated here.  It is important to double-check this, because incomplete predicates usually lead to product joins or other incorrect results.  Run an Explain on Set SQL before trying to execute it: the Explain will not only check syntax, it will identify several warning signs, including product joins.

This Set SQL will execute and will match the results of the Flat SQL, but it can be improved by eliminating the derived tables.  Anything that simplifies Set SQL will improve its readability and usually improve its performance too.  Derived tables in particular are materialized as temporary data sets in spool.  If they can be removed from the SQL statement, then the writes and reads to spool can be avoided.  In fact, the Teradata Optimizer sometimes dynamically rewrites SQL to eliminate derived tables.

It may seem like extra work to create derived tables only to erase them, but they are very easy to create (just copy from the cursors and paste!).  Furthermore, once the derived tables are in place, reduction to simpler SQL becomes clearer; it's analogous to reducing fractions in arithmetic.  Don't hesitate to use derived tables to rapidly develop a Set SQL statement, and don't forget to review the resulting SQL to see how it might be simplified

The derived table C2 is unnecessary because it is just a simple select from Table3.  The join between C1 and C2 can be expressed as a simpler join between Table1 and Table3.  The derived table C1 is a join of Table1 and Table2, but that join does not need to occur within a derived table either.  Therefore, the Select and Where clauses can be changed to directly reference the three tables involved:

insert into Table4
select A.A1, A.A2, B.B3, R.R2
FROM
Table1 A, Table2 B, Table3 R
where A.A1 = B.B1
and R.R1 = A.A2
and R.R2 is not null

This SQL is much simpler and easier to read.  The programmer who has to read it three years from now will be able to quickly grasp the intent of this statement.

Now compare the Flat SQL with the final form of the Set SQL.  Can you convince yourself that they are logically the same thing?

Derived tables cannot always be eliminated from Set SQL.  Logic that requires more than one step in the process is often best resolved through the use of derived tables.  In future blog entries, I will explore how to use derived tables to solve problems that involve recursive matching, counting, interpolation and other convoluted logic.

5 Comments
N/A
Hi George..

Grt bogs..Have read all ur postings..

I have a scenario like two cursors have been used to insert/update tables.

Using dynamic SQL and Cursor, creating snapshots for 13 tables.

Can you please help me to write this Flat SQL to SET SQL.
Copying the Code snippet:

for table_loop as table_cursor cursor for
select rtrim(table_nm) table_nm
from #temp_info where snapshot = 'Y'
do

select table_id into @table_id
from DBC.tables
where table_name = table_nm and
table_type = 'BASE' and
creator = user_id('dbo');

if @debug > 1 then
message 'table ',table_nm,' id = ',@table_id type info to client;
end if;

for column_loop as column_cursor cursor for
select column_name
from DBC.column where table_id = @table_id
order by column_id
do

if @debug > 1 then
message ' ',column_name type info to client;
end if;

if column_name = 'feed_key' then
set @column_c_nm = 'feed_key,as_of_dt,rpt_nm';
set @column_s_nm = 'feed_key,'''||@as_of_dt||''','''||@report_nm||'''';
else
set @column_c_nm = column_name;
set @column_s_nm = column_name;
end if;

if @column_list is null then
set @column_list = @column_c_nm;
set @select_list = @column_s_nm;
else
set @column_list = @column_list || ',' || @column_c_nm;
set @select_list = @select_list || ',' || @column_s_nm;
end if;

set @column_c_nm = null;
set @column_s_nm = null;

end for;

set @sql = 'delete '||table_nm||'_snap where as_of_dt = '''||@as_of_dt||
''' and (rpt_nm is null or rpt_nm = '''||@report_nm||''')';

if @debug > 1 then
message 'sql = ',@sql type info to client;
end if;

execute immediate @sql;
if @@error != 0 then set @error_flag = 1 end if;

-- set @sql = 'insert into '||table_nm||'_snap ('||@column_list||') select '||
-- @select_list||' from '||table_nm|| ' where end_dt > now(*)';
set @sql = 'insert into '||table_nm||'_snap ('||@column_list||') select '||
@select_list||' from '||table_nm||
' where '''||@as_of_dt||''' between start_dt and end_dt';

if @debug > 1 then
message 'sql = ',@sql type info to client;
end if;

execute immediate @sql;
if @@error != 0 then set @error_flag = 1 end if;

set @column_list = null;
set @select_list = null;

end for;
Teradata Employee
Thank you, Pranu,
This is a good example of a cursor loop that does not need to be eliminated for two reasons: first, the cursor is looking for certain list of table names in the data dictionary. I would guess there are not very many such tables (less than 100?), so a loop over 100 rows is not so bad. Second, each loop iteration constructs dynamic Set SQL statements and executes them, so even though there is a cursor loop involved, this procedure is already doing Set SQL.

In general, when you need to construct a series of dynamic SQL statements, there is no way to avoid a cursor loop. It is often possible to accomplish the same thing with a BTEQ script that generates the SQL statements to a file and then .RUN's the file, but this is not necessarily more efficient.

Cursor loops are not inherently bad. Single-row inserts updates and deletes from within cursor loops are bad! You are not doing that here.

On the other hand, this is a nested loop, and nested loops can almost always be replaced by joins. In this case, there is no need for a nested loop because all the information needed to construct the dynamic SQL statement can be obtained by joining DBC.Tables and DBC.Columns.

Basically, the single-loop logic could look like this:

Set table_nm = null;
for table_loop as table_cursor cursor for
select T.table_name, C.column_name
from DBC.Tables T, DBC.Columns C, #temp_info I
where rtrim(I.table_nm) = T.table_name
and T.tabl_id = C.table_id
and I.snapshot = 'Y'
and T.table_type = 'BASE'
and T.creator = 'dbo'
do
If table_nm is null then
set table_nm = T.Table_name; -- first time through
end if;
If T.Table_name <> table_nm then -- break on table name

[construct and execute dynamic delete and insert]

set @column_list = null;
set @select_list = null;
set table_nm = T.Table_name;

end if;

[construct column lists from C.column_name for next iteration]

end for;
N/A
Thanks a lot..
Teradata Employee
Hi Greg, This is really and interesting logic.

From above post i can understand that if cursor is aleady using set command , that's bit a tricky to avoid cursor.
Are there any other possible cursor logic that can't be avoid ?

Teradata Employee
Yes, there are things that seem just too difficult if not impossible to do in Set SQL. Usually these things have many, many lines of if-statements. It is possible to have so many IF's that the corresponding CASE operations and predicates would exceed the 1 MB limit on the size of an SQL statement. When I find another good example I'll share it with you.

One type of processing that is quite challenging is sequence-dependent logic. (In my last entry [developer.teradata.com/blog/GeorgeColeman/archive/2012/03] I showed an example where sequence-dependent logic could be implemented using an ordered analytical function.) Some such things are handled more efficiently by Aster Data, but even then one must still think in terms of Set Logic and Set SQL in order to use the full power of MPP.

One of my math professors once said, "Before we can prove something, we must have faith that it is true!" Have faith that it is possible to turn your flat logic into set logic, and you will almost always find that you can do it.