Blog

The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

05-31-2011
10:28 AM

- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Email to a Friend
- Printer Friendly Page
- Report Inappropriate Content

05-31-2011
10:28 AM

The most difficult task when converting Flat SQL (cursor-based procedural code) to Set SQL involves translating complex logic and accounting for multiple data sources. The Derived Table feature of Set SQL simplifies this process.

The Basic Approach to converting Flat SQL to Set SQL was described in the last post. Briefly, the approach is:

- Locate each SQL insert, update and delete statement.
- For each statement, identify the sources of the values used. If the sources are multiple tables, then construct joins in the Set SQL FROM clause. Implement conditional logic as SQL CASE operations.
- Identify the conditions under which the update statement is to be executed and express this in the Set SQL predicate.

The second step is where derived tables come in handy, because they help to visualize all the data sources and conditional logic in Set SQL expressions.

Derived tables are also sometimes known as "in-line views." A derived table is like a temporary table or view: it is a temporary result set that is treated as a table in an SQL statement. Before there were derived tables in SQL, dealing with complex logic in Set SQL often required creating a temporary table, populating it for use in a subsequent SQL statement, then dropping it. Derived tables eliminate that effort. Moreover, Set SQL that uses derived tables can be easier to understand: a SQL script with derived tables uses fewer words and is more readable than a script that contains create and insert statements that populate temporary tables.

Sometimes the construction of derived tables is just an intermediate step toward the final form of production-ready Set SQL. Derived table syntax makes it easier to convert the logic into Set SQL expressions right away, but then the Set SQL can often be simplified further, and some derived tables can be eliminated.

A derived table is listed in a SQL From clause like any other table. It is defined by a select statement enclosed in parentheses, followed by a table alias name and, optionally, column alias names. For example, here D1 is a derived table with columns named C1, C2 and C3:

Select MIN(D1.C1)

From

( SELECT SUM(X1), X2, X3 From TableX group by 2,3 ) D1 ( C1, C2, C3 )

Here is an example of some Flat SQL that contains a nested loop and multiple data sources:

Begin

Declare LookupCode CHAR(3);

For C1 as C1 cursor for

select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1

Do -- OUTER LOOP ============================================

Begin

Set LookupCode = C1.A2;

For C2 as C2 cursor for

select R.R2 from Table3 R where R.R1 = :LookupCode

Do -- INNER LOOP ----------------------------

Begin

If C2.R2 is not null Then

Insert into Table4

Select C1.A1 ,C1.A2 ,C1.B3 ,C2.R2;

End If;

End;

End For; -------------------------------------

End;

End For; --===================================================

End;

Following the basic approach referenced above, the first thing to do is look for an insert, update or delete statement. There is only one of those here, in the inner loop:

Insert into Table4

Select C1.A1 ,C1.A2 ,C1.B3 ,C2.R2;

The Set SQL statement begins with the same Insert command:

insert into Table4

select C1.A1, C1.A2, C1.B3, C2.R2

The next step is to trace the sources of data values back through the code and construct the From clause of the select statement. When data values come from more than one cursor, as they do in this case, the easiest way to build the From clause is to express the cursors as derived tables, that is, simply copy and paste those cursor expressions as derived tables into the From clause. To maintain consistency and to make it easier to compare the Set SQL to the Flat SQL, use the cursor names as aliases for the derived tables.

FROM

(

select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1

) C1 ( A1, A2, B3 )

,(

select R.R1, R.R2 from Table3 R where R.R1 = :LookupCode

) C2 ( R1, R2 )

Now there are two derived tables in the Set SQL. Their alias names are C1 and C2, corresponding to the cursors C1 and C2 in the Flat SQL. Notice that each select statement from the cursors is enclosed in parentheses and is followed by the derived table alias name. Following the alias name is a parenthetical list of column aliases. The column aliases do not have to be defined in this manner; they can be defined within the select statement, such as:

,(

select R.R1 as R1, R.R2 as R2 from Table3 R where R.R1 = :LookupCode

) C2

In fact, column aliases are optional if the selected column names can be used. Therefore, the alias names used above are redundant, but it is a good habit to use column aliases for the sake of clarity and readability.

There is more than one derived table in the Set SQL From clause, so they need to be joined. Look for the key variables in the flat logic that indicate on which columns the tables will be joined. In effect, the nested loop in the Flat SQL is doing its own join of the two cursors, C1 and C2. There is one expression in the second cursor, C2, that tells how the sources can be joined: "where R.R1 = :LookupCode." LookupCode is the value that connects the two cursors.

LookupCode is a host variable in the Flat SQL; it will not be meaningful in the Set SQL. The inner loop in the Flat SQL is executed for each different value of LookupCode. The outer and inner cursor loops are "joined" by the LookupCode, so this is the key to the join predicate that the Set SQL will need. Where does the value for LookupCode come from? In the Flat SQL, LookupCode is set from the value A2 in the outer cursor, C1. LookupCode - that is, C1.A2 - is the variable that defines the way these two cursors (or derived tables) are joined. Therefore, this Where clause needs to be moved outside the derived table definition to the predicate that defines the join between C1 and C2.

The expression "R.R1" from the cursor C2 translates to "C2.R1" in the join predicate, and ":LookupCode" becomes "C1.A2." This is the Set SQL statement after adding the join predicate:

insert into Table4

select C1.A1, C1.A2, C1.B3, C2.R2

FROM

(

select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1

) C1 ( A1, A2, B3 )

,(

select R.R1, R.R2 from Table3 R

) C2 ( R1, R2 )

where C2.R1 = C1.A2

The last thing to do is look for the conditions under which this insert is executed. Immediately preceding the insert is an If-statement: "If C2.R2 is not null Then ...." This condition needs to be reflected in the predicate.

insert into Table4

select C1.A1, C1.A2, C1.B3, C2.R2

FROM

(

select A.A1, A.A2, B.B3 from Table1 A, Table2 B where A.A1 = B.B1

) C1 ( A1, A2, B3 )

,(

select R.R1, R.R2 from Table3 R

) C2 ( R1, R2 )

where C2.R1 = C1.A2

and C2.R2 is not null

Is there anything else in the Flat SQL that is needed but has not been addressed in the Set SQL? No, every bit of logic has been incorporated here. It is important to double-check this, because incomplete predicates usually lead to product joins or other incorrect results. Run an Explain on Set SQL before trying to execute it: the Explain will not only check syntax, it will identify several warning signs, including product joins.

This Set SQL will execute and will match the results of the Flat SQL, but it can be improved by eliminating the derived tables. Anything that simplifies Set SQL will improve its readability and usually improve its performance too. Derived tables in particular are materialized as temporary data sets in spool. If they can be removed from the SQL statement, then the writes and reads to spool can be avoided. In fact, the Teradata Optimizer sometimes dynamically rewrites SQL to eliminate derived tables.

It may seem like extra work to create derived tables only to erase them, but they are very easy to create (just copy from the cursors and paste!). Furthermore, once the derived tables are in place, reduction to simpler SQL becomes clearer; it's analogous to reducing fractions in arithmetic. Don't hesitate to use derived tables to rapidly develop a Set SQL statement, and don't forget to review the resulting SQL to see how it might be simplified

The derived table C2 is unnecessary because it is just a simple select from Table3. The join between C1 and C2 can be expressed as a simpler join between Table1 and Table3. The derived table C1 is a join of Table1 and Table2, but that join does not need to occur within a derived table either. Therefore, the Select and Where clauses can be changed to directly reference the three tables involved:

insert into Table4

select A.A1, A.A2, B.B3, R.R2

FROM

Table1 A, Table2 B, Table3 R

where A.A1 = B.B1

and R.R1 = A.A2

and R.R2 is not null

This SQL is much simpler and easier to read. The programmer who has to read it three years from now will be able to quickly grasp the intent of this statement.

Now compare the Flat SQL with the final form of the Set SQL. Can you convince yourself that they are logically the same thing?

Derived tables cannot always be eliminated from Set SQL. Logic that requires more than one step in the process is often best resolved through the use of derived tables. In future blog entries, I will explore how to use derived tables to solve problems that involve recursive matching, counting, interpolation and other convoluted logic.

5 Comments

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.