So I created a report that essentially runs a DML statement built off multiple (4) volatile tables. I want to automate this report to pull in data daily, but I'm not sure whether to create a macro or an SP for it. In terms of efficiency and simplicity, which will run faster and take less time to build: (A) an SP using dynamic cursors to hold the result sets from each "temp table", or (B) a macro where I rewrite my multiple volatile tables as one complex SQL query with multiple subqueries?
I want to avoid creating global temp tables... not a huge fan of them for temporary query usage.
The easiest thing to try is to effectively put your existing code into a macro.
If you've got code that creates and populates your 4 volatile tables (VTs), each of those becomes a derived table in a single query.
Provided that you don't need to use a VT for some specific reason (for instance to force distribution through specification of a PI), performance should be the same as you've currently got.
If you need to use VT's then go the SP route, but don't use cursors.
Within your SP, use the DBC.SysExecSQL capability to run your existing code.
Create VT's at run-time.
Run your final DML command to return the final result set(s).
Add error handling for cases such as creating a table that already exists, etc.
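To sketch the run-time VT creation step (table and column names here are purely illustrative):

```sql
-- Rough sketch only: inside an SP, DDL such as CREATE VOLATILE TABLE
-- is submitted as a string via DBC.SysExecSQL (all names invented).
CALL DBC.SysExecSQL(
    'CREATE VOLATILE TABLE vt1 AS (
         SELECT col1, col2, col3, SUM(col5) AS tot_col5
         FROM t1
         WHERE col14 > 57
         GROUP BY 1,2,3
     ) WITH DATA
       PRIMARY INDEX (col2)
       ON COMMIT PRESERVE ROWS');
```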
I am inclined to say you will hit this error if you create multiple volatile tables within a macro:
3576 Data definition not valid unless solitary.
Based on your use case, either use a GTT or try a stored procedure.
Sorry I may not have explained myself clearly enough.
The point in using a macro is that it does not contain any "create volatile table" commands. If it did then you would hit the error that you mentioned.
The aim is to take the existing code that populates the VT's and put each of them into a single DML statement as a derived table.
Using a single VT as an example, assume that the current code is:
CREATE VOLATILE TABLE vt1 AS
(
    SELECT col1, col2, col3, SUM(col5) AS TOT_COL5
    FROM t1
    WHERE col14 > 57
    GROUP BY 1,2,3
)
WITH DATA
PRIMARY INDEX (col2)
ON COMMIT PRESERVE ROWS;

SELECT * FROM vt1 WHERE TOT_COL5 BETWEEN 200 AND 300;
The SELECT from above could be written as the following:
SELECT *
FROM
(
    SELECT col1, col2, col3, SUM(col5) AS TOT_COL5
    FROM t1
    WHERE col14 > 57
    GROUP BY 1,2,3
) AS vt1
WHERE TOT_COL5 BETWEEN 200 AND 300;
Yes, I know this is a simple example, but I think it demonstrates the approach that I suggested.
Firstly, thanks for your feedback, very much appreciated. Please correct me if I'm wrong, but it seems like the gist of your suggestion is essentially to combine all my temp tables into a single DML query. The problem I see with that is that my 4 temp tables each derive from the prior one. For example, my first table defines the 'dataset', my other temp tables define my "exclusions", and my last temp table combines it all and is then executed in a final query. I'm not saying it's impossible, but if I were to combine all of it, the result would be horrendous (400+ lines of exclusion logic and multiple joins in each temp table).
Should I go the SP route instead? I'm still trying to figure out exactly how I would even do this through an SP, because from what I've been reading on other forums (Stack Overflow, etc.), I feel like I might not be using temp tables in the manner they were intended. What are your thoughts?
I would expect that, since volatile tables exist only in spool, their performance wouldn't be much different from derived tables. The one thing that could really slow this down is using cursors. (See developer.teradata.com/blog/georgecoleman.) It sounds like you could create the VT's in a stored procedure and then in the same SP code the Select that combines all the inclusions and exclusions. The SP would then return a Dynamic Result Set to the user, with no cursors needed.
However, it is hard to imagine that you could not do the same using derived tables of derived tables (of derived tables?). If more than one subquery needs to reference the same derived table, then you could define that one in a WITH clause, aka common table expression.
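To sketch that WITH-clause idea (table and column names invented): the CTE is defined once and referenced by two parts of the query.

```sql
-- Sketch: "base" is defined once and referenced twice below,
-- which a plain derived table can't do without repeating itself.
WITH base AS (
    SELECT col1, col2, SUM(col5) AS tot_col5
    FROM t1
    GROUP BY 1, 2
)
SELECT a.col1, b.col1, a.tot_col5 - b.tot_col5 AS diff
FROM base a
JOIN base b
  ON  a.col2 = b.col2
  AND a.col1 < b.col1;
```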
One advantage of using a macro (or plain old set SQL) could be that the optimizer might decide to build some of these derived tables in parallel (concurrently). That would save some time. (Check the Explain to see if it does this.) One advantage of doing it in a SP could be that it would be easier for the next person (after you've gone on to fame and fortune!) to understand.
If your logic requires a sequence of processing such that:
-- build temp-table-1
-- use temp-table-1 as part of the build of temp-table-2
...and so on...
then you're correct that my original approach will not work.
You may be able to use a CTE (as suggested by @GJColeman). Once it is defined, the rest of the query can reference it.
With current releases of TD you can have multiple CTE's in a single query. This may help you, it depends on your exact set of dependencies.
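To sketch what chained CTEs look like for that kind of dependency (all names invented; verify the syntax against your TD release):

```sql
-- Sketch: each CTE builds on the previous one, mirroring
-- "temp table built on another temp table".
WITH dataset AS (
    SELECT acct_id, col1
    FROM t1
    WHERE col14 > 57
),
exclusions AS (
    SELECT acct_id
    FROM dataset
    WHERE col1 = 'X'
)
SELECT d.*
FROM dataset d
LEFT JOIN exclusions e
  ON d.acct_id = e.acct_id
WHERE e.acct_id IS NULL;
```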
If you end up going the SP route then your SP does something like:
drop temp table #1 (define a continue handler for the 'table does not exist' error code)
drop temp table #2 (continue handler as above)
drop temp table #3 (continue handler as above)
drop temp table #4 (continue handler as above)
create temp table #1 and populate
create temp table #2 and populate (using temp table #1 if required)
create temp table #3 and populate
create temp table #4 and populate
select from temp table #1, temp table #2, temp table #3, temp table #4...
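A hedged sketch of that SP shape, cut down to two VTs (all names invented; the final SELECT is prepared dynamically so the SP compiles even though the VTs don't exist at CREATE PROCEDURE time):

```sql
REPLACE PROCEDURE daily_report()
DYNAMIC RESULT SETS 1
BEGIN
    -- Returned to the caller as a dynamic result set;
    -- no row-by-row fetching involved.
    DECLARE result_cur CURSOR WITH RETURN ONLY FOR final_stmt;

    -- Nested block so the handler only covers the drops.
    BEGIN
        DECLARE drop_failed INTEGER DEFAULT 0;
        -- In production, test for the specific "table does not
        -- exist" error rather than swallowing everything.
        DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
            SET drop_failed = 1;
        CALL DBC.SysExecSQL('DROP TABLE vt1');
        CALL DBC.SysExecSQL('DROP TABLE vt2');
    END;

    CALL DBC.SysExecSQL(
        'CREATE VOLATILE TABLE vt1 AS (
             SELECT acct_id, col1 FROM t1 WHERE col14 > 57
         ) WITH DATA ON COMMIT PRESERVE ROWS');
    CALL DBC.SysExecSQL(
        'CREATE VOLATILE TABLE vt2 AS (
             SELECT acct_id FROM vt1 WHERE col1 = ''X''
         ) WITH DATA ON COMMIT PRESERVE ROWS');

    PREPARE final_stmt FROM
        'SELECT v1.*
         FROM vt1 v1
         LEFT JOIN vt2 v2 ON v1.acct_id = v2.acct_id
         WHERE v2.acct_id IS NULL';
    OPEN result_cur;
END;
```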
If this SP is going to be called by multiple users then rather than using volatile tables I'd use global temp tables. It is what they are designed for. If you do this then the drop/create commands in the SP logic shown above change to DELETE and INSERT.
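Sketching the GTT variant (the definition is created once up front; names invented):

```sql
-- One-time DDL: the definition is shared, but each session
-- gets its own private instance of the rows.
CREATE GLOBAL TEMPORARY TABLE mydb.gtt_dataset (
    acct_id INTEGER,
    col1    CHAR(1)
) ON COMMIT PRESERVE ROWS;

-- Inside the SP (or a macro) the drop/create pair becomes:
DELETE FROM mydb.gtt_dataset;
INSERT INTO mydb.gtt_dataset
SELECT acct_id, col1
FROM t1
WHERE col14 > 57;
```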
@GJColeman is correct that for the most part volatile tables (VTs) can simply be 'converted' to derived tables without any performance penalty. There are two situations that I've come across where I've had to use volatile tables:
- they are big and the derived table is horribly skewed. In this case with a VT you can specify a Primary Index and therefore control the distribution.
- they are big and one/more column(s) subsequently used for joins and/or selection contains very skewed values. In this case with a VT you can collect stats on the column(s) which will (hopefully) alert the optimiser to that fact and avoid doing nasty things.
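Both situations sketched together (names invented): an explicit PI to control distribution, plus collected stats on a skewed join column.

```sql
-- Sketch: the explicit PRIMARY INDEX controls row distribution;
-- COLLECT STATISTICS alerts the optimiser to the skewed column.
CREATE VOLATILE TABLE vt_big AS (
    SELECT acct_id, region_cd, SUM(amt) AS tot_amt
    FROM big_table
    GROUP BY 1, 2
) WITH DATA
  PRIMARY INDEX (acct_id)
  ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS COLUMN (region_cd) ON vt_big;
```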
Hi @DaveWellman, @GJColeman
Thanks for the feedback, very much appreciated. I think from reading both suggestions, it's most logical at this point to de-construct my temp tables (temp table built on another temp table... etc., etc.) into derived tables within a single query. Throughout this thread, I've been trying to avoid that like the plague... but I guess it's my most logical, reasonable option for my purposes.
1. Using the SP approach: down the line, this report I'm making will need to be a VIEW. Since I'm not calling tables from other databases in my query (nor will I in the future), this approach seems like an additional layer that could ultimately be avoided entirely if I were to just create a single query with the derived tables. Instead of creating, calling, inserting and returning via an SP, a single query with derived tables seems more robust.
2. Using the Macro approach: correct me if I'm wrong, but I see this approach as similar to outright creating a view, since I can't just "plug in" my temp tables. Its functionality is limited and an SP could easily replace it, in which case refer to point 1.
Are my assumptions correct? Sorry for all the questions, learning the intricacies of TD as I go along. Thanks again!
I think that makes sense. Depending on the tool they are using, it can be easier for a user to select from a view than to call a procedure or exec a macro, and macro parameters are limited - e.g., they can't be table/view names. Also, a view can be joined to other things. (The main advantage of a macro is that its plan can be cached for the next caller, so it doesn't have to go through a lot of optimizer steps again.)
One warning though - a look around tells me that a WITH modifier clause (or common table expression) is only allowed in a view as of the 16.* Teradata releases. (It hadn't been allowed before that, except for "create recursive view" - but that's a different topic.) If you don't have 16.x yet, there may be plans to go to 16.10 or 16.20 in the near future.
One thing I've found really works well: divide and conquer on a massively parallel system (I don't like 500 lines of code; I like things modular).
While you can have N joins in a query or a composite view, I have not seen performance really scale beyond a handful of joins (especially when you are dealing with extremely large data sets joined on non-matching PI columns).
If you have these parameters, I think your best bet would be to use GTTs (I know you don't necessarily like them, but there are some advantages).
If the true outcome of your automation is fast processing, I would think that GTTs might help (as you can design these GTT tables in the way that suits your needs).
This can easily be put in a macro with simple DML operations.
I realize this suggestion is no different from your worst fear of using GTTs :-).
Best of all, the SQL statements in a macro run as a single transaction, so you don't have to worry about the transactional integrity that you would have to deal with inside an SP.