We have a bunch of JCL on our mainframe, including references to BTEQ scripts, and I'm wondering if it would make more sense to import the code from the scripts stored on the mainframe into new procedures on Teradata. The existing BTEQ scripts would change to only call the new procedures on Tdata.
With the one BTEQ script example I've seen, the differences are in the way the comments are written: BTEQ uses asterisks and SQLA procedures use double-dashes, but ideally I could create an automated process to convert the syntax differences.
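For example, here are the two comment styles side by side in a trivial sketch (the statement itself is just a placeholder):

```sql
/* asterisk-delimited comment, as seen in the BTEQ scripts */
-- double-dash comment, as typically used in SQLA stored procedures
SELECT CURRENT_DATE;
```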
I'm thinking that if the contents of the scripts were converted to procedures by an automated process, Tdata could compile the script and not have to parse the SQL sent from the mainframe, so executing the code would be easier on the Tdata server.
Are there discussions on this kind of conversion I'm not finding?
I found this:
A file that is created, compiled, and stored on the Teradata Database
Physical object in the database
No SQL parsing is required.
Is there a way to compare the performance of a BTEQ statement executed on Tdata vs an EXPLAIN of a stored procedure, or would the procedure actually have to run to determine its performance?
Thanks for your help,
The two forms of comments work in both places, so don't worry about that. Otherwise, it would not be easy to write a general-purpose bteq-script-to-SP converter. There are too many differences. BTEQ is designed to produce formatted output from queries and has some limited condition handling. Stored Procedures can return result sets, but formatting them in the various ways BTEQ can do it would take some work, and besides, their main purpose is to do conditional processing in the database.
If the scripts only manipulate data in the database and do not return data, then it would probably be easy to do them in stored procedures too. But something still has to CALL the stored procedure - something like, ummm, BTEQ. You could check out the BTEQ manual further, and also the SQL Stored Procedures and Embedded SQL manual, to learn more about the options.
Thanks for your response.
These BTEQ scripts are used for ETL purposes only: they copy data from the mainframe to one or more tables on Tdata, and they are not used for end-user reporting or presentation purposes.
The scripts are executed periodically (at different frequencies for different scripts, but many every two weeks) to maintain a source for reports on Teradata that is slightly aged relative to the live mainframe system.
Also, the idea is that the BTEQ script would remain on the mainframe and be called in the existing JCL job, but instead of containing BTEQ script passed through to the Tdata server for execution, it would contain only the call to the procedure.
In our existing SSIS packages for ETL to Tdata (via the Attunity drivers), we try as much as possible to exec procedures or macros in the SQL tasks instead of embedding Tdata script in the ETL tool. This would be an equivalent approach for BTEQ script stored on the mainframe instead of on Tdata.
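The reduced BTEQ script on the mainframe might look something like this sketch (the logon string, error handling, and procedure name are placeholders, not our actual values):

```sql
.LOGON tdpid/etl_user,password
DATABASE db;

CALL db.usp_fillTargetTable();    /* all the SQL now lives in the procedure */

.IF ERRORCODE <> 0 THEN .QUIT 8   /* signal failure back to the JCL step */
.LOGOFF
.QUIT 0
```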
If it is reading data on the mainframe, and that data is in EBCDIC form, then it would have to be translated to ASCII or UNICODE to be loaded from UNIX or Windows. (Teradata does that automagically when you run BTEQ or a load utility from the mainframe.) Also, there is no easy way for a stored procedure to read data from a file. In order to understand this, we would have to know more about the data.
BTEQ scripts can do many things, but these particular scripts are not using TPT functions to transfer files in EBCDIC format; they only transfer rows already loaded into Teradata tables for post-processing. The JCL staff is unfamiliar with Tdata procedures, so the post-processing is done on Tdata with calls to BTEQ files stored on the mainframe. I have never coded in JCL, BTEQ, or COBOL, so please excuse any misunderstanding I have of these concepts. I have created MS SSIS packages that load Tdata tables with the Attunity drivers, and I have created macros and procedures to call from SQL tasks within the packages instead of pasting Tdata script into the SSIS objects.
The JCL code says:
do stuff (including loading rows onto Tdata with TPT functions)
do more stuff
log into Tdata
execute a BTEQ script moving data from Tdata source tables to Tdata destination tables
log out of Tdata
do more stuff
The current BTEQ scripts I'm considering contain SQL to:
delete rows from a staging table on Tdata
insert into the staging table from other tables already populated on Tdata
delete rows from a reporting/final/target table on Tdata
insert rows from staging table on Tdata to target table on Tdata
I expect Tdata would have an easier time if the BTEQ scripts without fastload/TPT functions were changed to a single statement such as CALL db.usp_fillTargetTable(); and the Tdata script were moved from the file on the mainframe into the db.usp_fillTargetTable procedure, so the script doesn't have to be parsed by the server before execution and the procedure can be compiled.
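A minimal sketch of what that procedure might contain, using made-up table and column names for the four steps listed above:

```sql
REPLACE PROCEDURE db.usp_fillTargetTable()
BEGIN
   -- 1. delete rows from the staging table
   DELETE FROM db.stg_sales;

   -- 2. insert into staging from tables already populated on Tdata
   INSERT INTO db.stg_sales (sale_id, sale_dt, amt)
   SELECT sale_id, sale_dt, amt
   FROM   db.src_sales;

   -- 3. delete rows from the reporting/target table
   DELETE FROM db.rpt_sales;

   -- 4. insert rows from staging into the target
   INSERT INTO db.rpt_sales (sale_id, sale_dt, amt)
   SELECT sale_id, sale_dt, amt
   FROM   db.stg_sales;
END;
```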
Running the script from the mainframe takes over 4 hours. I couldn't run the equivalent script with my account from my personal database because it ran out of spool space, but the explain estimated it would run in 30 minutes.
In general, I would expect any SQL code to perform better when stored in a persistent object on the server than when read from an external file, but I'm not an expert on Tdata's procedure optimizations. I know stored procedures on SQL Server and Oracle include significant optimizations compared to executing ad-hoc pass-through queries the server has never seen before.
Thanks again for your response. Hope it's more clear what's going on and what I'm trying to understand.
Oh, okay. Yes, the deletes and insert-selects could be done in a stored procedure, and the BTEQ step could just call the SP. However, that would not improve run time very much. At best, it might eliminate parsing time, which is normally a very small fraction of the time to run a job like this. It would move that SQL to the database server, but the TPT scripts and source data are still on the mainframe.
If the problem is that this job is taking too long, then you might be able to optimize the SQL itself. The first suspicion I get from your description of the BTEQ SQL script is that it is deleting rows that are going to be inserted. This delete-and-insert could be replaced by a single SQL MERGE command, which can update a row if it's there and insert it if it's not.
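A sketch of that MERGE pattern with hypothetical names (note that in Teradata the ON condition must equate the target's primary index):

```sql
MERGE INTO db.rpt_sales tgt
USING db.stg_sales src
   ON tgt.sale_id = src.sale_id          -- target PI assumed to be sale_id
WHEN MATCHED THEN UPDATE
   SET sale_dt = src.sale_dt,
       amt     = src.amt
WHEN NOT MATCHED THEN INSERT (sale_id, sale_dt, amt)
   VALUES (src.sale_id, src.sale_dt, src.amt);
```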
On the other hand, maybe the rows being deleted are just not wanted any more. If there are many such rows, it may be more efficient to simply select the rows you want to keep, join them to the rows being inserted and insert that into a new empty table - this would be a single step, and it's the fastest thing you can do in Teradata. Then delete the old table and rename the new table.
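Sketched with the same hypothetical tables (the keep-condition is made up; the point is one insert-select into an empty copy, then a drop-and-rename swap):

```sql
-- empty copy of the target, same structure
CREATE TABLE db.rpt_sales_new AS db.rpt_sales WITH NO DATA;

-- one insert-select: the rows worth keeping plus the new rows
INSERT INTO db.rpt_sales_new
SELECT sale_id, sale_dt, amt
FROM   db.rpt_sales
WHERE  sale_dt >= ADD_MONTHS(CURRENT_DATE, -24)   -- hypothetical keep rule
UNION ALL
SELECT sale_id, sale_dt, amt
FROM   db.stg_sales;

-- swap the tables
DROP TABLE db.rpt_sales;
RENAME TABLE db.rpt_sales_new TO db.rpt_sales;
```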
So there may be possibilities for improvements. It would also help to look at the DBC.QryLog* views to see where this job is really spending its time.
Thanks for your response.
Are you saying there should be no performance difference between executing Tdata SQL from an external file on a mainframe and executing the same SQL compiled into a procedure? Doesn't the processing engine support any optimizations for procedures, such as compilation or the creation and caching of execution plans, as on SQL Server? I'm seeing some references to caching execution plans for macros, but not for procedures.
Here's a post I found for SQL Server, but again, I can only assume Tdata does something similar with its own procedures:
I'm in agreement with his statement:
You should take advantage of the processing power of the database server when possible because it offloads processing from your application.
but if Tdata's procedures don't offer equivalent advantages of stored and cached execution plans, then the advantage of storing the code on the server is more towards encapsulation.
Of course, even if Tdata doesn't store or cache execution plans for procedures, I can always hope they will in a future version and then the practice from other environments would pay off in Tdata as well.
I've been using Viewpoint to see where the query is stuck, but I can check dbc.qryLog as well. Can you recommend a resource for understanding how to modify slow query steps given an explain statement? I'm only familiar with adding statistics recommended by DIAGNOSTICS HELPSTATS.
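For reference, the statistics workflow I've been using is just this (with a made-up query):

```sql
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT s.sale_id, s.amt
FROM   db.stg_sales s
JOIN   db.rpt_sales r
  ON   r.sale_id = s.sale_id;
-- the tail of the EXPLAIN output then lists suggested COLLECT STATISTICS statements
```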
Thanks again for your help.
Yes, creating a stored procedure includes compiling the SQL, but that's different from caching the plan - caching means it's stored in the optimizer's memory. But again, unless there is something wrong, parsing the SQL and generating a plan is a matter of seconds or fractions of seconds, which is nothing compared to a 4-hour run time.
The Database Administration manual has a long chapter on query logging. QryLogStepsV would show the long-running steps, and QryLogExplainV would document what the steps are doing. But I do not know of any documentation that tells you what to do about it. The best thing to do is contact your local Teradata consultant(s), who should be able to help with that and look for answers. Alternatively, you could post the Explain on this forum (as a separate topic) and see if anyone has any ideas. On the other hand, the problem could be obvious once you see the explain and where the time is spent vs. estimated.
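Something along these lines should surface the expensive steps (column names are from memory; check the actual view definitions in your release):

```sql
-- find the QueryID of the slow request
SELECT QueryID, StartTime, QueryText
FROM   DBC.QryLogV
WHERE  UserName = 'ETL_USER'                  -- hypothetical user
ORDER  BY StartTime DESC;

-- then rank its steps by elapsed time
SELECT StepLev1Num, StepName,
       (StepStopTime - StepStartTime) HOUR(2) TO SECOND AS Elapsed,
       CPUTime, IOCount
FROM   DBC.QryLogStepsV
WHERE  QueryID = 123456789012345              -- QueryID from the first query
ORDER  BY Elapsed DESC;
```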
Rewriting SQL can sometimes help, but as the years pass the Optimizer gets smarter about rewriting it internally. Normally the only two ways to affect how the plan is developed are collection of statistics and the choice of indexes, especially the primary index(es).
A REPLACE PROCEDURE in Teradata is converted to an Embedded SQL program in C and compiled into a Shared Object on Linux, which is stored in the database.
When you CALL the SP the 1st time it's materialized in the file system (and cached, of course) and you actually call an exported function of this .so file.
Thus control structures like IF/WHILE/etc. are highly optimized C code, while any included SQL is not pre-compiled, but sent as-is to the DBMS and optimized. Of course the optimizer might cache the plan of an individual SQL statement (like any other possibly repeated request).
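To illustrate with a toy procedure (hypothetical names): the DECLARE/WHILE logic below becomes compiled C, while each INSERT is still submitted to the Parser/Optimizer at run time like any other request:

```sql
REPLACE PROCEDURE db.usp_demo_loop(IN n INTEGER)
BEGIN
   DECLARE i INTEGER DEFAULT 1;
   WHILE i <= n DO                            -- control flow: compiled C
      INSERT INTO db.demo_log (id, logged_at)
      VALUES (:i, CURRENT_TIMESTAMP);         -- SQL: parsed/optimized per request
      SET i = i + 1;
   END WHILE;
END;
```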
The difference in performance between submitting SQL from BTEQ vs. an SP will be negligible, unless there's a large number of quick SQL statements submitted, in which case the time for the BTEQ-TD-BTEQ roundtrips will add up. But if you've got a job which submits long-running statements, you will not notice any difference, maybe 2:00.0 vs. 2:00.1 seconds.
There's no OPTION (RECOMPILE) needed in Teradata :-)
OK, thanks for the clarification. I'm more familiar with databases that work better with SQL stored in persistent objects, but Tdata doesn't seem to work that way. The explain statement I ran was only for the SELECT portion of the INSERT, so the SELECT may have taken 30 minutes, but the INSERT could have taken the remaining 3.5 hours even with the SQL in a procedure.
Is there a way to run an EXPLAIN on a call to a procedure, or on more than one statement, in SQLA? I couldn't get that to work, but that's the time estimate I really wanted from the processing engine. I know there's a lot of information available to figure out what the processor is doing, but I'm having a hard time finding resources to help me understand what's relevant in the information and what to change in response. All I really know to do is add the statistics suggested by diagnostics and add indexes on fields used for joins and limiting criteria. I'm not finding as much support for tuning Tdata queries as for SQL Server, for example.
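What I was able to get was an EXPLAIN of one complete statement at a time, so for the estimate I wanted I suppose I could at least explain the full INSERT-SELECT instead of only its SELECT portion (hypothetical names again):

```sql
EXPLAIN
INSERT INTO db.rpt_sales (sale_id, sale_dt, amt)
SELECT sale_id, sale_dt, amt
FROM   db.stg_sales;
```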
Thanks again for your help.