Everyone with a Teradata Database collects statistics. No way around it. Good query plans rely on it, and we’ve all gotten used to it.
But starting in Teradata Database 14.10 there’s some big changes. Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet all combine together to produce the Automated Statistics (AutoStats) feature. Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect on, prioritize and execute the stats collection statements, and organize/run your statistics collection jobs in the most efficient way possible.
There’s a lot going on and a lot of choices involved. So what’s the easiest way to start getting familiar with AutoStats? And what’s required with the new DBQL logging options? If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to key features and benefits AutoStats.
AutoStats leverages a variety of new statistics enhancements available in Teradata Database 14.10. Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.
Reference the orange book titled: Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati for more information. Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking update-delete-insert counts, and collecting statistics on single table expressions.
However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis opportunities of AutoStats. For a solid understanding of this feature, read the orange book titled “Automated Statistics Management” by Louis Burger and Eric Scheie. It discusses what AutoStats does, how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.
Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled “Teradata Viewpoint 14.10 Release”. It describes and illustrates the Viewpoint portlet that supports AutoStats that you will be using when you get to 14.10. This Viewpoint portlet is named “Stats Manager”.
This blog posting is intended to help you ease into using the new AutoStats feature. Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.
One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram. Version 6 is necessary to support some of these new enhancements, including a system-derived sampling option for the statistics collection statement, and for capturing insert, delete and update counts.
In order to use Version 6, the system needs to be committed to not back down to any Teradata Database release earlier than 14.10. The DBS Control general field #65 NoDot0Backdown must be set to true to indicate the no back down decision. This change does not require a restart.
If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.
The database supports backward compatibility. Statistics whose histograms use the previous versions (Version 1 to Version 5) can be imported into the system that supports Version 6. However, the Version 6 statistics cannot be imported to systems that support Version 5 or lower versions. After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics. The recollection will move your histograms to the new version, which in turn will support a wider use of the new enhancements.
AutoStats functionality relies on your making more information available to the database about how statistics are used. Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.
But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting on, or no longer need, additional DBQL logging will be required. While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, things like the new re-collection threshold functionality will not work without DBQL USECOUNT logging taking place. AutoStats promotes a “submit often” approach, where statistics collection jobs can be submitted frequently for collection, leaving it up to the 14.10 threshold logic to determine which statistics actually need to be refreshed at run time.
If you enable DBQL logging with the USECOUNT option, and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics. Although you turn on USECOUNT by a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level. You turn USECOUNT on at the database level in a separate logging statement from the DBQL logging targeted to users.
BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;
Once you enable logging, USECOUNT tracks usage of all objects for that database. USECOUNT is the new mechanism for turning on Object Use Count (OUC).
One type of object tracking USECOUNT does has to do with statistics. In 14.10 each individual statistic is considered an “object” and will have a row in a new table, DBC.ObjectUsage. With USECOUNT on, the database counts how often statistics objects were used by the optimizer. Whenever the optimizer uses a particular statistic, the access count in that row in the DBC.ObjectUsage table is incremented. If a particular statistics has no usage over a period of time, routines will flag it as being unused, and subsequent executions of Analyze jobs scoped to that database will recommend de-activating it.
USECOUNT also tracks inserts, deletes and updates made to tables within the database you have selected. This provides the extrapolation process with more accurate information on changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10. The AutoStats feature incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistic collections within the scripts it builds.
Here are few tips when logging USECOUNT:
These additional DBQL logging options create XML documents that identify which statistics were used, and ones that the optimizer looked for but did not find. Contrary to USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis. Both provide input to Analyze jobs whose evaluation method has been specified as “DBQL”. Both logging options store the XML that they produce in the DBC.DBQLXMLTbl.
Each option controls the logging of separate but related cateogries of data. However XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.
The difference is that STATSUSAGE logs the usage of existing statistics and recommendations for new statistics (that it found to be missing) into the DBQLXMLTbl without any relationship to the query steps. The relationship to the query steps can only be made if XMLPLAN logging is also enabled. XMLPLAN logs detailed step data in XML format. If you enable XMLPLAN by itself without STATSUSAGE, there is no stats related data logged into the DBQLXMLTbl. STATSUSAGE provides all the statistics recommendations, and if both are being logged, then those statistical recommendations can be attached to specific steps. So if you plan to log XMLPLAN, log STATSUSAGE at the same time.
If both types of logging are on, both options log into a single XML document for a given request. But you must explicitly request logging for them, as the ALL option in DBQL will not include STATSUSAGE and XMLPLAN. Logging for both are enabled at the user level, as is STEP or OBJECT logging.
When you begin logging with these new DBQL options, take into consideration any logging that is already taking place. In the statement below, although SQL and OBJECTs are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN.
BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;
The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging. See page 65-67.
Here are few tips when logging STATSUSAGE and XMLPLAN:
Note: You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site. Keep in mind the other DBQL logging rules already in place.
Workload management for the AutoStats activities can be established using the following steps:
vpStatsJobType=COLLECT for the collect jobs and vpStatsJobType=ANALYZE for the analyze jobs
Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for all of your databases, you want to keep careful track of how you scope your initial jobs. Pick your boundaries and start off small.
Initial steps to take when first starting out with AutoStats include:
GRANT STATISTICS ON <database> TO TDStats;
Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens. Over time, you can give control over statistics analysis and collection to AutoStats for additional databases.