Easing Into Using the New AutoStats Feature

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

Everyone with a Teradata Database collects statistics.  No way around it.  Good query plans rely on it, and we’ve all gotten used to it.

But starting in Teradata Database 14.10 there’s some big changes.  Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet all combine together to produce the Automated Statistics (AutoStats) feature.  Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect on, prioritize and execute the stats collection statements, and organize/run your statistics collection jobs in the most efficient way possible.

There’s a lot going on and a lot of choices involved.  So what’s the easiest way to start getting familiar with AutoStats?   And what’s required with the new DBQL logging options?  If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to key features and benefits AutoStats.  

Sources of Information

AutoStats leverages a variety of new statistics enhancements available in Teradata Database 14.10.  Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.

Reference the orange book titled:  Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati for more information.  Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking update-delete-insert counts, and collecting statistics on single table expressions.

However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis opportunities of AutoStats.  For a solid understanding of this feature, read the orange book titled  “Automated Statistics Management” by Louis Burger and Eric Scheie.   It discusses what AutoStats does, how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.  

Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled “Teradata Viewpoint 14.10 Release”.   It describes and illustrates the Viewpoint portlet that supports AutoStats that you will be using when you get to 14.10.  This Viewpoint portlet is named “Stats Manager”.   

This blog posting is intended to help you ease into using the new AutoStats feature.  Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.

Version 6 of the Statistics Histogram

One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram.  Version 6 is necessary to support some of these new enhancements, including a system-derived sampling option for the statistics collection statement, and  for capturing insert, delete and update counts. 

In order to use Version 6, the system needs to be committed to not back down to any Teradata Database release earlier than 14.10. The DBS Control general field #65 NoDot0Backdown must be set to true to indicate the no back down decision.  This change does not require a restart.  

If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.

The database supports backward compatibility.  Statistics whose histograms use the previous versions (Version 1 to Version 5) can be imported into the system that supports Version 6. However, the Version 6 statistics cannot be imported to systems that support Version 5 or lower versions.  After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics.  The recollection will move your histograms to the new version, which in turn will support a wider use of the new enhancements.

New DBQL options  - What They Do and When to Use Them

AutoStats functionality relies on your making more information available to the database about how statistics are used.  Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.

But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting on, or no longer need, additional DBQL logging will be required.  While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, things like the new re-collection threshold functionality will not work without DBQL USECOUNT logging taking place.  AutoStats promotes a “submit often” approach, where statistics collection jobs can be submitted frequently for collection, leaving it up to the 14.10 threshold logic to determine which statistics actually need to be refreshed at run time.

DBQL USECOUNT Logging

If you enable DBQL logging with the USECOUNT option, and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics.   Although you turn on USECOUNT by a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level.  You turn USECOUNT on at the database level  in a separate logging statement from the DBQL logging targeted to users.   

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;

Once you enable logging, USECOUNT tracks usage of all objects for that database.  USECOUNT is the new mechanism for turning on Object Use Count (OUC).  

One type of object tracking USECOUNT does has to do with statistics.  In 14.10 each individual statistic is considered an “object” and will have a row in a new table, DBC.ObjectUsage.   With USECOUNT on, the database counts how often statistics objects were used by the optimizer.  Whenever the optimizer uses a particular statistic, the access count in that row in the DBC.ObjectUsage table is incremented.   If a particular statistics has no usage over a period of time, routines will flag it as being unused, and subsequent executions of Analyze jobs scoped to that database will recommend de-activating it.

USECOUNT also tracks inserts, deletes and updates made to tables within the database you have selected.  This provides the extrapolation process with more accurate information on changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10.  The AutoStats feature incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistic collections within the scripts it builds.

Here are few tips when logging USECOUNT:

  1. Before you turn on USECOUNT, and in order for the optimizer to use the use counts, set the DBS Control field NoDot0Backdown to true.
  2. Keep USECOUNT logging on for databases whose tables are being analyzed in an ongoing basis.
  3. In general, it is good to keep USECOUNT logging on for all important databases.  In particular, USECOUNT logging is important for those databases where the SYSTEM change-based THRESHOLD is in effect. New optimizer statistics fields in DBS Control allow the user to enable and set system-level THRESHOLD defaults.   A future blog posting will discuss the new THRESHOLD functionality in more detail. 

DBQL STATSUSAGE and XMLPLAN Logging

These additional DBQL logging options create XML documents that identify which statistics were used, and ones that the optimizer looked for but did not find.  Contrary to USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis.  Both provide input to Analyze jobs whose evaluation method has been specified as  “DBQL”.   Both logging options store the XML that they produce in the DBC.DBQLXMLTbl.

Each option controls the logging of separate but related cateogries of data.  However XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.   

The difference is that STATSUSAGE logs the usage of existing statistics and recommendations for new statistics (that it found to be missing) into the DBQLXMLTbl without any relationship to the query steps.  The relationship to the query steps can only be made if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.  If you enable XMLPLAN by itself without STATSUSAGE, there is no stats related data logged into the DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations, and if both are being logged, then those statistical recommendations can be attached to specific steps.   So if you plan to log XMLPLAN, log STATSUSAGE at the same time.   

If both types of logging are on, both options log into a single XML document for a given request.  But you must explicitly request logging for them, as the ALL option in DBQL will not include STATSUSAGE and XMLPLAN.   Logging for both are enabled at the user level, as is STEP or OBJECT logging. 

When you begin logging with these new DBQL options, take into consideration any logging that is already taking place.   In the statement below, although SQL and OBJECTs are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN.

BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;

The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging.  See page 65-67.

Here are few tips when logging STATSUSAGE and XMLPLAN:

  1. For first time tuning or analysis, or after changes to the database, enable both XMLPLAN and STATSUSAGE options for those users who access the affected tables.
  2. Log with both options for at least one week in advance of running an Analyze job, or however long it takes to get good representation for the different queries.
  3. Consider using the more detailed XMLPLAN only for focused tuning, or briefly for new applications, as it incurs some overhead and requires more space.
  4. Turn off logging of both options during non-tuning periods of time or when no Analyze jobs on affected data are being planned in the near future.
  5. Keep the scope of logging to just the users accessing the databases and applications undergoing analysis and tuning.

Note:  You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site.   Keep in mind the other DBQL logging rules already in place.

Workload Management and AutoStats

Workload management for the AutoStats activities can be established using the following steps:

  1. Create a user for Stats Manager tasks.
  2. Add this user in the Teradata Systems Admin portlet under Stats Manager Data Collector.  All tasks issues by Stats Manager will now execute under that user.  See Chapter 16  DBS Security in the orange book:  Automated Statistics Management.
  3. Create a TASM/TIWM workload with Request Source classification on this user, and give it a moderate, or moderately high priority.  This will allow the basic Viewpoint and AutoStats tasks to be executed at a reasonable priority. It is recommended that no throttle be used for requests in this workload.  See Chapter 17 TASM Workload Definitions in the orange book:  Automated Statistics Management.
  4. Create a second workload using the same user classification, but also classify on queryband so that all of the Analze and Collect jobs will do their work within this workload.  Give it a low priority (Timeshare Low, for example) to push the actual statistics collection and analysis work into the background.  You may also want to place a throttle on this workload to manage concurrency of Analyze and Collect jobs.   Place this workload higher than the first workload in the workload evaluation order.  The querybands to classify on are:

                 vpStatsJobType=COLLECT for the collect jobs and  vpStatsJobType=ANALYZE for the analyze jobs

  1. If you wish to break out the actual statistics collection statements into a separate workload, create a third workload, using the same classification as the second workload above, but add classification on statement type of COLLECT STATISTICS.  Place this workload highest in the workload evalution order, in relationship to the other two workloads supporting AutoStats activity. It is recommended that you add a throttle to this workload to manage concurrency of statistics collections jobs.  Having a third workload will allow you to prioritize statistics collection activity differently from the analysis portion of the work, if you wish to do that.  

Becoming Familiar with AutoStats a Step at a Tme

Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for  all of your databases, you want to keep careful track of how you scope your initial jobs.  Pick your boundaries and start off small. 

Initial steps to take when first starting out with AutoStats include:

  1. Read the orange book Automated Statistics Management  and understand how it works.
  2. Select a small, non-critical database with just a few tables, perhaps in your sandbox area, for your initial AutoStats interactions.
  3. Try to pick a database that represents a new application area that has no current statistics, or a database with some level of statistics, but whose statistics are not well tuned.
  4. Set up DBQL logging options so that new logging options are focused on the database tables you have scoped.
    1. Turn on DBQL USECOUNT logging just for that one database.
    2. Turn on DBQL logging that includes STATSUSAGE AND XMLPLAN only for users who access that database.
  5. Run the application’s queries against the database for several days or even a week.
  6. Enable the Viewpoint Stats Manager data collector in Admin->Teradata Systems portlet.  This enables the Stats Manager portlet.
  7. To run Stats Manger collect and analyze jobs, the TDStats database in DBC must have permission to collect statistics on the objects in the job’s scope.  For example: 
GRANT STATISTICS ON <database> TO TDStats;
  1. Select the database in the Statistics tab within the Viewpoint Stats Manager portlet and select the Automate command, in order to bring the existing statistics in that database under the control of AutoStats.   
  2. Create a new Analyze job
    1. Pick the DBQL option and limit queries by log date to incorporate just the last week when you were focused on logging that database.
    2. Select Require Review before applying recommendations.
  3. Run the Analyze job and review all recommendations and approve them.
  4. Create a new Collect job that incorporates your chosen database tables; run the Collect job.
    1. Specify Automatically generate the collect list (the default).
  5.  Preview the list of statistics to be collected before the collection job runs.

Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens.  Over time, you can give control over statistics analysis and collection to AutoStats for additional databases. 

69 Comments
Teradata Employee

I don't see anything problematic with your evaluation order.   You  have the workloads with the most restrictive classifications first, which is good. The only way that an Analyze job could fall through to the H workload that I can think of is if for some reason query bands were not working

You could check whether or not the query bands are being recognized.   Find the analyze job in DBQLogTbl output.  There is a query band field.  If QueryBand in DBQLogTbl is blank for the Analyze job it means the query band value did not get picked up by the Analyze job. 

If you think the AutoStats feature is not working as you expect, please open an incident with the support center.

Thanks, -Carrie

Thank you very much Carrie for the suggestions.

I have verified and ANALYZE job , queryband name  populating correctly in the DBQLogtbl output.

Raised the incident with GSC on the issue and waiting for the root cause and resolution.

Regards,

Chandra

Teradata Employee

Hi Carrie,

one short question that came up from a customer. What happens if a table is being renamed and a new table is being created with the old name? Is the Stats Manager based on tableID and keeps the stats for the new table name (I guess this is the case) or is only the table name tracked?

Regards

Michael

Teradata Employee

You are correct.  Autostats uses tableIds to join it’s infrastructure (tdstats repository)  with the dictionary.

Thanks, -Carrie 

N/A

Hi Carrie,

Is it possible to lauch auto stats collect jobs using autostats APIs from shell wrapper bteq scripts.

I couldnt find detail info about   precollect,runcollect,runcollectreport procedures in orangebooks.

Thanks

Naga

Teradata Employee

Naga,

The AutoStats APIs are implemented as stored procedures and are documented in the Application Programming Reference in the chapter entitled “Automated Statistics Management API”.  Customers can call these stored procedures directly but the results will not be tracked as “Jobs” within the Viewpoint Stats Manager portlet.

Thanks, -Carrie

Hi Carrie,

We have successfully setup statsmanager portlet on our site and start implementing collect jobs. 

Can you advise on archiving the TDSTATS database ,statsmanager user setup.

I read the manual and it says above two are not part of dictionary backup and we need to add in our archive process in case any disaster recovery for restore and resync stats. Can we do weekly archive? 

For the clean up log tables  from TDSTATS , what will be the best practice to setup.

From the portlet , we can find duration from the jobs history view. Is there any API can provide the total duration of the specific job. TDSTATS.CommandsHistoryTbl and  tdstats.SelectStatsExecutionHistory gives by tables  wise. I am looking to find total duration of the jobs from any API / any table from tdstats repository. 

Thanks

Chandra

Hi Carrie,

      Can you please advise on my question when you get a chance.

     Thank you.

Regards

Chandra.

Teradata Employee

Chandra,

I have been trying to find someone in engineering who knows what the answer is.   When I am able to get a response, I will post it here.   Thank you for the reminder.

Best regards, -Carrie

Thank you very much Carrie for the help!!.

Regards

Chandra

Teradata Employee

Chandra,

Below are the responses I was able to get to your questions:

YOUR QUESTION:  Can you advise on archiving the TDSTATS database ,statsmanager user setup. I read the manual and it says above two are not part of dictionary backup and we need to add in our archive process in case any disaster recovery for restore and resync stats. Can we do weekly archive?

Viewpoint does not provide for archiving of metadata such as Stats Manager job definitions.  If the job is not needed anymore, it can be deleted through the portlet.  Viewpoint will remove any collected data related to stats (which includes job history and analyze, automate, cleanup, and collect reports) according to the data retention policy set for the Stats Manager collector.  You can view this setting in the admin Monitored Systems portlet.

YOUR QUESTION:  For the clean up log tables  from TDSTATS , what will be the best practice to setup.

RESPONSE:  Periodic clean up of TDStats historical metadata can be enabled via Viewpoint and the suggested practice is to start with the default setting of purging results older than 30 days. See chapter 18 Managing Space for TDStats in the Automated Statistics Management Orange Book for more info.

YOUR QUESTION:  From the portlet , we can find duration from the jobs history view. Is there any API can provide the total duration of the specific job. TDSTATS.CommandsHistoryTbl and  tdstats.SelectStatsExecutionHistory gives by tables  wise. I am looking to find total duration of the jobs from any API / any table from tdstats repository.

RESPONSE:  Stats Manager defined Collect Job names are not explicitly stored in the TDStats repository which instead uses "RunIDs" to uniquely identify the execution of the RunCollect API on behalf of a given Viewpoint job run.  However, when defining Command Lists for a particular Collect Job, Stats Manager embeds the job name in the list name which in turn is stored in TDStats column CommandsListTbl.CmdListName. The value stored in this column can be used to filter rows in CommandsHistoryTbl for given job or to call API SelectStatsExecutionHistory.  You are correct that TDStats command history data (including start and end collection times) is stored at the table level but you can aggregate on RunID to compute the total time for a run. Although the summed times from TDStats may not exactly match the separately tracked Viewpoint job time, they should be roughly the same.

For example, given a Stats Manager Collect Job named 'CollectTPCH', the following query would sum the collection times for each run:

SELECT RunID, SUM(TimeStamp_Diff_Seconds(C.StartTimeStamp,C.EndTimeStamp)) AS TotalRunTimeInSecs

FROM TDStats.CommandsHistoryTbl C, TDStats.CommandsListTbl L

WHERE C.CmdListID = L.CmdListID AND

L.CmdListName = 'com.teradata.viewpoint.statsmgmt.CLFCollectJob-CollectTPCH'

GROUP BY RunID

ORDER BY RunID;

note: TimeStamp_Diff_Seconds is a UDF whose definition was posted by dnoeth on Teradata Forums

Thanks, -Carrie

Thank you very much Carrie for taking time to follow-up and answer my questions.

Regards

Chandra

Hi Carrie,

I have automated stats on one of our core database. Completed two runs and it is collecting complete stats and not skipping any stats. To recollect/skip stats, syschange threshold depends on USECOUNT logging on specific db for evaluation. I have enabled the USECOUNT logging now and objectusage table is populating now with the STA/UDI counts.

How many days/weeks  normally will take to build good history info on USECOUNT/objectusage   to be able to use by system change threshold option . One of our table is pretty big and runtime is higher now and hope the USECOUNT logging will help runtime.

Chandra

Hi Carrie,

We have primary and DR production systems. I am looking to see any options on copy stats from one prod system to another system since same tables exists both and in sync. Can you please advise what would be the ideal method in implementing it. It saves lot of resources and stats process easy.

Regards

Chandra

Teradata Employee

Response to the comment from 2 days ago:

Chandra,

The amount of time before skipping will be considered by the optimizer depends.  The optimizer usually looks for at least 5 history records in the statistics histogram, but if the pattern of change with updates over time is erratic, and the optimizer does not find a predictable pattern for the metrics within the history records that it has, it may not choose to skip recollections at all. 

If you are on an older release of Teradata, consider upgrading as there have been some fixes over the last couple of years in this area.   If you not getting skipping when you believe you should, open an incident with the support center and they can take a look and help resolve your situation.

Response to the comment from today:

You can do a SHOW STATISTICS VALUES ON table-name and the output will be all the stats collected on the table including their histograms and history records.    It will be in SQL format, so you can take the output and use it as input on the DR system, as long as the tables are in sync.  Make sure to route the output to a file, or just copy and paste it from the online screen that displays it.

To do SHOW STATS VALUES, you require SELECT rights on the table.

Thanks, -Carrie

Thank you very much Carrie!!.

We are currently on Release 15.00.04.05 .

We recenlty started stats collection using statsmanager on prod system and started with one database to check the runtimes and how the history records work with USECOUNT for recollection. I just enabled USECOUNT this week and before that we had two full collection stats run on the same database without enabling USECOUNT. I will wait 3 to 4 weeks to develop history records and will compare the runtimes . If needed will raise incident with GSC.

Since DR system in sync, do we still need to enable USECOUNT logging just in case ? We want to copy stats to DR from primary and want to take advantage . Thank you for the help.

Regards

Chandra

Teradata Employee

Chandra,

USECOUNT logging is not just used for skipping stats.  It is also used for statistics extrapolation.   You will not get skipping if USECOUNT logging is off, and the extrapolations if stats are stale will not be as optimal. 

Thanks, -Carrie

Thank you Carrie!!

I will go ahead and  enable USECOUNT logging on all required permanent databases to make use of it on DR system as well.

Our latest collect job run after we enabled USECOUNT logging significantly improved the total runtime and skipped few stats. Good to know thar USECOUNT logging also will be used for stats extrapolation when eanbled. I observed the run time for the very large table collection run time drastically reduced even the statistics collection not skipped. This helped to reduce the overall collection job on the DB runtime.

Regards

Chandra

Fan

Hello Carrie,

 

I am exploring to move STATS collection process from an in-house tool to TD autostats and benefit from the new features offered by TD autostats. I see a challenge in using TD Autostats when Tables are recreated and loaded by ETL Jobs. As per TD manual, below procedure should be followed to preserve and refresh TD Autostats when Tables are recreated by ETL Jobs. This approach requires ETL job changes and this will be difficult when # of recreated tables are very high. Is there easy way to preserve Autostats without making changes to ETL Jobs ? Can Autostats(metadata) be copied between systems (in a multi-system env) once initial Autostats setup is complete on one system ? Please let me know.

 

Example of Preserving TDStats Data During Table Recreation

 

A typical ETL flow involving a previously automated query table that is reincarnated by renaming a shadow table requires the following SQL statements and API calls.

 

Initial setup steps (performed once):

 

1 CREATE TABLE QueryTable (….);

2 Load or INSERT data into QueryTable

3 COLLECT STATISTICS …. ON QueryTable; /* user defined stats on columns and indexes */

4 CALL AutomateStats('Db1','QueryTable',…);

5 CALL PreserveAfterRecreate('Db1','QueryTable',NULL,:NumStatsToPreserve);

 

Note: Step number 5 is required to ensure that TDStats metadata for QueryTable will survive the table recreation.

 

Table recreation steps (periodically repeated):

 

1 CREATE TABLE ShadowTable AS Db1.QueryTable WITH DATA AND STATS;

2 Load additional data into ShadowTable using bulk load methods

3 COLLECT SUMMARY STATISTICS ON ShadowTable; /* refresh summary-only stats */

4 DROP TABLE QueryTable;

5 RENAME TABLE ShadowTable AS QueryTable;

6 CALL ResyncStats('Db1','QueryTable',:ResyncId,:NumRepaired);

 

Note: Step number 6 is required to ensure that TDStats metadata for QueryTable will survive the table recreation.

 

Thanks,

Sudhir