Easing Into Using the New AutoStats Feature


Everyone with a Teradata Database collects statistics. No way around it. Good query plans rely on them, and we've all gotten used to it.

But starting in Teradata Database 14.10 there are some big changes. Numerous optimizer enhancements, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet combine to produce the Automated Statistics (AutoStats) feature. Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect, prioritize and execute the stats collection statements, and organize and run your statistics collection jobs in the most efficient way possible.

There’s a lot going on and a lot of choices involved. So what’s the easiest way to start getting familiar with AutoStats? And what’s required with the new DBQL logging options? If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to the key features and benefits of AutoStats.

Sources of Information

AutoStats leverages a variety of new statistics enhancements available in Teradata Database 14.10.  Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.

For more information, see the orange book Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati. Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking of update/delete/insert counts, and collecting statistics on single-table expressions.

However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis that AutoStats offers. For a solid understanding of this feature, read the orange book “Automated Statistics Management” by Louis Burger and Eric Scheie. It discusses what AutoStats does, explains how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.

Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled “Teradata Viewpoint 14.10 Release”. It describes and illustrates the Viewpoint portlet that supports AutoStats, which you will be using when you get to 14.10. This Viewpoint portlet is named “Stats Manager”.

This blog posting is intended to help you ease into using the new AutoStats feature.  Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.

Version 6 of the Statistics Histogram

One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram. Version 6 is necessary to support some of these new enhancements, including the system-derived sampling option for the statistics collection statement and the capture of insert, delete and update counts.

In order to use Version 6, the system must be committed to not backing down to any Teradata Database release earlier than 14.10. The DBS Control general field #65, NoDot0Backdown, must be set to TRUE to indicate the no-back-down decision. This change does not require a restart.
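
If you make the change from a database node's command line, it is done with the dbscontrol utility. The following is a sketch from memory (display the general fields to verify field #65, modify it, then write the change), so confirm the exact command syntax against the Utilities manual for your release:

dbscontrol
DISPLAY GENERAL
MODIFY GENERAL 65 = TRUE
WRITE
QUIT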

If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.

The database supports backward compatibility: statistics whose histograms use a previous version (Version 1 through Version 5) can be imported into a system that supports Version 6. Version 6 statistics, however, cannot be imported to systems that support only Version 5 or lower. After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics. The recollection will move your histograms to the new version, which in turn supports wider use of the new enhancements.
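
A convenient way to accelerate that recollection is the short form of the collection statement, which refreshes every statistic already defined on a table in a single submission (the database and table names below are placeholders):

COLLECT STATISTICS ON SandBoxDB.Orders;    -- recollects all statistics defined on the table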

New DBQL Options - What They Do and When to Use Them

AutoStats functionality relies on your making more information available to the database about how statistics are used.  Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.

But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting or no longer need, additional DBQL logging is required. While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, features such as the new recollection threshold functionality will not work without DBQL USECOUNT logging taking place. AutoStats promotes a “submit often” approach, where statistics collection jobs can be submitted frequently, leaving it up to the 14.10 threshold logic to determine which statistics actually need to be refreshed at run time.

DBQL USECOUNT Logging

If you enable DBQL logging with the USECOUNT option and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics. Although you turn on USECOUNT with a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level. You turn USECOUNT on at the database level, in a separate logging statement from the DBQL logging targeted at users:

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;

Once you enable logging, USECOUNT tracks usage of all objects for that database.  USECOUNT is the new mechanism for turning on Object Use Count (OUC).  

One type of object tracking USECOUNT performs has to do with statistics. In 14.10 each individual statistic is considered an “object” and has a row in a new table, DBC.ObjectUsage. With USECOUNT on, the database counts how often statistics objects are used by the optimizer: whenever the optimizer uses a particular statistic, the access count in that statistic’s row in DBC.ObjectUsage is incremented. If a particular statistic has no usage over a period of time, routines will flag it as unused, and subsequent executions of Analyze jobs scoped to that database will recommend deactivating it.

USECOUNT also tracks inserts, deletes and updates (UDI counts) made to tables within the database you have selected. This provides the extrapolation process with more accurate information about changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10. AutoStats incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistics collections within the scripts it builds.
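
If you are curious about what USECOUNT is recording, you can peek at the rows directly. The query below is only a sketch: FieldID and IndexID are real columns (table-level UDI rows carry nulls in both), but the other column and join names are assumptions, and the supported DBC views are the safer route:

SELECT d.DatabaseName,
       t.TVMName AS TableName,
       o.FieldID,                -- null on table-level UDI rows
       o.IndexID,                -- null on table-level UDI rows
       o.UserAccessCnt,          -- assumed name of the access counter
       o.LastAccessTimeStamp     -- assumed name of the last-use column
FROM   DBC.ObjectUsage o
JOIN   DBC.Dbase d ON d.DatabaseId = o.DatabaseId
JOIN   DBC.TVM   t ON t.TVMId = o.ObjectId       -- join column assumed
WHERE  d.DatabaseName = 'SandBoxDB';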

Here are a few tips when logging USECOUNT:

  1. Before you turn on USECOUNT, and in order for the optimizer to use the use counts, set the DBS Control field NoDot0Backdown to true.
  2. Keep USECOUNT logging on for databases whose tables are being analyzed on an ongoing basis.
  3. In general, it is good to keep USECOUNT logging on for all important databases.  In particular, USECOUNT logging is important for those databases where the SYSTEM change-based THRESHOLD is in effect. New optimizer statistics fields in DBS Control allow the user to enable and set system-level THRESHOLD defaults.   A future blog posting will discuss the new THRESHOLD functionality in more detail. 

DBQL STATSUSAGE and XMLPLAN Logging

These additional DBQL logging options create XML documents that identify which statistics were used and which ones the optimizer looked for but did not find. Unlike USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis. Both provide input to Analyze jobs whose evaluation method has been specified as “DBQL”. Both logging options store the XML they produce in DBC.DBQLXMLTbl.

Each option controls the logging of separate but related categories of data. However, XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.

The difference is that STATSUSAGE logs the usage of existing statistics, and recommendations for new statistics that it found to be missing, into DBQLXMLTbl without any relationship to the query steps. The relationship to the query steps can only be made if XMLPLAN logging is also enabled. XMLPLAN logs detailed step data in XML format. If you enable XMLPLAN by itself, without STATSUSAGE, no stats-related data is logged into DBQLXMLTbl. STATSUSAGE provides all the statistics recommendations, and if both are being logged, those recommendations can be attached to specific steps. So if you plan to log XMLPLAN, log STATSUSAGE at the same time.

If both types of logging are on, both options log into a single XML document for a given request. But you must explicitly request logging for them, as the ALL option in DBQL does not include STATSUSAGE and XMLPLAN. Logging for both is enabled at the user level, as is STEP or OBJECT logging.

When you begin logging with these new DBQL options, take into consideration any logging that is already taking place. In the statement below, although SQL and OBJECTS are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN:

BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;

The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging; see pages 65-67.

Here are a few tips when logging STATSUSAGE and XMLPLAN:

  1. For first time tuning or analysis, or after changes to the database, enable both XMLPLAN and STATSUSAGE options for those users who access the affected tables.
  2. Log with both options for at least one week in advance of running an Analyze job, or however long it takes to get good representation for the different queries.
  3. Consider using the more detailed XMLPLAN only for focused tuning, or briefly for new applications, as it incurs some overhead and requires more space.
  4. Turn off logging of both options during non-tuning periods or when no Analyze jobs on the affected data are planned in the near future (see the example following this list).
  5. Keep the scope of logging to just the users accessing the databases and applications undergoing analysis and tuning.
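
For tip 4, turning the analysis-time options off again is a single statement per user. My understanding is that END QUERY LOGGING drops the user's whole logging rule, so re-issue BEGIN QUERY LOGGING afterward with only the options you keep on permanently (the user name and retained options below match the earlier example and are illustrative):

END QUERY LOGGING ON USER1;
BEGIN QUERY LOGGING WITH SQL, OBJECTS LIMIT SQLTEXT=0 ON USER1;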

Note:  You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site.   Keep in mind the other DBQL logging rules already in place.

Workload Management and AutoStats

Workload management for the AutoStats activities can be established using the following steps:

  1. Create a user for Stats Manager tasks.
  2. Add this user in the Teradata Systems Admin portlet under Stats Manager Data Collector. All tasks issued by Stats Manager will then execute under that user. See Chapter 16, DBS Security, in the orange book Automated Statistics Management.
  3. Create a TASM/TIWM workload with Request Source classification on this user, and give it a moderate, or moderately high priority.  This will allow the basic Viewpoint and AutoStats tasks to be executed at a reasonable priority. It is recommended that no throttle be used for requests in this workload.  See Chapter 17 TASM Workload Definitions in the orange book:  Automated Statistics Management.
  4. Create a second workload using the same user classification, but also classify on queryband so that all of the Analyze and Collect jobs will do their work within this workload. Give it a low priority (Timeshare Low, for example) to push the actual statistics collection and analysis work into the background. You may also want to place a throttle on this workload to manage concurrency of Analyze and Collect jobs. Place this workload higher than the first workload in the workload evaluation order. The querybands to classify on are:

                 vpStatsJobType=COLLECT for the collect jobs and  vpStatsJobType=ANALYZE for the analyze jobs

  5. If you wish to break out the actual statistics collection statements into a separate workload, create a third workload, using the same classification as the second workload above, but add classification on statement type of COLLECT STATISTICS. Place this workload highest in the workload evaluation order, in relationship to the other two workloads supporting AutoStats activity. It is recommended that you add a throttle to this workload to manage concurrency of statistics collection jobs. Having a third workload allows you to prioritize statistics collection activity differently from the analysis portion of the work, if you wish to do that.
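
Once these workloads are active, you can verify that Analyze and Collect requests classify as intended by checking DBQL for the querybands above. A minimal sketch follows; I believe WDID in the DBQL log carries the id of the workload the request ran under, but verify the column on your release:

SELECT QueryID,
       UserName,
       QueryBand,
       WDID                      -- workload definition id (verify column name)
FROM   DBC.QryLogV
WHERE  QueryBand LIKE '%vpStatsJobType=%'
ORDER BY StartTime DESC;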

Becoming Familiar with AutoStats a Step at a Time

Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for  all of your databases, you want to keep careful track of how you scope your initial jobs.  Pick your boundaries and start off small. 

Initial steps to take when first starting out with AutoStats include:

  1. Read the orange book Automated Statistics Management and understand how the feature works.
  2. Select a small, non-critical database with just a few tables, perhaps in your sandbox area, for your initial AutoStats interactions.
  3. Try to pick a database that represents a new application area with no current statistics, or a database with some level of statistics that are not well tuned.
  4. Set up DBQL logging so that the new logging options are focused on the database tables you have scoped (see the combined example after this list).
    a. Turn on DBQL USECOUNT logging just for that one database.
    b. Turn on DBQL logging that includes STATSUSAGE and XMLPLAN only for users who access that database.
  5. Run the application’s queries against the database for several days or even a week.
  6. Enable the Viewpoint Stats Manager data collector in the Admin->Teradata Systems portlet. This enables the Stats Manager portlet.
  7. To run Stats Manager Collect and Analyze jobs, the TDStats database in DBC must have permission to collect statistics on the objects in the job’s scope. For example:
GRANT STATISTICS ON <database> TO TDStats;
  8. Select the database in the Statistics tab within the Viewpoint Stats Manager portlet and select the Automate command, in order to bring the existing statistics in that database under the control of AutoStats.
  9. Create a new Analyze job.
    a. Pick the DBQL option and limit queries by log date to incorporate just the last week, when you were focused on logging that database.
    b. Select Require Review before applying recommendations.
  10. Run the Analyze job, review all recommendations, and approve them.
  11. Create a new Collect job that incorporates your chosen database tables; run the Collect job.
    a. Specify Automatically generate the collect list (the default).
  12. Preview the list of statistics to be collected before the collection job runs.
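
For step 4, the two logging statements shown earlier in this posting, scoped to a sandbox database and a representative application user (both names are placeholders), would look like this:

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;
BEGIN QUERY LOGGING WITH STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;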

Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens.  Over time, you can give control over statistics analysis and collection to AutoStats for additional databases. 

69 Comments
Supporter

Hi Carrie,

thanks for sharing this!

I was not able to find the orange book "Automated Statistics Management" by searching the TAYS knowledge search. Can you give me a hint on where to find it?

Ulrich

Teradata Employee

Hi Ulrich,

There are two ways I know of to find orange books on Teradata At Your Service site.  It sounds like you tried the first method, by writing “orange books” in the Search box.

You can also look on the left side of the screen under “Navigate” and under the “Products” category click on the “TKO Orange Books” option.

If neither of them works, contact me offline and I will look into it. Automated Statistics Management was approved for publication back in June of this year, and is available in my Teradata repository.

Thanks, -Carrie

Enthusiast

Hi Carrie,

What are the privileges (access rights) required to collect statistics on any table? We have TD 12 installed.

GRANT STATISTICS ON <dbname> TO <rolename>. I tried this, but I couldn't do it. Please help.

Dinesh

Enthusiast

Hi Carrie,

I also could not find it under "Navigate" and the "Products" category by clicking the "TKO Orange Books" option.

My email address is Akleema_talukder@bcbstx.com. Will you please send me this document?

Thank you very much.

Akleema

Teradata Employee

Dinesh,

Prior to Teradata 13.0, a user other than DBC must have INDEX or DROP privileges on the table in order to collect statistics. If you are using INDEX privileges, they cannot be granted at the database level, only the table level.

Once you are on 13.0 you can begin to use the new STATISTICS privilege. If you are using STATISTICS privileges you do not require DROP or INDEX privileges, and you can grant on either a table or a database.
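
To illustrate, once on 13.0 or later, grants along these lines should work (the database, table and role names are placeholders):

GRANT STATISTICS ON SalesDB TO stats_role;           -- whole database
GRANT STATISTICS ON SalesDB.Orders TO stats_role;    -- single table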

Thanks, -Carrie

Teradata Employee

Akleema,

I was able to access this orange book just now on Teradata At Your Service.

This is what I did, see if this works for you:

From the Home page, click on the Knowledge Search tab.

In the search box put:  "Automated Statistics Management" and click SEARCH.

It brought that orange book right up.

Thanks, -Carrie

Enthusiast

@ulrich Were you able to get the orange book? If not let me know.

-Alpesh

Teradata Employee

It turns out that there is an issue with the Automated Statistics Management orange book availability on Teradata At Your Service. It is in the process of being addressed.

Thank you for your patience.  -Carrie

Teradata Employee

Hi Carrie,

I am planning to use Stats Manager for stats automation on two databases first. We are using TD 14.10, upgraded from 14.00 last week. The DBS Control field NoDot0Backdown is false right now.

My question is:

If I enable USECOUNT via DBQL query logging for one database without changing NoDot0Backdown to true, will it still recommend deactivating stats on tables which are not being used in this database?

Can you also please tell me a little about the other impacts of setting NoDot0Backdown to true?

Thanks- Sandeep.

Enthusiast
Hi Carrie,

This is very useful info. I tried a lot to download "Automated Statistics Management" but failed to do so. Request you to send the document to my email ID sudheer.0228@gmail.com

Thanks in Advance!!

Teradata Employee

Sandeep,

There are several good reasons to set NoDot0Backdown = TRUE when you get onto 14.10. Use of extended object names requires it, as does use of increased data block and spool block sizes. You need that setting at TRUE in order to get the advantages of the new statistics histogram Version 6, which is able to record the update/delete/insert counts at the time the stats were collected. UDI counts are part of what USECOUNT logging creates, and without UDI counts being integrated into the stats collection process you risk having a negative impact on the threshold Change Percent capability.

If you want to use new 14.10 features fully, it's a good idea to set NoDot0Backdown = TRUE.

The only potential down side of this is that you will not be able to back down to 14.0, should you wish to do that.

In terms of the impact on AutoStats, the identification and deactivation of unused stats by Analyze jobs will still work even if NoDot0Backdown is False. In that scenario, the Analyze job will still read the USECOUNT data that tracks optimizer stats usage. However, as mentioned above, there will be negative impacts on other features, mostly the change-based THRESHOLD capabilities.

Although it is not the best practice, AutoStats does allow the automation of databases that do not have USECOUNT enabled.  Functionally, Analyze and Collect jobs should still run without failure but those features requiring UDI counts will not operate effectively.

Thanks, -Carrie

Teradata Employee

Thanks a lot, Carrie :)

Teradata Employee

Hi Carrie,

I have one more doubt. It will be a great help if you provide some input.

I have enabled XMLPLAN and STATSUSAGE logging for one account and enabled database logging with USECOUNT for three databases. We are using PDCR, which moves the XMLPLAN data from DBC to the PDCR database, but it does not move the USECOUNT data, i.e. DBC.ObjectUsage, into PDCR.

So for the Analyze job, which database do I have to mention in the “Database containing log tables” field for the USE DBQL option in “Edit Evaluation Method Settings” in the Stats Manager portlet, given that the XMLPLAN data is in PDCR and the USECOUNT data is in DBC?

Thanks-Sandeep.

Teradata Employee

Sandeep,

Analyze Jobs will always read USECOUNT data from DBC.ObjectUsage regardless of what the “Database containing log tables” is set to.

Although it is true that USECOUNT is now enabled using DBQL syntax, from the perspective of AutoStats its underlying storage table does not fall into the category of traditional per-request log tables such as DBC.QryLogV and DBC.QryLogXMLV.

Thanks, -Carrie

Teradata Employee

Thanks a lot Carrie :)

Visitor

Carrie,

We recently enabled USECOUNT on our system and set the DBS Control field NoDot0Backdown to true to make use of Version 6 stats. We saw that some date-sensitive queries that look at current-day data generated bad plans. We had a collect stats step to refresh stats on the date column, and the recollection didn't happen. We have left all the other DBS Control threshold options at default. We had to use the NO THRESHOLD option to force stats collection on those columns. When we checked UDI counts for those tables, we don't see any counts reflected in DBC.ObjectUsage even though the tables get loaded on a daily basis. We could see that information on a few tables in that database, but we could not determine in what cases it records that info. We see information on access counts but not UDI counts. We are thinking this is causing the issue. Do you have any input on this?

Thanks

Haritha.

 

Teradata Employee

Haritha,

One thing to check: when you access DBC.ObjectUsage, it's best to use one of the DBC views for that purpose. When you access UDI counts for a table, those table-level ObjectUsage rows will have nulls in FieldID and IndexID, which makes it difficult to join to the other DBC tables to get things like database name and field name. So if you were accessing the ObjectUsage count table directly, try using one of the views instead, and see if you can see your counts.

Also, make sure your database software is on the most current maintenance release, as there have been a number of fixes related to the usecount feature since it came out.   I believe most of them are in the current software.

Thanks, -Carrie

Teradata Employee

Hi Carrie,

We just ran Stats Manager. When we ran it, we found some of the tables in a "Skipped" state. Does that mean that the stats collection skipped these tables because the tables do not have enough changed rows?

But we did not set any "user defined" setting for stats that would skip the stats collection on % changes.

All the collect settings (threshold, sampling) are "Global default".

The other question is:

Where is this Global default defined? And what are the default values for "Global default"? Can we change them?

Enthusiast

Hi Carrie,

Do we need to do some cleanup before we start a new Analyze job in Stats Manager?

I have been running Analyze jobs for different databases, and every time the job runs longer than before.

The last job I ran for our EMEA region databases using PDCRINFO for 3 weeks ran 2.5 days!

I see that we can set Viewpoint to clean up the TDSTATS tables, but the default is to clean everything that is 30 days old. Would that help? Is it the insert into this table that is the issue?

If yes, should we delete ALL rows (can we do it manually without harming anything) to make things faster?

I have an incident opened on this, but have not heard back for a few days. Wondering if you can tell me what causes the Analyze job to run so long.

We have PDCR set up. Stats on those tables get collected, as set up by the PDCR process.

We have had USECOUNT logging on for a long time.

I also enabled XMLPLAN and STATSUSAGE on ALL in DBQL about 10 days ago.

Thanks,

Teradata Employee

Sandeep,

There is another blog posting titled Statistics Threshold Functionality 101 that answers your questions.    It provides this explanation:

"System threshold:   This is the default approach for applying thresholds for all 14.10 platforms.  The system threshold default is not a single threshold value.  Rather this default approach determines the appropriate threshold for each statistic and considers how much the underlying table has changed since the last collection."

Basically, you don't set the system default threshold that controls skipping; it is individually determined by the system.

Read that blog for more information.

Thanks, -Carrie

Teradata Employee

Nazy,

The two main factors that influence how long an Analyze job runs are 1.)  the number of logged queries; and 2.)  the average size of the logged XML files.  So the size of the TDStats tables is less critical to performance.

A couple of things that might help you speed up Analyze are eliminating XMLPLAN logging (which is only secondarily important)  and limiting the range of dates upon which analysis is being performed.  Those are good generalized recommendations.

There are a couple of DRs that will further reduce Analyze time. As you have an incident open, I'd suggest you work with the GSC to get these fixes. Sorry, I don't have the DR numbers.

Thanks, -Carrie

Teradata Employee

Hi Carrie,

What is the difference between automating stats for a database by clicking the Automate option in the Statistics tab, and selecting the database through an Analyze job? Both ways, Stats Manager collects stats for the database's tables, but which one is better?

For example:

I select one database X in an Analyze job, and the job recommends that I approve some stats for this database.

If, in the Statistics tab, I click on database X and select the AUTOMATE option, it automates all the stats of this database X. So which approach is good?

I have a number of databases. So do I need to create Analyze jobs, or can I automate these databases by using the AUTOMATE option as explained above?

Please correct me if I am wrong, and help me to understand this.

Teradata Employee

Sandeep,

Here are some details that describe the implications of using Automate in different ways:

- The Automate operation registers an existing statistic defined in the data dictionary with AutoStats, making it eligible for collection via Collect Jobs.

- For a given set of objects, the Automate operation performed via the Statistics tab (your first approach) is normally done once. To facilitate the automation of any user-defined stats that are subsequently created on those same objects after this initial automation step, Analyze Jobs offer the Automate existing stats option (your second approach).

- The stats involved in both of these automation approaches are almost always user defined, meaning it was the user who submitted the initial COLLECT STATISTICS statement apart from Collect Jobs.

- The “automation” per se of recommended missing statistics by Analyze Jobs is handled by the user approval process, which is either done explicitly from the Recommendations page of Stats Manager or automatically if the Require review before applying recommendations option is left unchecked. After a missing statistic is approved, it is effectively “automated”, although that term is not generally used. After its initial collection is performed by a subsequent Collect Job and its definition is stored in the dictionary, the recommended stat could potentially be DeAutomated and subsequently (re)Automated just like a user-defined stat.

- Your proposal to rely exclusively on the second approach to limit automation in certain circumstances (e.g., only the used stats of very large tables) can be done, but that was not the intent of this option when it was designed. If you choose the “Limit to stats: That are actually used” sub-option in your Analyze Job, you will only automate those stats that were found in analyzed logged STATSUSAGE data. STATSUSAGE output can tell you if a stat was used, but it cannot tell you if it was unused, or the frequency of use. By relying primarily on this approach you may end up not automating an important used stat.

- As currently designed, the Analyze Job option Evaluate stats to determine if they should be collected or deactivated (which relies on USECOUNT data) will only recommend deactivation of unused stats that have been automated. Put another way, you have to automate a stat before an Analyze Job will tell you it’s not used.

- The scope of Collect Jobs is limited to only automated stats (existing or approved missing) for the user-specified objects. Apart from the cleanup/deactivation function mentioned in the point above, Analyze Jobs have no such limitation and will recommend missing stats on objects that have no automated stats.

Thanks, -Carrie

Teradata Employee

Thanks Carrie :)

We are facing one issue due to OBJECT logging (OUC). One FastLoad has been in an abort state for more than 18 hours. We found knowledge article KAP316902A on this. The same has been raised with the GSC.

Is there any solution to end this aborted session?

Teradata Employee

Sandeep,

I don't have anything more to add than what is in the knowledge article. 

Once you have opened an incident on a problem with the GSC, it is best to continue to work with the support people for a resolution to the problem. Please raise any questions you have about solutions directly with them.

Thanks, -Carrie

Teradata Employee

thanks Carrie

Teradata Employee

Hi Carrie,

I have a doubt regarding BLC and MVC tables. What will be the Stats Manager behavior (CPU, IO, etc.) for MVC and BLC tables?

Will the stats process be slow?

Teradata Employee

You mentioned both multivalue compression (MVC) and block level compression (BLC). There is no overhead when reading a table that contains multivalue compression.

Please refer to the Database Administration Manual if you need help in understanding how multivalue compression works.  Page 183 of the 14.10 version (B035-1093-112A) says:

==========================

Using Multivalue Compression

MVC compresses repeating values in a column when you specify the value in a compression list in the column definition.

When data in the column matches a value specified in the compression list, the database stores the value only once in the table header, regardless of how many times it occurs as a field value for the column. The database then stores a smaller substitute value, often as small as 2 bits, in each row where the value occurs.

MVC generally provides the best cost/benefit ratio compared to other methods. Because it requires minimal resources to uncompress the data during query processing, you can use MVC for hot (frequently-used) data without compromising query/load performance. MVC is also considered the easiest to implement of all the compression methods.

=============================

Stats Manager does not read base tables for the Analyze jobs.  It reads DBQL log tables and other dictionary tables.  So if only the non-dictionary base tables are compressed with block level compression, I don't believe there will be any impact on what Stats Manager is doing within Analyze Jobs.  

But Collect jobs could run somewhat longer due to the extra CPU to decompress data blocks in order to collect statistics against base tables.   In our testing back in 13.10, we found elapsed time was from 3% to 9% longer with BLC on the table, compared to no BLC. That was on an EDW platform.

If you are on an appliance where everything is compressed automatically, the overhead of Stats Manager due to compression will be equivalent to the overhead of doing anything else in the system.  Because everything is compressed, there would be no baseline of comparison with no BLC, so the same overhead that is normally experienced in accessing base table data due to block-level compression will be present with the Collect Jobs.

Thanks, -Carrie

Teradata Employee

Thanks, Carrie :)

When an Analyze job runs, it gives recommendations in 6 tabs (Missing, Usage, Stale, Deactivate, Reactivate, Automate).

When I got the recommendations from the Analyze job, I approved them all in the Statistics tab, and the Collect job is running fine on the approved stats.

But I found one table in the Missing tab (Analyze job) showing a stats recommendation with High system importance, yet when I check my Collect job this table is NOT part of the collect stats job, even though I approved all the recommendation suggestions in the Statistics tab.

How can I approve the missing stats? Am I missing something?

Please help.

Thanks in advance.

Teradata Employee

Sandeep,

I suggest you double-check the following in your Collect Job definition:

1. Verify the Scope of the Collect Job includes the table containing the approved missing stat

2. Verify  the Collect List setting for that Collect Job is set to Automatically generate the collect list

3.  Under the Collect List tab, choose Preview List and drill down on the displayed collections to see if there is one for the  approved missing stat in question

Thanks, -Carrie

Teradata Employee

Hi Carrie,

I checked the stats job and found some information.

In the Analyze job, under the Missing tab, there is one column, "Recommendation", which has only two values: Yes/No.

I approved all the recommendations, but it seems that only the stats whose "Recommendation" column value is YES in the Analyze job's Missing tab come under recommendations.

The table is showing in the Analyze job with the "Recommendation" column as NO, even though the system importance is HIGH. Is there any way to mark this value as YES, or to automate the stats for these tables?

Teradata Employee

Sandeep,

You can only approve a missing stat for collection if it is recommended by the Analyze job. Basically, only recommendations that made the TOP-N final cut through the Stats Manager portlet are available to be approved.

You could consider increasing the Max_Percent_Missing configuration setting as documented in Chapter 19 of the orange book.  That will increase the N in the TOP-N mentioned above.    

Thanks, -Carrie

Teradata Employee

Thanks a lot Carrie :)

Is there any housekeeping required for the ObjectUsage table, i.e. any maintenance? Is there any overhead on resource usage if we enable OUC?

N/A

Hi Carrie,

Everything is working fine for us and AutoStats is fully operational.

However, I've tried to edit the DBQL settings in my Analyze job so it would limit to queries from a few account strings. This doesn't seem to be working. When I specify one account string everything goes well, but if I specify more than one account string the Analyze job runs for a very short time and returns nothing.

Do you know which syntax I have to use in this field to specify multiple account strings?

Thanks in advance, Yoann

Teradata Employee

Response to Sravani:

There is no required maintenance involved in the ObjectUsage table.  

I have not heard any complaints about the overhead involved in enabling USECOUNT logging. For most sites, there is just the usual very minor DBQL overhead. The caching and writing to disk is optimized for USECOUNT logging just the same as for other types of DBQL logging.

Thanks, -Carrie

Teradata Employee

Yoann,

The intent of the AcctString input parameter is to be a single account string, although it does support wildcarding.

The account string input parameter behaves the same in AutoStats as the other input parameters for DatabaseName, TableName, ApplName and UserName.  If wildcards are not an option,  you would have to create a separate Analyze job for each account string.

Thanks, -Carrie

Enthusiast

Hi Carrie,

I was perusing the AnalyzerHistoryTbl and while I can figure out what happens with most of the Event Types, I am wondering what is the purpose of the rows with an EventType = 'u'.

Does that mean that statistics were detected on those columns and no further action is necessary on the automated stats, but that the non-automated detected stats SHOULD be automated?

Enthusiast

Actually, is there a document that gives the expected values and their usage interpretations for the StatsTbl and the AnalyzerHistoryTbl?

Teradata Employee

An EventType of ‘U’ indicates an existing collected statistic was observed as being used during query optimization, as recorded by the DBQL STATSUSAGE logging option. When the related column NeedsAutomation is set to ‘Y’, the used statistic has not yet been automated and thus is not considered by AutoStats Collect Jobs. Stats falling into this category are reported as Automate recommendations by Analyze Jobs, as described in section 6.1 of the AutoStats orange book.

In terms of your second question:  There is no documentation that carries expected values and their interpretation for the StatsTbl and the AnalyzerHistoryTbl.  Although the TDStats APIs (stored procedure definitions) are documented in the user manuals, the TDStats table schema is not.

Detailed knowledge of the underlying TDStats repository has not been considered to be necessary for those users who use the Stats Manager portlet.

Thanks, -Carrie

Enthusiast

Thank you!

Teradata Employee

Hi Carrie,

Recently we swapped a system from TD 13.10 to 15.00, and we are collecting stats on a CDR table which has BLC; we are unable to collect stats on the table. It is taking a long time and we can't say when it will complete. The same stats on the same table on a 6650 running TD 13.10 collect in 5 minutes.

Any suggest?

Many thanks in advance.

Teradata Employee

Lots of things change when you go from 13.10 all the way to 15.0.  So there is no simple answer to any question involving a change in performance.

You did not say if the CDR data was uncompressed in 13.10 and then compressed only after you went to 15.0.  If that is the case (uncompressed before upgrade, compressed after), then you can expect more CPU to be required to collect statistics.   When data blocks of a table are compressed, they must be uncompressed before statistics can be collected.  That could cause the job to run longer.

Make sure you are collecting statistics with the new 14.0 syntax, where you bundle all the individual column/index collections into one collect statement. That way you will reduce the table scan effort, and therefore the decompression effort.
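
For example, rather than submitting one statement per column, the bundled form lets the recollections share the scan effort (the table and column names below are placeholders):

COLLECT STATISTICS
    COLUMN (call_date),
    COLUMN (subscriber_id),
    COLUMN (call_type)
ON CDRDB.CallDetail;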

Check the actual CPU being used for the statistics collection before and after the upgrade.   It may be  somewhat greater post-upgrade, but should not be orders of magnitude greater.   If CPU usage to collect an individual statistic is in the same ballpark (maybe 10% or so greater than it was in 13.10, or if using BLC for the first time, maybe 30% to 50% greater), then that seems reasonable to me.  

If the CPU reported for the stats collection is acceptable, other things you could check are if the stats are running at a different (lower) priority, or is there more active work running on the platform leading to more contention for resources.

Thanks, -Carrie

Fan

Hi Carrie

We upgraded from TD 13.10 to TD 14.10. I am trying to implement automated stats using your excellent blog. What other TD 14.10 features would you recommend implementing?

thanks

Teradata Employee

Unfortunately, I am not able to give you specific recommendations on what to implement because I am not familiar with your site, and I don't have first-hand experience with most of the enhancements and new features in 14.10. If you want a list of the new features contained within the 14.10 release, the best source of information is the Teradata Database Release Summary for 14.10, available at:

http://www.info.teradata.com/

Whether any of the new features will be of value to your site will depend on the type of processing happening on the platform and the business needs of the end users. Check with your Teradata account team or Teradata Professional Services representative for guidance around the use of specific features.

Best regards, -Carrie

Hi Carrie,

We recently started using the Stats Manager portlet. For DBQL logging, what options are required to start logging for the candidate databases we choose for an ANALYZE job?

We want to start with STATSUSAGE to evaluate stats recommendations for finding missing and used stats.

Are SQL, OBJECTS, STEPINFO, and EXPLAIN also required in addition to STATSUSAGE logging on the databases we choose to ANALYZE? I have read the article, and it mentioned that OBJECTS and SQL are not used by AutoStats.

Please advise on best practices for DBQL logging for ANALYZE jobs.

Thank you for the valuable information in the AutoStats article. It helped in understanding the AutoStats feature.

Regards

Chandra

Teradata Employee

Chandra,

At a minimum, you need STATSUSAGE logging enabled for an Analyze Job to make recommendations regarding missing and used stats. Enabling the XMLPLAN option as well may result in better quality recommendations, as it provides additional plan data regarding the accuracy of Optimizer spool size estimates. However, there is more overhead with XMLPLAN. Analyze Jobs will not use any of the other mentioned logging options, including STEPINFO.

For more info,  see Chapter 13 of the AutoStats Orange Book.

Thanks, -Carrie

Thank you very much Carrie.

Hi Carrie,

I have created three new workloads for the AutoStats TIWM implementation. We are on an appliance platform.

Three workloads, VP-STATS-L, VP-STATS-M and VP-STATS-H, were created:

1) VP-STATS-L

User classification, and classification on queryband:

vpStatsJobType=COLLECT for the Collect jobs and vpStatsJobType=ANALYZE for the Analyze jobs

2) VP-STATS-M

Using the same classification as the workload above, but adding classification on statement type of COLLECT STATISTICS and vpStatsJobType=COLLECT

3) VP-STATS-H

Request Source classification on this user

When I run an ANALYZE job, it is classifying to the VP-STATS-H workload instead of VP-STATS-L.

I set the evaluation order as below:

VP-STATS-L

VP-STATS-M

VP-STATS-H

Can you please advise if I am missing anything, or whether the classification and evaluation order are correct as per your recommendations? Thank you in advance.

-Chandra

For the evaluation I used the order VP-STATS-M, VP-STATS-L and VP-STATS-H. Thanks.