If You’re Not Collecting Statistics on Your Dictionary Tables, Do It. Now.

Teradata Employee

I’ve mentioned it before, Marcio has blogged about it, customers have brought it up at the Partners Conferences. It’s cheap, fast, risk-free, with immediate results. But some of you are still not getting it. Or it could be you’re one of the few who truly don’t need it.

Either way, I’m going to take this opportunity to pound away a little more on the advantages of collecting statistics on your dictionary tables.

I’m not talking about full collection on all columns of all dictionary tables. For one thing, only the hashed dictionary tables let you collect statistics. Just to remind you, the non-hashed tables that do not support statistics collection include:

  • DBC.Acctg - Resource usage by Account/User
  • DBC.ChangedRowJournal - Down-AMP recovery journal
  • DBC.DatabaseSpace - Database and table space accounting
  • DBC.LocalSessionStatusTable - Last request status by AMP
  • DBC.LocalTransactionStatusTable - Last transaction consensus status
  • DBC.OrdSysChngTable - AMP recovery journal
  • DBC.RecoveryLockTable - Recovery session locks
  • DBC.RecoveryPJTable - Permanent journal recovery
  • DBC.SavedTransactionStatus - AMP recovery table

You only need to consider collecting on columns where you think it could speed up dictionary access for queries that you often run. Dictionary tables where collected statistics are often useful include:

  • DBC.TVM
  • DBC.DBase
  • DBC.TVFields
  • DBC.AccessRights
  • DBC.Indexes
  • DBC.Profiles
  • DBC.Owners
  • DBC.Roles
  • DBC.RoleGrants
  • DBC.UDFInfo

An important thing to remember is that dictionary tables are typically very small compared to your usual production tables. Teradata DBAs have told me that their usual statistics collection time for the dictionary tables is in the 1 minute to 3 minute range. Most of you can afford that.

At one Teradata site, DBQL data showed them that 3K to 5K queries per day were running against their DBC tables. Most of these dictionary queries were coming from their own home-grown applications and from PMON and Teradata Manager. They discovered that MS-Access, based on how they were using it, appeared to be getting CurrentPerm values from the DBC.TableSize view frequently, apparently to check the size of the data before deciding to bring it back from Teradata.

Some of the more complex views that come with your system may run better when stats have been collected on the underlying base tables. At one site a simple query against the UserRoleRights view in DBC used 40 GB of spool and took several minutes to complete. After they collected stats it used 4 GB of spool and came back in seconds. Sound familiar?

Here’s a listing from yet another Teradata site, showing which dictionary table columns they collect statistics on. Customize your dictionary collections based on what you actually need. DBQL can be a good friend when it comes to understanding what your dictionary access actually looks like. You might be surprised!

COLLECT STATISTICS DBC.UDFInfo COLUMN (DatabaseId, FunctionName);
COLLECT STATISTICS DBC.UDFInfo COLUMN DatabaseId;
COLLECT STATISTICS DBC.UDFInfo INDEX (FunctionId);
COLLECT STATISTICS DBC.TVM COLUMN (DatabaseId, TVMNameI);
COLLECT STATISTICS DBC.TVM COLUMN CreateUID;
COLLECT STATISTICS DBC.TVM COLUMN LastAlterUID;
COLLECT STATISTICS DBC.TVM COLUMN TableKind;
COLLECT STATISTICS DBC.TVM COLUMN DatabaseId;
COLLECT STATISTICS DBC.TVM COLUMN TVMId;
COLLECT STATISTICS DBC.TVM COLUMN TVMName;
COLLECT STATISTICS DBC.DBase COLUMN DatabaseNameI;
COLLECT STATISTICS DBC.DBase COLUMN DatabaseId;
COLLECT STATISTICS DBC.DBase COLUMN OwnerName;
COLLECT STATISTICS DBC.DBase COLUMN LastAlterUID;
COLLECT STATISTICS DBC.DBase COLUMN (DatabaseName, LastAlterUID);
COLLECT STATISTICS DBC.DBase COLUMN DatabaseName;
COLLECT STATISTICS DBC.AccessRights COLUMN (UserId, DatabaseId);
COLLECT STATISTICS DBC.AccessRights COLUMN (TVMId, DatabaseId, UserId);
COLLECT STATISTICS DBC.AccessRights COLUMN DatabaseId;
COLLECT STATISTICS DBC.AccessRights COLUMN (TVMId, DatabaseId);
COLLECT STATISTICS DBC.AccessRights COLUMN (TVMId, UserId);
COLLECT STATISTICS DBC.AccessRights COLUMN TVMId;
COLLECT STATISTICS DBC.AccessRights COLUMN UserId;
COLLECT STATISTICS DBC.AccessRights COLUMN (AccessRight, GrantorID);
COLLECT STATISTICS DBC.AccessRights COLUMN (AccessRight, CreateUID);
COLLECT STATISTICS DBC.AccessRights COLUMN (FieldId, AccessRight);
COLLECT STATISTICS DBC.AccessRights COLUMN FieldId;
COLLECT STATISTICS DBC.AccessRights COLUMN AccessRight;
COLLECT STATISTICS DBC.AccessRights COLUMN GrantorID;
COLLECT STATISTICS DBC.AccessRights COLUMN CreateUID;
COLLECT STATISTICS DBC.AccessRights COLUMN (DatabaseId, AccessRight);
COLLECT STATISTICS DBC.AccessRights COLUMN (TVMId, AccessRight);
COLLECT STATISTICS DBC.Owners COLUMN OwnerId;
COLLECT STATISTICS DBC.Owners COLUMN OwneeId;
COLLECT STATISTICS DBC.Owners COLUMN (OwneeId, OwnerId);
COLLECT STATISTICS DBC.Roles COLUMN RoleNameI;
COLLECT STATISTICS DBC.Roles COLUMN RoleId;
COLLECT STATISTICS DBC.RoleGrants COLUMN GranteeId;
COLLECT STATISTICS DBC.Hosts COLUMN LogicalHostId;
COLLECT STATISTICS DBC.Hosts INDEX (HostName);
COLLECT STATISTICS DBC.TVFields COLUMN TableId;
COLLECT STATISTICS DBC.TVFields INDEX (TableId, FieldId);
22 Comments
Enthusiast
Hello Carrie,
This is really nice and helpful information. Thanks for this information.

I have a couple of questions related to this.
1. Should we expect any blocking while collecting statistics on a really busy system?
2. What frequency would you suggest for collecting stats on these dictionary tables?

Thanks & Regards,
Shrinivas Sagare
Teradata Employee
Hi Shrinivas,

If you look at the explain of a COLLECT STATISTICS statement, you will see that the database applies an access lock on the table whose statistics are being collected. This minimizes any blocking. The same access locking approach applies to dictionary tables as well.

The only lock that an access lock cannot read through is an exclusive lock. However, DML/DDL statements do not cause exclusive locks to be set on dictionary tables (only on the database objects being operated on). The only dictionary-related activity I am aware of that could set an exclusive lock is a traditional dictionary archive, which does place a table-level exclusive lock on dictionary tables during the archive.

When it comes to the writing of the histogram to the TVM or Indexes tables, that requires only a row hash level write lock, which minimizes possible contention.

I have not heard contention raised as an issue by other sites that collect dictionary statistics regularly. I know of sites that recollect dictionary statistics nightly. Since in most cases this effort takes only a couple of minutes, there is little downside to doing it often if it is providing value. On the other hand, if your applications and environment are not undergoing much change that would impact the dictionary tables you chose to collect stats on, then less frequent recollections, weekly or every other week, are probably fine. It really depends on the volatility of your dictionary tables.
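If you want to verify the locking behavior on your own release, you can explain a dictionary collection yourself. A sketch, using one of the collections from the listing above; the exact explain wording varies by release:

```sql
-- Explain the collection rather than running it; the resulting plan
-- should show an access lock on the dictionary table, not a read
-- or write lock (exact explain text varies by release).
EXPLAIN COLLECT STATISTICS DBC.TVM COLUMN TVMName;
```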
Enthusiast
Hi Carrie,

Are there any benchmark queries you suggest we run before collecting statistics on the suggested tables in order to determine how the process would benefit us?
Teradata Employee
The advantage you will get from collecting stats on the dictionary tables will depend on whether you are submitting SQL of any complexity against those DBC tables. Such SQL could be coming from 3rd party tools, or from your users. Or you may not be experiencing that at all.

One approach is to look at your DBQL data for SQL containing 'DBC' and then assess the complexity and frequency of such requests. At one site I know of, they pulled queries from DBQL that were accessing DBC tables and ran an explain on them with the DIAGNOSTIC HELPSTATS command to see what stats the optimizer recommended, and used that as a starting point.

Using those DBQL queries, you can assess query execution time (and CPU consumption) before and after stats are collected to see the difference for yourself, then make a decision.
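A DBQL query along these lines is one way to find and measure the dictionary queries before and after stats are collected. This is only a sketch: it assumes DBQL query logging is enabled on your system, and the DBC.QryLog view and its columns can vary by release:

```sql
-- Find queries that touch DBC objects, heaviest CPU consumers first.
SELECT QueryID,
       UserName,
       StartTime,
       AMPCPUTime,
       TotalIOCount,
       QueryText
FROM   DBC.QryLog
WHERE  QueryText LIKE '%DBC.%'
ORDER  BY AMPCPUTime DESC;
```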

Thanks, - Carrie
Teradata Employee
Carrie,

User 'DBC' can collect these statistics. Is there a way to allow another user to collect stats on DBC tables?

(We tried granting INDEX on DBC tables to another user but that fails. At the site where I am working, routine use of the DBC password is not permitted.)

Regards - Tony
Teradata Employee
Hi Tony.

Good question.

A new access right called "STATISTICS" was introduced in Teradata 13.0. This right can be granted to other users on DBC tables which enables them to collect stats.
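As a sketch of what that grant looks like, using a hypothetical user name stats_admin; the privilege can be granted per table or on the whole DBC database:

```sql
-- Hypothetical user name; grant at the table level...
GRANT STATISTICS ON DBC.TVM TO stats_admin;

-- ...or at the database level:
GRANT STATISTICS ON DBC TO stats_admin;
```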
Enthusiast
Hello Carrie,
Is there a cut-off point where collecting statistics on DBC is not a good idea?
I have worked at sites where users have the ability to create/drop tables. On a daily basis you can see hundreds or thousands of tables being created and dropped. This will have a big impact on the data held within the dictionary and devalue the benefit of the stats.
Teradata Employee
Hi Ian, Whether or not stats will help on your dictionary tables will depend on whether doing so improves your query plans, or doesn’t.

If your users are not doing complex queries against dictionary tables, then you're not going to see much value. And if they are, then I agree that it's worth evaluating the approach.

I'm not sure what the impact of a high number of drops/creates of objects would be on that decision. If you collect stats nightly, as some Teradata sites do against the DBC tables, you may be able to keep up in general with the level of change. And if the DBC tables involved when you drop/create a table (TVM, TVFields, etc.) are not the tables being used in end-user dictionary queries, you could simply not collect on those few if they seem too volatile.

But to do this right, what you'd need to do is look at DBQL for the SQL being used against the dictionary tables, and see if those queries are meaningful consumers of resources or not. If you find some such queries that are, then consider collecting stats on just the particular tables involved.

In general, I think the tradeoffs of over-collecting stats are less severe with the DBC tables because they tend to be on the smaller side compared to most user tables, and much, much faster and cheaper to collect on. Plus, end-user access to the DBC tables is often very light and less time-critical. The database itself relies mainly on express requests to get to the dictionary, so those internal accesses won't use the stats anyway or be misled by stale ones.

So, yes, there are tradeoffs, but you can minimize the downside of over-collecting by checking DBQL and examining the types of queries that access the dictionary tables in your environment. Then decide.
Hi, this is Naresh.
I am applying a write lock on the EMP table. Then I am logging in as a different user and inserting a row into the EMP table through that second user.
I know the concept that once a write lock is applied, no other user can insert data.

My question is: how can I see this in practice?
Teradata Employee
Naresh,

You can view locking conflicts by means of two different utilities: Lock Display and/or Locking Logger. Both are described in the Utilities Volume 2 manual (G-S).

Thanks, -Carrie
Hello Carrie,

What I have learnt for TD13 is that the Teradata engine is smarter (than ever before) with respect to stats.

In other words, it’s better to have no stats rather than having stale/confusing stats (for the engine).

However, I agree with your point about analysing DBC access from DBQL before implementing this performance improvement solution. But even after that, I suspect it might eventually end up with something unexpected.

It would definitely help, but it requires continuous and active monitoring.

This is just my point of view.
Enthusiast
Hello Carrie,
I have 2 questions for you:
1. How frequently should we collect stats?
2. No stats are better than bad stats. For example, I collect stats 6 times a day on DBC tables, yet they will still be out of date after some time. Are the stale stats going to affect the DBC tables, given that some tables like AccessRights are used extensively?

Thanks in advance.
Teradata Employee
Somnath,

Take a look at an article I co-authored that is also here on DevX titled Statistics Collection Recommendations for Teradata 12. That might help you get a sense of some good practices around stats collection.

In general, stats need to be re-collected only if the table has changed. If the table is static, or only changed a little, you do not need to think about recollecting stats. The rule of thumb is 10% change to a table, or 10% change to a partition in the case of a PPI table.

With DBC tables, the degree of change will depend on the frequency of DDL that results in changes to dictionary tables. Six times a day to recollect stats on DBC tables seems a little excessive to me, but it probably is not taking very long at all. And maybe your users are issuing a lot of DDL. I would ask myself: has my AccessRights table changed 10% of its rows every 3-4 hours, 6 times a day? That's 60% of the rows changing per day, more or less. You might want to re-evaluate that process, but as I said, it won't hurt to over-collect and it is probably not using much resource to keep on doing what you are doing.
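If you are on 14.0 or later, one way to sanity-check a recollection schedule is to look at when dictionary statistics were last gathered and the row count recorded at that time, then compare that against current table sizes. A sketch only, assuming the DBC.StatsV view and these column names on your release:

```sql
-- When were DBC statistics last collected, and how big were the
-- tables then? Compare against current counts to apply the 10% rule.
SELECT DatabaseName,
       TableName,
       ColumnName,
       RowCount,               -- row count at last collection
       LastCollectTimeStamp
FROM   DBC.StatsV
WHERE  DatabaseName = 'DBC'
ORDER  BY LastCollectTimeStamp;
```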
Enthusiast
Hi Carrie
I have started to collect stats on my customer's system, and analysis of DBQL queries recommended stats on DBC.DatabaseSpace. This is one of the tables you said can't be collected on because it's a non-hashed table; however, I'm able to collect stats on it. Just thought I'd mention it.
Cheers
Steve
Teradata Employee
Hi Steven,

You're a step ahead of me! While it was true in the past that you could not collect statistics on un-hashed tables in the data dictionary, that restriction was lifted in release 13.0 (I believe). I'm guessing you are on that release or a higher one.

Thank you so much for sharing that info with this forum. Might help someone else.

Regards, - Carrie
Teradata Employee
Thanks, Carrie, for the valuable suggestion. I would like to know one thing here:
When you mentioned the non-hashed tables, the picture says "New row goes to the AMP from which it was produced". How is this possible? A new row is provided by the user, so how does it go to the AMP which produced it?

How does data distribution take place for non-hashed tables? Do a few AMPs get all the burden?
Teradata Employee
The unhashed tables in the data dictionary contain AMP-local data. Either the rows are related to an activity on that AMP or there is a complete set of rows on each AMP.

For example, the TransientJournal table has a data row on a given AMP for each active operation that impacted data on that same AMP. Another example is the Acctg table, which records resource usage on the AMP where it took place. Such AMP-local data is more efficiently stored on the AMP on which it was produced, eliminating the need for row redistribution for either storage of new rows or retrieval of rows already stored.

For the tables that might grow large, for example the Acctg table, there is no concern about uneven data distribution, because each AMP will be storing a similar row for each measurement interval, but only for the resource usage that took place on that AMP. The only time there might be uneven distribution of rows in the Acctg table is if all the requests supported on the Teradata system were single-AMP and they all went to a small number of AMPs. But with all-AMP operations, the Acctg table will hold the same number of rows on each AMP for each user and account.

This use of non-hashed tables is designed to support time critical operations necessary for DBS functioning such as lock management, rollback, and recovery.

Thanks, -Carrie

I want to grant the right to collect stats to the owner of a table. Can you please tell me the SQL statement for that in Teradata?

Teradata Employee

Prior to Teradata 14.0, a user that is not DBC must have INDEX or DROP privileges on an object in order to collect statistics on it. So you would need to issue a GRANT statement giving the user one of those privileges.

As of 14.0, you can use a new STATISTICS privilege that is limited to statistics only.  This privilege can be granted at either the table or the database level. 
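Both options look something like this, as a sketch with hypothetical database and user names:

```sql
-- Prior to 14.0: INDEX (or DROP TABLE) on the object allows stats collection.
GRANT INDEX ON SalesDB TO table_owner;

-- 14.0 and later: the narrower STATISTICS privilege,
-- grantable at the table or database level.
GRANT STATISTICS ON SalesDB TO table_owner;
```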

Thanks, -Carrie

Hi Carrie

I have a situation where my team writes views (for simplicity, we call them reusable views) which often use joins based on ID fields and current indicators.

Then come the consumer views, which are built on top of these reusable views. Consumers in turn build their logical data extraction on top of the consumer views for their dashboarding and charting purposes.

When I look at the explain plan of our consumer views or reusable views, I see the optimizer uses multiple levels of joins on the current indicators. But these indicators hardly contain anything other than 'Y' or 'N' values, and they mostly show up in product or merge joins. When we pass this through diagnostic stats, we see recommendations for collecting stats on these indicator columns too.

My question is: is it worth collecting stats on such indicator columns for tables which range from a few MB to a few GB? We have already collected stats on the joining ID fields, which are also the NUPI.

Another question: why does the optimizer build merge or product join steps based on indicator fields when we nest reusable views into consumer views?

Thanks,

Lakshminarasu

Teradata Employee

Lakshminiarasu,

When you use the diagnostic help stats command, you may get more suggestions for statistics collections than you actually need. Unfortunately, it's difficult to know whether the columns it suggests will make a difference to query plans or not, especially if you are already following the statistics collection guidelines that are provided here on Developer Exchange.

Often the single-column suggestions that show up in the output of the diagnostic command are more relevant than the longer multi-column statistics. You can try collecting statistics on the recommended columns and see if you get better plans. If not, then don't collect them going forward.
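For reference, the diagnostic is enabled per session, and the recommendations appear at the end of the explain output. A sketch, with a hypothetical view and column name standing in for your own:

```sql
-- Turn on statistics recommendations for this session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- Then explain the query; COLLECT STATISTICS suggestions
-- are appended at the bottom of the explain text.
EXPLAIN
SELECT *
FROM   consumer_view          -- hypothetical view name
WHERE  current_ind = 'Y';     -- hypothetical indicator column
```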

I am not able to provide a response to your question about why you get merge or product joins in your explains. The optimizer apparently believes that is the best choice based on the available information. You could try to collect statistics more fully on the involved tables, if there are some recommended statistics that are not currently being collected.

If you believe there is something wrong happening when you nest views, like a deficiency in what the optimizer is doing, then please open an incident with the support center.

Thanks, -Carrie

Thanks very much,  Carrie!

I will analyse and do some amendments to the existing stats plan.

Regards

Lakshminarasu