Recommended Dictionary Statistics to Collect in Teradata 14.0

Teradata Employee

Collecting statistics on data dictionary tables is an excellent way to tune long-running queries that access multi-table dictionary views.  Third-party tools often access the data dictionary several times, primarily using the X views.  SAS, for example, accesses DBC views including IndicesX and TablesX for metadata discovery.  Without statistics, the optimizer may do a poor job of building plans for these complex views, some of which are composed of over 200 lines of code.

In an earlier blog posting I discussed the value of collecting statistics against data dictionary tables, and provided some suggestions about how you can use DBQL to determine which tables and which columns to include.  Go back and review that posting.  This posting provides a more comprehensive list of DBC statistics, updated to include those recommended for JDBC.
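As a rough starting point, a query along these lines against the DBQL object table can show which DBC tables and columns your dictionary queries touch most often.  This is only a sketch: it assumes DBQL object logging is enabled on the requesting users, and you will probably want to add a date filter on CollectTimeStamp for your site.

SELECT ObjectTableName
     , ObjectColumnName
     , COUNT(*) AS ReferenceCount
FROM DBC.DBQLObjTbl
WHERE ObjectDatabaseName = 'DBC'
GROUP BY ObjectTableName, ObjectColumnName
ORDER BY ReferenceCount DESC;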

Note that the syntax I am using is the new create-index-like syntax available in Teradata 14.0.  If you are on a release prior to 14.0, you will need to rewrite the following statements in the traditional COLLECT STATISTICS syntax.
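To illustrate the difference, here is a minimal sketch using DBC.Roles from the list below.  The 14.0 form bundles several statistics into one statement:

COLLECT STATISTICS
 COLUMN ROLEID
 , COLUMN ROLENAMEI
ON DBC.Roles;

One way to rewrite it in the traditional pre-14.0 form is one statement per column or column set:

COLLECT STATISTICS ON DBC.Roles COLUMN ROLEID;
COLLECT STATISTICS ON DBC.Roles COLUMN ROLENAMEI;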

Here are the recommendations for DBC statistics collection.  Please add a comment if I have overlooked any other useful ones.

COLLECT STATISTICS
 COLUMN TvmId
 , COLUMN UserId
 , COLUMN DatabaseId
 , COLUMN FieldId
 , COLUMN AccessRight
 , COLUMN GrantorID
 , COLUMN CreateUID
 , COLUMN (UserId ,DatabaseId)
 , COLUMN (TVMId ,DatabaseId)
 , COLUMN (TVMId ,UserId)
 , COLUMN (DatabaseId,AccessRight)
 , COLUMN (TVMId,AccessRight)
 , COLUMN (FieldId,AccessRight)
 , COLUMN (AccessRight,CreateUID)
 , COLUMN (AccessRight,GrantorID)
 , COLUMN (TVMId ,DatabaseId,UserId)
ON DBC.AccessRights;

COLLECT STATISTICS
 COLUMN DatabaseId
 , COLUMN DatabaseName
 , COLUMN DatabaseNameI
 , COLUMN OwnerName
 , COLUMN LastAlterUID
 , COLUMN JournalId
 , COLUMN (DatabaseName,LastAlterUID)
ON DBC.Dbase;

COLLECT STATISTICS
 COLUMN LogicalHostId
 , INDEX ( HostName )
ON DBC.Hosts;

COLLECT STATISTICS
 COLUMN OWNERID
 , COLUMN OWNEEID
 , COLUMN (OWNEEID ,OWNERID)
ON DBC.Owners;

COLLECT STATISTICS
 COLUMN ROLEID
 , COLUMN ROLENAMEI
ON DBC.Roles;

COLLECT STATISTICS
INDEX (GranteeId)
ON DBC.RoleGrants;

COLLECT STATISTICS
COLUMN (TableId)
, COLUMN (FieldId)
, COLUMN (FieldName)
, COLUMN (FieldType)
, COLUMN (DatabaseId)
, COLUMN (CreateUID)
, COLUMN (LastAlterUID)
, COLUMN (UDTName)
, COLUMN (TableId, FieldName)
ON DBC.TVFields;

COLLECT STATISTICS
 COLUMN TVMID
 , COLUMN TVMNAME
 , COLUMN TVMNameI
 , COLUMN DATABASEID
 , COLUMN TABLEKIND
 , COLUMN CREATEUID
 , COLUMN CreatorName
 , COLUMN LASTALTERUID
 , COLUMN CommitOpt
 , COLUMN (DatabaseId, TVMName)
 , COLUMN (DATABASEID ,TVMNAMEI)
ON DBC.TVM;


COLLECT STATISTICS
 INDEX (TableId)
 , COLUMN (FieldId)
 , COLUMN (IndexNumber)
 , COLUMN (IndexType)
 , COLUMN (UniqueFlag)
 , COLUMN (CreateUID)
 , COLUMN (LastAlterUID)
 , COLUMN (TableId, DatabaseId)
 , COLUMN (TableId, FieldId)
 , COLUMN (UniqueFlag, FieldId)
 , COLUMN (UniqueFlag, CreateUID)
 , COLUMN (UniqueFlag, LastAlterUID)
 , COLUMN (TableId, IndexNumber, DatabaseId)
ON DBC.Indexes;

COLLECT STATISTICS
 COLUMN (IndexNumber)
 , COLUMN (StatsType)
ON DBC.StatsTbl;

COLLECT STATISTICS
 COLUMN (ObjectId)   
 , COLUMN (FieldId)
 , COLUMN (IndexNumber)
 , COLUMN (DatabaseId, ObjectId, IndexNumber)
ON DBC.ObjectUsage;

COLLECT STATISTICS
 INDEX (FunctionID )
 , COLUMN DatabaseId
 , COLUMN ( DatabaseId ,FunctionName )
ON DBC.UDFInfo;

COLLECT STATISTICS
 COLUMN (TypeName)
 , COLUMN (TypeKind)
ON DBC.UDTInfo;
Enthusiast
Hello Carrie

The DBC tables are usually point-in-time tables. More often, these data are moved into corresponding PDCR tables after, say, 7 days. Also, DBAs use the PDCR tables for analysis of certain events.

So, the stats collection process should also be followed for PDCR tables.
Teradata Employee

Statistics collection on the DBC tables is important because those tables are being used by third-party tools issuing complex SQL (against the X views), and those queries are often taking a long time.  That third-party tool SQL is mostly looking at things like access rights, roles, grants, and UDFs, things that would NOT typically be rolled off onto PDCR tables.

But you are correct that stats need to be collected for PDCR tables as well.  The PDCR package does come with collect stats statements of its own.  It's a different list from the stats above.   I believe the file where most of the collect stats are defined is “InstallPDCR_Tables.sql”.  

I think the package recommends weekly recollections on its entire list of collect stats.  But stats on “column Partition” and “column LogDate” are also collected daily.  The PPI is on the LogDate column.

Check with a Teradata PS person if you have additional questions about PDCR conventions.

Thanks, -Carrie

Enthusiast

Hi,

I need a query in Teradata to check which fields/indexes/objects were accessed/referenced in the last week (for example), and how many times.

Could anyone help in building this query?

Thanks.

Teradata Employee

Sorry, I don't have such a query, but it's likely someone out there does.

It would be best to post this kind of question on Teradata Forum, where you can reach a much broader audience. 

Thanks, -Carrie

Enthusiast

Ohh. That's right. I should have posted this on Teradata forum. Let me do that. Thanks.

N/A

Hi Carrie,

Sorry, I don't know where I should post this question, so I'll just post it here.

I am an Informatica developer and new to Teradata. I just want to know how many utility (MLOAD, FLOAD, TPUMP, TPT) jobs can run in parallel. The Teradata system I am using is VERSION 13.10.04.07, RELEASE 13.10.04.07.

Thanks in advance.


Teradata Employee

Just for future reference, the best place to post a general question such as this is on the Teradata Forum.  That will give the question exposure to the broadest audience.  In addition, you can do searches on the Teradata Forum site, which might provide answers to future questions that come up for you:

http://forums.teradata.com/forum

If you are new to Teradata and would like more detailed information about Teradata features and functions, the official manuals can be found at:

http://www.info.teradata.com/

There is an internal database rule that limits the number of FastLoad (TPT Load Operator) and/or MultiLoad (TPT Update Operator) jobs combined to 30.   Another rule limits that same combination, plus FastExport, to 60 jobs at a time.

If you are not using system throttles or utility throttles and TASM is not active, then DBS Control parameters determine the number of load jobs you can run concurrently.  The default is 15, but it can be set to go up as high as 30 for FastLoad and MultiLoad combined.

If you are using utility throttles and/or TASM, then the DBS Control parameters will not have any impact. 

There is no limit to Tpump (TPT Stream operator) jobs, as Tpump is considered standard SQL, not a load utility.

Thanks, -Carrie

Enthusiast

Hi Carrie

When it says "The DBC tables are usually point in time tables", what does it mean ? Can you please give some examples of such dbc tables ?

Thanking You

Santanu

Teradata Employee

Santanu,

I don't recall reading that comment, or making that comment, so I'm not sure I can help answer your question.  I re-read this posting, but didn't see any reference such as the one you are quoting.   

Thanks, -Carrie

Enthusiast

Hi Carrie

Sorry, it's my bad. Actually, in the discussion Smarak mentioned that the DBC tables are all point-in-time tables and that the data are moved to PDCR tables. There was even a discussion on PDCR table stats collection.

I am now learning about these administration-related objects and activities. That is why I wanted to clarify: What is a PDCR table? What does it mean when it says DBC tables are usually point-in-time tables? How does a third-party tool relate to a PDCR table?

These questions may sound very basic, but came into my mind.

Thanking You

Santanu

Teradata Employee

Santanu,

I am not someone who works with PDCR or can explain detail about PDCR to you.   Best to ask that question to people from the Teradata Professional Services organization or post it on the Teradata Forum.  

My general understanding is that the PDCR database is a set of tables where you can offload some of the monitoring DBC tables (like DBQL or ResUsage) on a regular basis.    That would leave only the most current information in the DBC tables themselves.

Thanks, -Carrie

Enthusiast

Thanks Carrie for your reply.

Thanking You

Santanu

Visitor

Hi Carrie,

Thank you for documenting these. Would these be the same statistics that would be recommended for collection on v14.10, or would there be some different recommendations for 14.10?

Teradata Employee

There are no new recommendations for 14.10 in terms of which dictionary statistics to collect, so I would just continue to collect stats on these dictionary tables.  If you find additional dictionary tables being accessed frequently with joins to other tables, and you are not getting good plans or the queries are taking a long time, include them on your stats collection list as needed.

Actually, I wouldn't expect many changes in this area for 14.10, since engineering only updates (or adds) dictionary tables in major releases, like 14.0.    So this list is probably good until 15.0.

Thanks, - Carrie

About BLOB and CLOB:

1. BLOB:

 A. Stands for 'Binary Large Object'.
 B. Represents a large binary string of raw bytes.
 C. Like a VARBYTE, but up to approximately 2 GB (2097088000 bytes).

 Example:

 CREATE SET TABLE ORDERS
  (
   ORDERKEY INTEGER,
   ORDERSTATUS VARCHAR(10),
   ORDERMESSAGE BLOB(2097088000)
  ) PRIMARY INDEX (ORDERKEY);

 Note: Used to store graphics, video clips, and other binary files.

2. CLOB:

 A. Stands for 'Character Large Object'.
 B. Represents a large character string.
 C. Like a VARCHAR, but up to approximately 2 GB for the Latin character set and about 1 GB for the Unicode character set.

 Example:

 CREATE TABLE CUSTOMER_SERVICE
  (
   ID INTEGER NOT NULL,
   TEXT_FILE CLOB(10000)
  ) UNIQUE PRIMARY INDEX (ID);

 Note: Used to store simple text, HTML, and XML documents.

Teradata Employee

Hello Carrie,

I wonder whether collecting stats on the ObjectId column of StatsTbl would make sense... It seems the other collect stats statements you listed above cover the PI on each table, but StatsTbl.ObjectId is not covered...

I tried to explain SHOW STATS on a table, and the result is "The row is sent directly back to the user as the result of statement 1".

Does that mean it searches via the ObjectId UPI on StatsTbl?

Thank you for your reply!

Best,

Jessie

Teradata Employee

Hi Jessie,

You do not need to collect statistics on any of the StatsTbl columns unless you have users who are accessing StatsTbl as one of the participating tables in a multi-table query, and where the joins are not happening optimally.   So first, you would want to look at the DBQL object logging output to see if any users are accessing the StatsTbl table in combination with other dictionary tables, and joining it to other tables.   I don't think that is very likely, but you could check.
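If you want to check, a sketch like the following against the DBQL object table would list every object referenced by queries that also touch StatsTbl (again assuming DBQL object logging is on for those users):

SELECT QueryID
     , ObjectDatabaseName
     , ObjectTableName
FROM DBC.DBQLObjTbl
WHERE QueryID IN
 (SELECT QueryID
  FROM DBC.DBQLObjTbl
  WHERE ObjectDatabaseName = 'DBC'
    AND ObjectTableName = 'StatsTbl')
ORDER BY QueryID;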

Usually dictionary statistics are only needed when third-party tools are accessing the X-views to get metadata, as described at the beginning of this posting.

You will never need to collect statistics in support of internal database-type requests, like HELP STATS, SHOW STATS or SHOW TABLE.   In the case of SHOW STATS, that request is executed as an express request, and express requests bypass the parser/optimizer and go directly to the AMPs.  That is why when you explain a SHOW STATS you don't get a plan back, because there is no optimized plan.

So I don't believe statistics would add value in the case you are describing because the optimizer would never look for them anyway.  And there is no more optimal way to access a dictionary table than by means of an express request.

Thanks, -Carrie

Teradata Employee

Hi Carrie,

I am using TD 14.10.  During stats collection on the DBC tables I got a warning:

COLLECT STATISTICS DBC.Indexes COLUMN (TableId, IndexNumber, DatabaseId);

 *** Update completed. 2 rows changed.
 *** Warning: 9683 The DBQL ObjectUsage logging is not enabled. The change percentage to skip statistics collection is determined based on dynamic AMP sampling, which can only detect the row count change.
 *** Total elapsed time was 1 second.

Before scheduling it on the production system I ran it on VMware, and I got this warning. I am a little confused by the "change percentage" in the warning, as I didn't mention it in the stats statement.

Is the stats collection syntax you mention the same as the pre-14.0 stats syntax, or is there any benefit to combining these different column statistics in a single collect statement?

Thanks - Sandeep.

Teradata Employee

Sandeep,

It is normal to get this warning if statistics have been skipped and you do not have USECOUNT logging on.  

If you look up that message (or google it), here is what it says:

9683:   The DBQL ObjectUsage logging is not enabled. The change percentage to skip statistics collection is determined based on dynamic AMP sampling, which can only detect the row count change.

Explanation:

The DBQL ObjectUsage logging is not enabled. The change percentage to skip this statistics collection is determined based on the dynamic AMP sampling which can only detect the row count change. Therefore, the system cannot accurately determine the update and the combination of insert and delete counts on the target table.

Notes:

This warning is issued when the DBQL ObjectUsage logging is not enabled and statistics collection is skipped.

Remedy:

Turn on ObjectUsage logging for the target database if more accuracy is needed.

Skipping is a technique in Teradata Database 14.10 that the optimizer uses when you have asked for stats recollection but the table does not appear to have changed enough since the last collection to warrant that recollection.  Under those conditions the optimizer may skip the re-collection.  You can read more about skipping in the Teradata 14.10 Statistics Enhancement orange book.

In this case, the warning is telling you that skipping happened, but that it did not happen with as much accuracy as it could, if USECOUNT logging had been turned on.  Instead of using actual Update, Delete and Insert counts that you would have available with USECOUNT logging, only random AMP sampling was used, and random sampling only detects a table's growth.  If you deleted as many rows as you inserted since the last collection, random AMP sampling would interpret that as no change, while USECOUNT logging would recognize the change.
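If you decide you need that accuracy, USECOUNT logging is enabled with a BEGIN QUERY LOGGING statement, for example (the database name below is just a placeholder):

BEGIN QUERY LOGGING WITH USECOUNT ON MyDatabase;

and turned off again with:

END QUERY LOGGING WITH USECOUNT ON MyDatabase;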

Thanks, -Carrie

Teradata Employee

Thanks Carrie :)

Enthusiast

Hi Carrie,

I have the stats collection recommended above run daily on the DBC tables.  But it looks like, according to the 14.10 orange book on stats, the collection skips: the optimizer internally enforces a 15-day change threshold, since DBC tables cannot be tracked with DBQL's USECOUNT.

So should we not even attempt to collect stats on DBC tables daily?

Is DBC stats collection only allowed every 2 weeks?

Teradata Employee

Hi Nazy,

You are correct that there is an internal time threshold set for all data dictionary tables of 15 days.  Using the default settings, when you submit a stats collection on a dictionary table it will not be run until it has been 15 days since the last recollection.

If you wish to avoid this behavior for data dictionary tables and recollect more frequently, put a USING NO THRESHOLD clause on those collection statements.   The NO THRESHOLD clause disables all default thresholds for the column set or index specified in the COLLECT STATISTICS statement.    The USING NO THRESHOLD option is only supported on individual statistics collection statements, not at the table collection level.
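For example, one of the statistics from the list above with the threshold disabled would look something like this (illustrative only):

COLLECT STATISTICS
 USING NO THRESHOLD
 COLUMN DatabaseName
ON DBC.Dbase;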

Thanks, -Carrie

Enthusiast

Hi Carrie,

I think running DBC stats daily is recommended on versions earlier than 14.10. Do you have any other frequency recommendations for versions before and after 14.10?

Thank you.

Teradata Employee

Geeta,

The frequency with which you collect statistics on the data dictionary tables will depend on: 1.) how long it takes; 2.) how frequently the DD tables actually change; and 3.) the degree of user access of the dictionary tables, and whether the plans being used by such queries require frequent stats recollections.  I know sites that collect dictionary stats every night, because they only take 2 or 3 minutes.  But that is not really necessary in most cases.  You will have to determine what is optimal for you.

The system default threshold for DBC table statistics recollections starting in 14.10 is 7 days.  That was chosen as the system default threshold for dictionary tables because it is considered a reasonable mainstream number of days.    However, you can force recollections on DBC tables so they execute when you want them to by adding USING NO THRESHOLD to your data dictionary table COLLECT STATISTICS statements in 14.10.

Thanks, -Carrie

Enthusiast

Thank you Carrie.

Enthusiast

Hi Carrie,

We recently upgraded to 15.00. Would you please provide the list of stats to be collected on DBC after the upgrade? Appreciate your time!

Thanks

Roopalini

Teradata Employee

Roopalini,

I don't have a list of DBC stats recommendations focused especially on the 15.0 release.  What you can do is use the recommendations in this blog posting for collecting stats on the dictionary tables in 14.0, then add stats to the collection scripts you plan to use in 15.0 to cover any new DBC tables, as you find that you need to.

The best way to determine which dictionary stats you actually need to collect is to examine DBQL output and see which dictionary views are being used by your end users.  Then make sure all join columns, etc. within those views are covered.  That is likely to differ from site to site.  

Thanks, -Carrie


Enthusiast

Thanks Carrie! Will do that.

Teradata Employee

Hi,

1 question.

If we collect dictionary stats, will this impact application performance in any way? Do we need to test the application as a whole before implementing collection of dictionary stats in production?

Regards,

Atul Gangurde.

Teradata Employee

Atul,

Dictionary stats usually take only a couple of minutes to collect, as the tables tend to be small, and you do not need to recollect them very often.  I am not aware of any issues around performance degradation caused by the collection of dictionary stats, although it is possible with any stats collection that you could consume a visible amount of resources.  But you can run the stats at a low priority so you don't interfere with other work.   Most sites collect dictionary stats at off-hours or quiet times.  Also, I'm not sure what application you are referring to in your question.

Thanks, -Carrie

COLLECT STATISTICS
COLUMN (SUBSTRING((TRANSLATE((TVMName) USING UNICODE_TO_LOCALE WITH ERROR)) FROM (1) FOR (30))) AS ST_200113065866_0_tvm
ON DBC.tvm;

COLLECT STATISTICS
COLUMN (SUBSTRING((TRANSLATE((DatabaseName) USING UNICODE_TO_LOCALE WITH ERROR)) FROM (1) FOR (30))) AS ST_200113065866_1_dbase
ON DBC.dbase;

Hi Carrie...

We have a complex macro with recursive SQL accessing DBC tables. DIAGNOSTIC HELPSTATS suggests a couple of collect stats statements on DBC tables to improve the performance. The snippet above shows those 2 statements. So our question is: is it OK to collect these stats on the DBC tables? Please let us know...

Thanks, Mani

Teradata Employee

TVM and Dbase are both tables that commonly have stats collected on them.   So I don't see a problem with collecting stats in addition to the ones usually recommended for those tables, if you see an advantage in doing so.   Adding new stats to DBC tables doesn't really have a downside, and they are usually very quick to collect (Dbase and TVM usually aren't terribly large).  They will be ignored by the optimizer if the optimizer does not need them for a particular optimization, so there's no extra overhead there.

The best approach is to try out the stats collections and see if they give you better plans for this macro.   That is always the best way to determine if a certain statistic is going to add value or not.   See if you get a better plan and a faster query.
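One way to do that comparison in a single session, sketched below with a placeholder macro name, is to capture the plan before and after collecting:

DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN EXEC MyMacro;   -- note the plan, and the suggested stats at the end

-- run the two COLLECT STATISTICS statements from the snippet above, then:
EXPLAIN EXEC MyMacro;   -- compare this plan against the first one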

Thanks, -Carrie