Identifying Used, Unused and Missing Statistics

New DBQL logging options USECOUNT and STATSUSAGE, introduced in Teradata Database 14.10, enable the logging of used and missing statistics.  The output of this logging can be used to identify used, unused, and missing statistics globally (for all queries) or for just a subset of queries.

The USECOUNT logging option, which supports the object use count (OUC) feature starting in Teradata Database 14.10, is associated with the database that owns the table.  USECOUNT logs usage from all requests against that table irrespective of the user who submitted the query.  This option logs the usage of objects such as databases, tables, indexes, join indexes, and statistics. Additionally, it logs insert, update, and delete counts against the database's tables.  The STATSUSAGE logging option, like other DBQL logging options, is associated directly with a user and logs the used and missing statistics at the query level within an XML document.

In this article, we focus on logging used statistics, retrieving them, and joining them to other dictionary tables to build a global list of used and unused statistics, as well as on identifying missing statistics at the query level.

Enabling the DBQL USECOUNT Option

Each statistic you define becomes an object that USECOUNT logging can track.  Each time the optimizer actually makes use of a specific statistic, an access counter in the new DBC.ObjectUsage table gets incremented.  So at any point in time you can look in the data dictionary and evaluate how frequently (or even if ever) a particular statistic is being used.

The statements to enable object use count (OUC) logging on a given database or user are described below.  It is especially useful to enable the USECOUNT option on the databases that own permanent tables.  It can also be enabled on users or user accounts, but you would do that only when those accounts own tables on which you want the logging enabled.

To enable on a database (not a user; note that other DBQL options are not allowed to be enabled on a database):

    BEGIN QUERY LOGGING WITH USECOUNT ON <DatabaseName>;

To enable on a user with no DBQL options enabled:

    BEGIN QUERY LOGGING WITH USECOUNT ON <UserName>;

To enable on a user that already has DBQL options enabled (use SHOW QUERY LOGGING ON <User> to find the current options):

    REPLACE QUERY LOGGING WITH <current options>, USECOUNT ON <UserName>;
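
Putting these statements together, a typical enablement sequence might look like the following sketch. The database and user names (Sandbox_DB, ETL_USER) and the existing options (SQL, OBJECTS) are hypothetical; substitute your own.

```sql
-- Enable OUC logging on the database that owns the permanent tables (hypothetical name)
BEGIN QUERY LOGGING WITH USECOUNT ON Sandbox_DB;

-- Check which DBQL options are already active for a user before changing them
SHOW QUERY LOGGING ON ETL_USER;

-- Preserve the existing options (here assumed to be SQL, OBJECTS) while adding USECOUNT
REPLACE QUERY LOGGING WITH SQL, OBJECTS, USECOUNT ON ETL_USER;
```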

 

The following are exceptions and special scenarios in OUC logging:

  1. OUC logging is not applicable for DBC tables.  In other words, the OUC feature doesn’t log the use counts for statistics you may be collecting on dictionary objects.
  2. OUC logging is not applicable for temporary tables.
  3. Dropping statistics also drops the corresponding entries in DBC.ObjectUsage table.  So, if you are dropping and recollecting statistics, the use counts reflect usage from the last collection only.
  4. Unlike other DBQL options, disabling USECOUNT invalidates (resets the counts and timestamp) the corresponding rows in DBC.ObjectUsage. It is not recommended to disable this option unless there is a specific reason to do so.

The following sections describe how to find these statistics for each category along with examples.

Identifying Used Statistics

After enabling DBQL USECOUNT and letting some time pass, the following query against the dictionary table DBC.ObjectUsage returns the use counts of the existing statistics.  The number of days a statistic has been continuously under the control of USECOUNT logging is given by the column DaysStatLogged; a value of -1 in this column indicates that the USECOUNT option has been disabled after being enabled for some time in the past.  To normalize the comparison across different statistics, consider the number of days a statistic was a candidate for logging along with its use counts.

SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
               ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,OU.UserAccessCnt       AS UseCount
      ,CAST(OU.LastAccessTimeStamp AS DATE) AS DateLastUsed
      ,CASE
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging disabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
       END AS DaysStatLogged
FROM DBC.ObjectUsage OU
    ,DBC.Dbase
    ,DBC.TVM
    ,DBC.StatsTbl LEFT JOIN DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'   /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/ /* Uncomment this line if TD 15.0 or above */
WHERE DBC.Dbase.DatabaseId    = OU.DatabaseId
  AND DBC.TVM.TVMId           = OU.ObjectId
  AND DBC.StatsTbl.DatabaseId = OU.DatabaseId
  AND DBC.StatsTbl.ObjectId   = OU.ObjectId
  AND DBC.StatsTbl.StatsId    = OU.FieldId
  AND OU.UsageType            = 'STA'
ORDER BY 1, 2, 3;

Customize the above query for your databases and tables of interest, number of days a statistic is logged, etc.  A sample output of the above query is given below. 

DatabaseName | TableName      | StatName                                                       | UseCount | DateLastUsed | DaysStatLogged
-------------|----------------|----------------------------------------------------------------|----------|--------------|---------------
DBA_TEST     | PARTY_DATA     | CURR_FLAG                                                      | 209      | 11/15/2014   | -1
PNR_DB       | CODE_SHARE     | FK_ITIODPAD_ITINERARY_INFO_ODP                                 | 253      | 11/1/2014    | 34
PNR_DB       | ITINERARY_TBL  | AD_ITINERARY_INFO_ODP                                          | 203      | 11/1/2014    | 34
PNR_DB       | ITINERARY_TBL  | CD_ID_PRODUCT                                                  | 203      | 11/1/2014    | 34
PNR_DB       | ITINERARY_TBL  | CD_ORIGINAL_STATUS_CODE,CD_STATUS,FK_PNRODP_AD_PNR_HEADER_ODP  | 203      | 11/2/2014    | 34
PNR_DB       | PNR_HEADER_ODP | AD_PNR_HEADER_ODP                                              | 600      | 11/28/2014   | 34
PNR_DB       | PNR_HEADER_ODP | AD_PNR_HEADER_ODP,CD_CONTROL_NUMBER_LOC                        | 700      | 11/28/2014   | 34
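
As an illustration of such customization, a trimmed-down variant that restricts the lookup to one database and to frequently used statistics might look like the following sketch; the database name and the use-count threshold are hypothetical and should be adjusted to your environment.

```sql
SELECT D.DatabaseName
      ,T.TVMName
      ,OU.UserAccessCnt
      ,CAST(OU.LastAccessTimeStamp AS DATE) AS DateLastUsed
FROM DBC.ObjectUsage OU
    ,DBC.DBase D
    ,DBC.TVM   T
WHERE D.DatabaseId = OU.DatabaseId
  AND T.TVMId      = OU.ObjectId
  AND OU.UsageType = 'STA'
  AND D.DatabaseName = 'PNR_DB'   -- hypothetical database of interest
  AND OU.UserAccessCnt >= 100     -- only statistics used at least 100 times
ORDER BY OU.UserAccessCnt DESC;
```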

Identifying Unused Statistics

Using USECOUNT to find unused statistics requires consideration of multiple time dimensions.  The basic question being answered here is:  Am I collecting statistics that have not been used for some duration?  To answer this question, you need to consider not only the usage counts, but also how long the USECOUNT logging was active, the most recent usage timestamp, and the age of the statistics.  For example, if USECOUNT logging has been enabled only for the past few days, some statistics used only by a monthly or quarterly workload may not yet have been logged as used.  Similarly, if you collect new statistics (not re-collections), you need to let them be exposed to different workloads for a certain period of time before you can properly identify whether they are being used.

The following query is designed to consider these aspects and to list the statistics that meet the following criteria.

  1. The age of the statistic is more than N days (first collected more than N days ago).
  2. DBQL USECOUNT logging is active for more than N days on the owning database.
  3. The statistic has not been used in the last N days.

The value of N can be customized based on your requirements. In the example query given below, the value of N is set to 30 (note that N is used in two predicates; both need to be updated if you customize this value).

SELECT DBC.DBase.DatabaseName AS DatabaseName
      ,DBC.TVM.TVMName        AS TableName
      ,COALESCE(DBC.StatsTbl.StatsName
               ,DBC.StatsTbl.ExpressionList
               ,'SUMMARY')    AS StatName
      ,CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE) AS StatAge
      ,CASE
       WHEN DatabaseName = 'DBC'
       THEN -2  -- Logging Not Applicable
       WHEN DBC.StatsTbl.StatsType IN ('B', 'M')
       THEN -2  -- Logging Not Applicable on Temp tables (base and materialized)
       WHEN DBC.DBQLRuleTbl.TimeCreated IS NULL
       THEN -1  -- Logging Not Enabled
       WHEN DBC.DBQLRuleTbl.TimeCreated > DBC.StatsTbl.CreateTimeStamp
       THEN CURRENT_DATE - CAST(DBC.DBQLRuleTbl.TimeCreated AS DATE)
       ELSE CURRENT_DATE - CAST(DBC.StatsTbl.CreateTimeStamp AS DATE)
       END AS DaysStatLogged
FROM DBC.StatsTbl LEFT JOIN DBC.DBQLRuleTbl
           ON DBC.StatsTbl.DatabaseId = DBC.DBQLRuleTbl.UserID
          AND DBQLRuleTbl.ExtraField5 = 'T'    /* Comment this line if TD 15.0 or above   */
        /*AND DBQLRuleTbl.ObjectUsage = 'T'*/  /* Uncomment this line if TD 15.0 or above */
      ,DBC.Dbase
      ,DBC.TVM     
WHERE DBC.StatsTbl.DatabaseId = DBC.DBASE.DatabaseId
  AND DBC.StatsTbl.ObjectId   = DBC.TVM.TVMId
  AND NOT EXISTS (SELECT 100 FROM DBC.ObjectUsage OU
                  WHERE OU.UsageType  = 'STA'
                    AND OU.DatabaseId = DBC.StatsTbl.DatabaseId
                    AND OU.ObjectId   = DBC.StatsTbl.ObjectId
                    AND OU.FieldId    = DBC.StatsTbl.StatsId
                    AND CURRENT_DATE - CAST(OU.LastAccessTimeStamp AS DATE) < 30
                 )
  AND DaysStatLogged > 30
  AND DBC.StatsTbl.StatsId <> 0  -- Do not qualify table-level SUMMARY statistics as unused
                                 -- May get implicitly used but not recorded as used
ORDER BY 1, 2, 3;

You can customize the above query to adjust the number of days the statistic has not been used, for the databases and tables of your interest, the age of the statistic, etc.

DatabaseName | TableName      | StatName                             | StatAge | DaysStatLogged
-------------|----------------|--------------------------------------|---------|---------------
PNR_DB       | ITINERARY_TBL  | PARTITION                            | 34      | 34
PNR_DB       | ITINERARY_TBL  | ST_281020293276_0_ITINERARY_INFO_ODP | 34      | 34
PNR_DB       | ITINERARY_TBL  | TS_ULT_ALZ                           | 34      | 34
PNR_DB       | PNR_HEADER_ODP | CD_CREATION_OFFICE_ID                | 34      | 34
PNR_DB       | PNR_HEADER_ODP | CD_CREATOR_IATA_CODE                 | 34      | 34
PNR_DB       | PNR_HEADER_ODP | PARTITION                            | 34      | 34
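
A common follow-up, sketched here under the same dictionary-table assumptions as the main query, is to generate candidate DROP STATISTICS statements from the unused list for manual review.  The database filter is hypothetical, and the generated statements should be reviewed (and the DROP STATISTICS syntax verified for your release) before running any of them.

```sql
-- Generate candidate DROP STATISTICS statements for statistics with no logged usage
SELECT 'DROP STATISTICS COLUMN (' ||
       TRIM(COALESCE(S.ExpressionList, S.StatsName)) ||
       ') ON ' || TRIM(D.DatabaseName) || '.' || TRIM(T.TVMName) || ';' AS DropStmt
FROM DBC.StatsTbl S
    ,DBC.DBase    D
    ,DBC.TVM      T
WHERE S.DatabaseId = D.DatabaseId
  AND S.ObjectId   = T.TVMId
  AND S.StatsId   <> 0              -- skip table-level SUMMARY statistics
  AND D.DatabaseName = 'PNR_DB'     -- hypothetical database of interest
  AND NOT EXISTS (SELECT 1 FROM DBC.ObjectUsage OU
                  WHERE OU.UsageType  = 'STA'
                    AND OU.DatabaseId = S.DatabaseId
                    AND OU.ObjectId   = S.ObjectId
                    AND OU.FieldId    = S.StatsId);
```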

Identifying Missing Statistics

Two additional DBQL logging options, STATSUSAGE and XMLPLAN, create XML documents that identify which statistics were used and which ones the optimizer looked for but did not find.  In contrast to USECOUNT, these two logging options should be turned on temporarily and only as needed for analysis.  They are enabled the same way as other DBQL logging, by user or account, and their output can be found in DBC.DBQLXMLTbl.
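
As a sketch of how to pull these documents out for analysis (column names per the 14.10 data dictionary; verify them on your release, and treat the user name and time window as hypothetical):

```sql
-- Retrieve the logged XML documents for one user's queries from the last day
SELECT L.QueryID
      ,L.UserName
      ,CAST(L.CollectTimeStamp AS DATE) AS LogDate
      ,X.XMLTextInfo
FROM DBC.DBQLogTbl L
JOIN DBC.DBQLXMLTbl X
  ON L.ProcID  = X.ProcID
 AND L.QueryID = X.QueryID
WHERE L.UserName = 'ETL_USER'   -- hypothetical user
  AND L.CollectTimeStamp > CURRENT_TIMESTAMP - INTERVAL '1' DAY
ORDER BY L.CollectTimeStamp;
```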

STATSUSAGE logs the usage of existing statistics within a query, as well as recommendations for new statistics that were found to be missing when the query was optimized.  It does this without tying the recommendation to a particular query step in the plan.  The relationship to query steps can only be seen if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.

Always enable STATSUSAGE when looking for missing statistics, with or without XMLPLAN.  If you enable XMLPLAN by itself without STATSUSAGE, no statistics-related information of any kind is logged into DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations and, if both are being logged, those recommendations can be attached to specific steps in the plan.

Because XMLPLAN comes with increased overhead, for the purposes of identifying missing statistics it is usually sufficient to start with STATSUSAGE logging without XMLPLAN.  The step-level information available in XMLPLAN is more useful when in analysis mode.

There is more information about logging to the DBQLXMLTbl in the DBQL chapter in the Database Administration manual, as well as in the orange book Teradata Database 14.10 Statistics Enhancements.

At the end of the XML document that represents the query, there is a series of entries with the <StatsMissing> label that look like this:

  <StatsMissing Importance="High">
    <RelationRef Ref="REL1"/>
    <FieldRef Ref="REL1_FLD1035"/>
  </StatsMissing>
  <StatsMissing Importance="High">
    <RelationRef Ref="REL2"/>
    <FieldRef Ref="REL2_FLD1025"/>
  </StatsMissing>

Each FieldRef value indicates a relation and field that together represent a missing statistic.  It is composed of a relation ID (REL1, for example) and a field ID (FLD1035, for example).  If the statistic is composed of multiple columns there is a FieldList label under which a FieldRef for each individual column appears.

To discover the actual table and column names, look higher in the XML document for the Relation whose ID matches the RelationRef in the StatsMissing entry; that gives you the name of the table.  Below each relation description is a label for each field that provides both the FieldID and the FieldName, which lets you easily match the ID values from the StatsMissing section to the Relation and Field sections higher up.  Here are two Field sections from the top of the same XML document that match the StatsMissing sections illustrated above.

<Field FieldID="1035" FieldName="L_SHIPDATE" Id= "REL1_FLD1035" JoinAccessFrequency="0" 
RangeAccessFrequency="0" RelationId="REL1" ValueAccessFrequency="0"/>
<Field DataLength="4" FieldID="1025" FieldName="O_ORDERKEY" FieldType="I" Id="REL2_FLD1025" JoinAccessFrequency="0"
RangeAccessFrequency="0" RelationId="REL2" ValueAccessFrequency="0"/>

Note that the optimizer assigns an importance to each missing statistic it reports on.
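
Once the relation and field names are resolved (the fields above map to L_SHIPDATE and O_ORDERKEY), the missing statistics can be collected with standard DDL.  The database and table names below are hypothetical, since the XML excerpt does not show them:

```sql
-- Collect the statistics the optimizer reported as missing (hypothetical database/table names)
COLLECT STATISTICS COLUMN (L_SHIPDATE) ON Sandbox_DB.LINEITEM;
COLLECT STATISTICS COLUMN (O_ORDERKEY) ON Sandbox_DB.ORDERS;
```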

Once you have identified missing statistics for one query, you can consider how many other queries also list those statistics as missing, and then make the decision whether or not to start collecting them.  Using these new functionalities, you can streamline your statistics collection routines so they are more efficient and more focused.



Re: Identifying Used, Unused and Missing Statistics

Hi RK,

Thanks for such wonderful info. I have a couple of questions here.

1. Is it possible to enable this logging for a single table in a DB? I am asking because I want to use it in my current production environment, but if I enable this on a whole DB, it will log for every table in that DB, which is a lot of logging. Could this cause a space issue in DBC, given how much logging will happen? Instead, I want to target some tables first and get the used/unused stats for those.

2. What other performance implications do we need to consider before enabling this?

Thanks !

Samir

Enthusiast

Re: Identifying Used, Unused and Missing Statistics

Samir, all DBQL options, including USECOUNT, can be enabled only at the database level as of now. For each table, the maximum number of entries would be number of columns + number of indexes + number of stats + 1 (the additional entry logs table-level information). You reach the maximum only when all of them are used in your query plans. You can use this information to compute the space requirements for the logging (it uses DBC database space).

The logging is optimized to buffer the information and flush it at a timeout (the default is 10 minutes). We have not seen any significant impact on system resources from this logging.

Junior Supporter

Re: Identifying Used, Unused and Missing Statistics

Hi RK,

For my test database, I have enabled this logging. I have 16 columns + 1 index + 4 stats (only 3 column stats and one stat with column name "*" — not sure what this is in TD 14?) + 1 = 22. Then I ran one sample query on the test table using these columns, and I can see 20 entries for this table after running a SELECT * FROM DBC.ObjectUsage. Is my understanding correct?

1. One entry has usage type "STA"; the others have "DML". I have not done any DML on this table, so why do we have DML there?

2. I couldn't follow the flush concept that you talked about above. Could you please elaborate on it? I don't see these entries being flushed out of the DBC table after 10 minutes. Also, is this data moved to pdcrdata history tables after a day, as with tables like dbqllog_hst, dbqlsql_hst and dbqlobj_hst?

Thanks !

Samir

Enthusiast

Re: Identifying Used, Unused and Missing Statistics

UsageType is marked as "STA" for statistics usage entries. Other entries are marked as "DML". Please refer to the following user manual, which explains how the entries are logged.

http://www.info.teradata.com/HTMLPubs/DB_TTU_15_00/index.html#page/SQL_Reference/B035_1142_015K/ch02...

For question #2: the log entries are not written out to disk for every query. They are cached in the parsing engine, aggregated, and written to disk (to DBC.ObjectUsage) periodically (every 10 minutes) for performance reasons; note they are not flushed out of DBC.ObjectUsage once written. I am not sure about pdcrdata; you may have to check other sources for that.


Re: Identifying Used, Unused and Missing Statistics

Hi RK,

Thank you for the detailed explanation and the features of Object UseCount.

We do run backups every week on our system. So, if we enable the usecount, do we have to disable and re-enable the usecount before the backups and after they are done?

Thanks

Kishore