Statistics Collection Recommendations for Teradata 12

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Thanks a lot Carrie for your detailed response. It was of great help.
Please clarify the following "Query":-

For example, I have two queries involving complex joins. Query 1 is dependent on tables A, B, C and D whereas query 2 is dependent on tables B, C and E.

Tables B and C both are common. But the columns of joins being used (of tables B and C) in query 1 are different from columns of joins being used (of tables B and C) in query 2.

If we get recommendations using Teradata Statistics Wizard for Query 1, it may be different for query 2. So on executing the recommendations for each query, we end up with statistics being collected on tables B and C on columns which are present in join conditions of Query 1 but not for query 2. In other words staistics collected for query 1 won't be useful for query 2 or vice-versa. Going forward we may have statistics collected on almost every column in the table.

Should we collect statistics for column involved in joins?

Would same affect the performance of the query?

Or we can just collect statistics on all tables in producton that are being loaded frequently, irresepective of the queries being fired in database. Because in one or other way the queries would fetch data from the tables only which would statistics collected on them?

Thanks
Shrey
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

In terms of join columns, if these are single-column join constraints, it is a good idea to collect statistics if the column is not unique. If the single column join column is defined as a unique column, statistics are not necessary.

Whether collecting statistics on a given column will impact the performance of a query, I would not be able to say for sure. I would not expect additional statistic collection to contribute to degradation. The more information you provide the optimizer, the better. You will have to try it out and decide for yourself if it is a useful statistic to collect.

Statistics are only used when queries access the tables. The purpose of statistics is to help the optimizer produce better query plans. If you have tables that no queries access, there will be little benefit in collecting statistics on those tables.

Thanks, -Carrie
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

What happens to the validity of the stats if a single column index is dropped ? (Considering that stats were collected on the index)
Internally the single column index's stats are stored as column stats and are not removed.
Do these get invalidated ?
Thanks
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Hi Sanji,

Indexes can be dropped on single column stats and the statistics will remain and will be valid and usable.

Collecting statistics on an INDEX versus the COLUMN that makes up the index is just a syntax difference. The collection and storage of the stats information is the same, and the impact on optimizer cost estimation is the same.

If you collect stats on a single column OR a single column index, the stats are stored in the DBC.TVFields row for the column.

However, if you collect while the index exists, Teradata may choose to scan the index rather than the table to do the COLLECT, regardless of whether COLUMN or INDEX- even if you used the COLUMN keyword in the COLLECT. That can reduce the time needed to recollect.

Thanks, -Carrie
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Thanks.. Appreciate your response.
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

Carrie,

I've read everywhere that collecting STATS on the skewed columns is a must if they're being queries against.
In my project, I've seen that if I suppress the STATS (using a function on that column) on the skewed column/s, the queries normally perform better.
Can you please explain this phenomenon?
Teradata Employee

Re: Statistics Collection Recommendations for Teradata 12

The only time I have seen a situation where a statistics does not help a query plan, but actually results in a worse query plan, is if the length of the statistic exceeds 16 bytes, and the leading 16 bytes are not very distinct. In a case such as that, it is sometimes better to go without the statistic.

Once you get on 14.0, there are options that will allow you to specify values in the histogram to be larger than 16 bytes, and then the statistics will be more useful.

If your statistic length is less than 16 bytes and the plan is worse without the statistic than with the statistic, and nothing else has changed before and after, then I would open an incident with the support center on this so the circumstances around this can be examined in detail.
Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Hi Carrie,

we have T12 , and i do not have any stats defined on the below tables which i got as recommendations from blogs:

DBC.TVM

     DBC.DBase

     DBC.TVFields

     DBC.AccessRights

     DBC.Indexes

     DBC.Profiles

     DBC.Owners

     DBC.Roles

     DBC.RoleGrants

     DBC.UDFInfo

We do not have any PDCR in place and we just do purging of dbqlogtbl on monthly basis by simple insert select. on discussion, i got the notion that if we are really not fetching dbc tables specifically dbqlogtbl, then we do not need. I have seen in big DW, those stats are getting collected on a daily basis, but for our system which is not much big, stats were not defined and hence not getting collected. Can you please suggest if we can define the stats and then make a process of getting them collected.

Also, are there any side effects of defining and collecting even we are not fetching them very often.

Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Hello Carrie,

My Questions are related to your "Dropping MC Stats" Article & David Roth's Similar Article.

(a) You mentioned that the Data Content's Length matter, not the Column Length. Meaning, say Stats collected on (Col_1,Col_2) with Col_1 is Varchar(30) & Col_2 is Integer. So, What Matters is whether the Data in Col_1 exceeds 16 Byte or not, rather than the Column Length[Which is already more than 16 Bytes @ Varchar(30)].

In which case, ('ABCDEFHGIJKLMNOPQ',14) & ('ABCD',14) are Different.

(b) Stats(Col_1) and Stats(Col_1,Col_2) exists, with Col_1 nearly Unique[>= 95%]. David recommend Dropping Stats(Col_1,Col_2) based on Explain Capture on "Select Distinct(Col_1,Col_2) From Tablename", which is a Single-Table Operation.

As per you, if these 02 Columns are used in Joining Conditions or Aggregate Operations, these would be required for Unique Values Calculation, which in Independent of Truncation.

So, Is the Approach of Dropping Stats based on the David's Experiment alone [Single Table Experiment] Justified ? Cause, if both David and you are Correct, Dropping a MC Stats would require David's Approach PLUS digging around SQLs to verify that those Columns are not used in Joining or Aggregation Operations, which is an extremely difficult job.

(c) Assuming that Stats(Col_1,Col_2,Col_3) exists and all these 03 Columns are not used in Joining or Aggregate Operations, then the Dropping MC Stats' preferential order of approach would be:

(1) If first 16 Bytes same for (Col_1,Col_2,Col_3), Drop the Stats.

(2) If Unique(Col_1) >= say,95%, Drop Stats(Col_1,Col_2,Col_3) [Collect on Col_1 instead].

Meaning, first I check (1) and then (2).

Enthusiast

Re: Statistics Collection Recommendations for Teradata 12

Is there any query which can identify which defined MC crosses 16 byte convention and can be dropped :) at database or table level. Also, incase we get some MC which can be dropped , how we can analyze the benefits i.e CPU/IO etc?