Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

Database
Enthusiast

Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

Hello, 

Consider the following situation: 

Table Alpha has the following columns:

Col1

Col2

Col3

Col4

Col5

Primary Index (Col1, Col2, Col3, Col4)

Table Beta has the following columns:

Col1

Col2

Col3

Col6

Col7

Primary Index (Col1, Col2, Col3)

Table Gamma has the following columns:

Col1

Col2

Col8

Col9

Col10

Primary Index (Col1, Col2)

A user may want to join Alpha to Beta, Alpha to Gamma, or Beta to Gamma.  Generally speaking, does it suffice to only run the following statistics? 

Alpha - (Col1, Col2, Col3, Col4)

Beta - (Col1, Col2, Col3)

Gamma - (Col1, Col2)

Or would there be benefit to adding the following stats also?

Alpha - (Col1, Col2) // for join to Gamma

Alpha - (Col1, Col2, Col3) // for join to Beta

Beta - (Col1, Col2) // for join to Gamma

Posed a different way, can the Teradata Query Optimizer use subsets of statistics, or should there be specific statistics for specific potential join conditions?  Note that in our case, join indexes are not a viable solution, given our business conditions.  

4 REPLIES
Enthusiast

Re: Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

Hi,

The best way to check the required stats is to use help stats diagnostics.

You can use it as follows:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

Explian Your_Query

Then go to the end of explain plan and collect the recomended stats for the joins.

Secondly if your joins are based on Primary index then the stats collection on index columns will be enough to optimize joins.

Khurram
Teradata Employee

Re: Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

I would disagree that DIAGNOSTIC HELPSTATS is the "best" way because that will simply identify all stats variations that are pertinent to the query you are studying and not necessarily giving you a "minimum that you should collect" answer.

The answer to the original question should be, No, the Optimizer is not going to analyze subsets of statistics in the way that ricejfx was hoping because the scenario described only collects statistics on sets of columns.  If you instead collected stats on each individual column the Optimizer **will** consider permutations of the individual columns but at the potential risk of inaccuracy.

For example, if 4 different COLLECT STATISTICS statements were issued against table Alpha to collect on the individual columns Col1, Col2, Col3, and Col4 then permutations are considered but it is possible that signficant skewing associated with the key value ( Col1=9, Col2=9, Col3=9 ) **might** go unnoticed.  The Optimizer is often able to catch those situations with multi-column stats but not always so I would collect stats on the permutations if I really felt the risk is real.  In the end, "know your data" is the best way to decide.

Teradata Employee

Re: Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

DIAGNOSTIC HELPSTATS is just a guideline .... or a starting point for stats collection. It doesn't give you the complete need or end-to-end stats to be collected.

Usually, it tends to solve the issue for many cases, but again .... its not a complete solution for stats collection.

Teradata Employee

Re: Statistics Creation on Tables and Stat Subsets to Enable Faster Join Queries

As for the original question .... what are other details like:

- Data-size

- Any other use of tables

- Is it joined only on those columns or can be joined for other columns as well

- Does Col1 in all tables have same data-type?

- Is there any casting involved? etc.

Collecting stats can have adverse action as well, so it is not recommended to collect stats on all (joining) columns. And you need to have a complete picture of how the table is going to be used.