Stats on non index column

Database
Enthusiast

Stats on non index column

Hi ,
Having some confusions about how collect stats really works

Suppose i am running this query
select * from table where column=300

Basically stats collected on columns used for selection will never result in 1 AMP or 2 AMP access…that can only be done through index access.

My question is what really is use of stats collected on set selection non-index column like above

a)is it solely used for estimation of spool file size used in full table scans

I don't think we have stats based on per AMP , so that we can have a 1 AMP operation sort of thing if optimizer knows that this particular value exists on which AMP......

Please clarify this thing as how stats on non-index column are useful in query access.....apart from spool size estimation

Regards,
Prakhar Agarwal
6 REPLIES
Enthusiast

Re: Stats on non index column

Hi ,
Can anybody throw some light on this....pls
Enthusiast

Re: Stats on non index column

Stats will help in pointing out the data demographics of the tables involved. This will ensure that the least data is brought in for processing

for example : if there is a join on a column what and there is a qualifier existing on it as well ie where clause to filter data based on the same column, the stats will determine the table on which the stats are applied based on where the maximum data will be eliminated.

similarly if a copy of a table is to be created across all amps for joining, stats will help in choosing the table to be replicated based on the data demographics.
Enthusiast

Re: Stats on non index column

Thankx for the reply Anadi...
But i am not exactly looking for that
My question was what really is use of stats collected on set selection non-index column ,even in case of JOIN it will be used to eliminate rows that will participate in JOIN but that is also related to estimation of spool file size.

I don't think we have stats based on per AMP , so that we can have a 1 AMP operation sort of thing if optimizer knows that this particular value exists on which AMP......

So why do we really need stats on non-index column having predicate....

Please revert back.......
Enthusiast

Re: Stats on non index column

Prakhar, may I refer you to an article by Carrie Ballinger on the Teradata Developer Network:

The reason for collecting single column and multi-column stats where appropriate is to provide the optimizer with the most accurate histogram of the data being used to satisfy a particular query. (This includes join columns, WHERE conditions, etc.) Statistics alone can take a poorly performing query plan and drastically improve the query plan without the need and overhead associated with a NUSI.

If you haven't done so already, consider using the following DIAGNOSTIC hint:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

Then take a few queries that are performing below your expectations and run an EXPLAIN. At the end of the EXPLAIN will be suggestions on which statistics may help the optimizer improve the query plan. Focus your attention on those that should yield a high confidence. Capture the query plan before and after your collect the stats to see what impact was made. It should be noted there is a bug in Teradata 12 (depending on your current patch release) where the recommended stats to be collected are shown even if the stats exist. It should also be noted that you may not need to collect all the recommended stats in order to yield the best possible query plan.

Once you collect stats, you also need to maintain them. Stale statistics can be dangerous if the underlying data changes in volume or the demographics of the data shifts drastically.

Hope this helps.
Enthusiast

Re: Stats on non index column

Hi Rob,
Thankx for your reply...
As you have said "The reason for collecting single column and multi-column stats where appropriate is to provide the optimizer with the most accurate histogram of the data being used to satisfy a particular query"
But ultimately even if correct stats are presented for a query like
select * from table where column=300
Full table scan will happen......
Using stats optimizer can find out how many rows are expected to have value as 300 and build estimate spool size...
Apart from that is there anything else that helps optimizer in this case...

Regards,
Prakhar

Teradata Employee

Re: Stats on non index column

For a simple query, you are correct that stats on non-index columns merely give you a better cardinality estimate (and don't change the query plan). But it may be important to have that more accurate cardinality if you are doing some initial workload management classification based on the estimated query cost.

For more complex queries, an improved estimate for a subquery may be important in coming up with the optimal plan for the outer query. Or if the non-indexed column is used as a join predicate, it may be important for the optimizer to know the distribution & skew of values (i.e. the histogram or demographics) to pick a reasonable "join geography" / hash distribution for spool.