Need of Collect Statistics

Tools
Enthusiast

Need of Collect Statistics

Hi. What is the need of collect statistics in teradata? Can anyone explain with suitable examples? Also, How can we choose the collect statistics columns?
11 REPLIES
Enthusiast

Re: Need of Collect Statistics

Collect stats is an important concept in teradata, collect stats gives PE to come up with a plan with least cost for an requested query. Collect stats defines the confidence level of PE in estimating "how many rows it is goin to access ? how many unique values does a table have , null values etc and all this info is stored in data dictionary. Once you submit a query in TD the parsing engine checks if the stats are aviable for the requested table , if it has collected stats earlier PE generates a plan with "high confidence" . in absence of collect stats plan will ne with "low confidence" . however teradata's optimizer is very robust and intelligent, even if you do not collect stats on a table,coulmn,indexes PE does an "Dynamic Amp sampling " which means it will select a random amp and this random amp comes up with the info about the table data it has , based upon this PE ( knows datademographics & available system componets) estimates the workload and generates a plan.
hope this info helps.
Enthusiast

Re: Need of Collect Statistics

Thanks Samp for your explanation. Thanks a lot !!!

Re: Need of Collect Statistics

Hi Sam,

I want to write TD-12 certification Exams.

Could you please share with me the details about Exam, preparation Material, Question Paper pattern, time, etc....

please share the certification material if you have any, my e-mail: pkumar.td@gmail.com

Thanks & Regards

Praveen

Enthusiast

Re: Need of Collect Statistics

collect stats is very very import concept in teradata. with collect stats the optimizer desides the path for the data retrival(i.e low cost plan) and sends it to the amps.

not only collecting stats once . it should be refresh frequently when ever the data is insert into particular table. it is better to collect stats on columns.

if you collect stats on index level if once the indexs are deleted the stats also will be dropped. then the reterival of the  data is slow.

syntax for collect stats is

collect stats on emp  index(empid);

Re: Need of Collect Statistics

Hi, 

There is a mention of "least cost" in this discussion. 

How do we calculate the cost of a Query? could it be calculated in numbers? then what is the unit of measurement?

Enthusiast

Re: Need of Collect Statistics

Hi All,

Some doubts in statistics collection using Teradata Statistics Wizard:-

  1. What are stale statistics?
  2. Collection of statistics is advisable to be collected as per recommendation for a particular workload or for all tables which are part of the workload?
  3. If I define a workload using "Create from SQL statement" and then get recommendations for collecting statistics on that particular workload, recommendation would include statistics collection on columns/ indexes for the query mentioned in workload. Here it might happen that statistics are collected on some columns/ indexes which may be not required for other queries. In other words, would the statistics collected for a particular query be stale for other queries?
  4. Is it advisable to collect statistics for all tables that are frequently used regularly?
Teradata Employee

Re: Need of Collect Statistics

1. Stale stats are out-dated stats .... e.g stats are not collected for a long period of time while data has been changing in the table .... so we have stale stats that needs to be refreshed or recollected.

2. Ideally, for all tables which are part of workload.

3. No, stats collected for a particular query can't be stale for some queries as there 'stale-ness' depends on the data in the table, not the query.

4. Not exactly .... it is advisable to collect stats for all tables that are loaded frequently .... i.e. the tables which have changes in data frequently.

HTH!

Enthusiast

Re: Need of Collect Statistics

Thank you Adeel for your response...

Few more doubts:-

For example, I have two queries involving complex joins. Query 1 is dependent on tables A, B, C and D whereas query 2 is dependent on tables B, C and E.

Tables B and C both are common. But the columns of joins being used (of tables B and C) in query 1 are different from columns of joins being used (of tables B and C) in query 2.

If we get recommendations using Teradata Statistics Wizard for Query 1, it may be different for query 2. So on executing the recommendations for each query, we end up with statistics being collected on tables B and C on columns which are present in join conditions of Query 1 but not for query 2. In other words staistics collected for query 1 won't be useful for query 2 or vice-versa. Going forward we may have statistics collected on almost every column in the table.

Should we collect statistics for column involved in joins?

Would same affect the performance of the query?

Or we can just collect statistics on all tables in producton that are being loaded frequently, irresepective of the queries being fired in database. Because in one or other way the queries would fetch data from the tables only which would statistics collected on them?

Teradata Employee

Re: Need of Collect Statistics

Collection of unnessasary stats does have a downside, but considering your scenario, it shouldnt be an issue.

Moverover, it is always an evolving thing to come up with the best possible stats to collect. So, its better to analyze for a time-being and then tweak, add or remove stats.

HTH!