UDA

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

11-10-2006
01:07 AM

11-10-2006
01:07 AM

What are factors to decide how much time it should take for "collect statistic" and "refresh statistic"?

My question is related to time it takes for "collect statistic" and "refresh statistic"?

My question is related to time it takes for "collect statistic" and "refresh statistic"?

15 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

11-10-2006
04:06 AM

11-10-2006
04:06 AM

It is how frequently your data in the tables is added/modified/deleted. If the data changes very frequently, you might need to collect /refresh statistics often to get best performance for your queries.Ofcourse,remember that collect stats is resource intensive !

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

11-13-2006
02:05 AM

11-13-2006
02:05 AM

WHEN EVER THERE IS EXTENSIVE INSERTs, DELETEs, UPDATEs, OR ANY OTHER MAINTANENCE TO ANY COLUMNS WHICH IS CONTAINING STATISTICS.WE SHOULD REFRESH THE STATS.

THE CHANGE OF DATA SHOULD OF 5 TO 10 %

THE CHANGE OF DATA SHOULD OF 5 TO 10 %

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

11-13-2006
12:13 PM

11-13-2006
12:13 PM

Another parameter to check is the confidence level in the explain plan.

Vinay

Vinay

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

11-14-2006
12:49 AM

11-14-2006
12:49 AM

one more hint. use "diagnostic helpstats" to get recommendations on collect stats on particular columns. This will be displayed at the end of explain text

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-19-2008
11:09 PM

02-19-2008
11:09 PM

Hi all,

Could u pls tell me something more about "diagnostic helpstats" .....

Thanks,

abc

Could u pls tell me something more about "diagnostic helpstats" .....

Thanks,

abc

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-19-2008
11:52 PM

02-19-2008
11:52 PM

Hi all,

Is the same procedure followed to do the collect stats when it is done for the first time and when done for any successive times on the same table (i,e when 5% - 10% of data is being altered)

Thanks,

abc

Is the same procedure followed to do the collect stats when it is done for the first time and when done for any successive times on the same table (i,e when 5% - 10% of data is being altered)

Thanks,

abc

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-20-2008
07:41 AM

02-20-2008
07:41 AM

diagnostic helpstats is used to understand the optimizer recommendations of what stats it thinks might be useful in making a (probably) better query execution plan.

you can do it by typing the following command in sql assistant and doing the explain on your query after wards.

diagnostic helpstats on for session;

explain

select a, b, c

from t1, t2

where t1.a = t2.b

....

When you recollect stats on a table, teradata internally does all the same operations it did the first time you collected stats on it. so depending on whether it was a full stats or a sampled stats it would either scan the full table (exceptions are if there are index subtables which would give a faster results) or scan a percentage of the table to collect stats information (this is across the whole table and not just the additional 5% or 10% of the records that changed)

you can do it by typing the following command in sql assistant and doing the explain on your query after wards.

diagnostic helpstats on for session;

explain

select a, b, c

from t1, t2

where t1.a = t2.b

....

When you recollect stats on a table, teradata internally does all the same operations it did the first time you collected stats on it. so depending on whether it was a full stats or a sampled stats it would either scan the full table (exceptions are if there are index subtables which would give a faster results) or scan a percentage of the table to collect stats information (this is across the whole table and not just the additional 5% or 10% of the records that changed)

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-20-2008
10:45 PM

02-20-2008
10:45 PM

Hi all,

Thanks for that...

Can Diagnostic statistics be activated at a still higher level? so that when an explain is done we get the optimiser recommendations?

Also in most of the explain statements we get something like.....

We do an all-AMPs RETRIEVE step from "CS".employee1 by way of an

all-rows scan with no residual conditions into Spool 1

(group_amps), which is built locally on the AMPs.

what is meant by .....

no residual conditions into Spool 1

and

(group_amps)???

In another explain statement...... which is partially listed out here.....

The size of Spool 2 is estimated with high confidence to be 54,555,839 rows.

The estimated time for this step is 8.41 seconds.

The size of Spool 3 is estimated with high confidence to be 1,852,824,064 rows.

The estimated time for this step is 7 minutes and 56 seconds.

We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a

RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan.

Spool 2 and Spool 3 are joined using a merge join, with a join condition of

((T1.col1 > t2.col1) and ((t1.col2 < tab2.col2) and (tab1.col3 = tab2.col3)))

The result goes into Spool (group_amps), which is built locally on the AMPs.

The result spool file will not be cached in memory.

The size of Spool 1 is estimated with no confidence to be 3,868,443,675 rows.

The estimated time for this step is 3 minutes and 36 seconds.

Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Kindly explain

Thanks,

abc

Thanks for that...

Can Diagnostic statistics be activated at a still higher level? so that when an explain is done we get the optimiser recommendations?

Also in most of the explain statements we get something like.....

We do an all-AMPs RETRIEVE step from "CS".employee1 by way of an

all-rows scan with no residual conditions into Spool 1

(group_amps), which is built locally on the AMPs.

what is meant by .....

no residual conditions into Spool 1

and

(group_amps)???

In another explain statement...... which is partially listed out here.....

The size of Spool 2 is estimated with high confidence to be 54,555,839 rows.

The estimated time for this step is 8.41 seconds.

The size of Spool 3 is estimated with high confidence to be 1,852,824,064 rows.

The estimated time for this step is 7 minutes and 56 seconds.

We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a

RowHash match scan, which is joined to Spool 3 (Last Use) by way of a RowHash match scan.

Spool 2 and Spool 3 are joined using a merge join, with a join condition of

((T1.col1 > t2.col1) and ((t1.col2 < tab2.col2) and (tab1.col3 = tab2.col3)))

The result goes into Spool (group_amps), which is built locally on the AMPs.

The result spool file will not be cached in memory.

The size of Spool 1 is estimated with no confidence to be 3,868,443,675 rows.

The estimated time for this step is 3 minutes and 36 seconds.

Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Kindly explain

Thanks,

abc

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

02-21-2008
11:10 AM

02-21-2008
11:10 AM

Q. what is meant by .....

no residual conditions into Spool 1

Ans: All applicable conditions are applied.

Q. and

(group_amps)???

Ans: A subset of AMPs will be used instead of All AMPs.

Q. Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Ans: Your join has inequality conditions. ( I am not 100% sure of this answer).

no residual conditions into Spool 1

Ans: All applicable conditions are applied.

Q. and

(group_amps)???

Ans: A subset of AMPs will be used instead of All AMPs.

Q. Note : collect stats is done on all the columns used in the join condition then why in the second last statement above states as NO CONFIDENCE ???

Ans: Your join has inequality conditions. ( I am not 100% sure of this answer).