Stats Collection

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Enthusiast

Stats Collection

Hi,

 

  I have a table with 19M records with approx 20 columns, I execute an update statement that updates 5M records for two columns . In that case , should i have the collect stats executed.

 

  Your comments are appreciated

 

Regards

Parthiban V

5 REPLIES
Senior Apprentice

Re: Stats Collection

Hi,

 

There are two aspects to this question.

 

1) "should i have the collect stats executed" so that the UPDATE runs faster?

 

2) "should i have the collect stats executed" so that subsequent processing runs faster?

 

(ok, so yes there's a third - both of the above)

 

For #1 above: what is the WHERE clause for the UPDATE? are any joins involved? Are any indexes involved in access to the table (unlikely given it's an UPDATE and only 19M rows).

- if the answer to all these questions is 'No', then the plan will be a full table scan - in which case stats will make no difference at all to the execution of the UPDATE.

 

For #2 above:

- firstly, IMHO you should have stats on every table, even if it is only SUMMARY stats (TD 14+)

- will the update significantly change the demographics of the updated columns? e.g. are you setting a large percentage of rows to the same value? And will that change adversely affect subsequent queries that access this table?

 

The very last question above is the important one, and is probably also the hardest one to answer.

 

You might want to try a couple of key queries that use this table, with/without collecting stats.

 

HTH

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Stats Collection

Hi Dave,

 

  Answers are inline 

 

For #1 above: what is the WHERE clause for the UPDATE? are any joins involved? Are any indexes involved in access to the table (unlikely given it's an UPDATE and only 19M rows).

  Yes, there is a table join happening but the tables involved have the stats collected

2.will the update significantly change the demographics of the updated columns? e.g. are you setting a large percentage of rows to the same value? And will that change adversely affect subsequent queries that access this table?

   Yes the updated columns largely will have only few values

 

Senior Apprentice

Re: Stats Collection

Hi Parthiban,

 

If the updated columns typically have few values then collecting stats on those columns after the UPDATE is probably a good idea - especially if those columns might be used for selection or joining.

 

For the UPDATE itself then stats on the target table are less likely to have any effect (good or bad). Can you share your SQL and the current explain plan? That might hep provide some better suggestions.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Stats Collection

Hi Dave, 

 

  Yes the update column only has few columns

 

this is a pseudo query

UPDATE TABLE1
SET COLUMN1 = CASE WHEN COLUMN1 = 'X' THEN 'Y'
                           WHEN COLUMN1 = 'XX' THEN 'YY'
                           ELSE COLUMN1
                        END     
WHERE 
TABLE2.KEY_COLUMN = TABLE1.KEY_COLUMN
AND TABLE2.FILE_IND = 'N'

 

Teradata Employee

Re: Stats Collection

Another way to think of this is that if there were statistics on the column before the update and a large update is performed on the column, then statistics should be recollected on the updated column after the update.

 

The stats on other columns, partitions,... are not affected and do not need to be recollected. If stats were not needed on the updated column before, then they do not need to be added.