Out of spool with TD version 15.10

New Member


In our TD system, we have run out of spool multiple times in the past. In one of those incidents, one of the developers did some analysis; here is the background.

  1. The query ran out of spool.
  2. After some time, the query was resubmitted and finished in 2 minutes.
  3. By analyzing the plans, we noticed that the first execution was trying to redistribute the big table by the hash of the join key.
  4. The second execution was duplicating the small table instead. Between the two executions, the stats on the big table had been recollected by the system.
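The two plan shapes can be seen directly in the EXPLAIN text; as a sketch (table and column names below are made up):

```sql
-- Hypothetical example: compare the join strategy the optimizer picks.
-- With stale stats on big_txn, the EXPLAIN text may say the big table is
-- "redistributed by the hash code of" the join column (expensive and
-- spool-heavy); with fresh stats it may instead say the small table is
-- "duplicated on all AMPs" (cheap when the table really is small).
EXPLAIN
SELECT t.txn_id, d.dim_desc
FROM   big_txn   t
JOIN   small_dim d
  ON   t.dim_id = d.dim_id;
```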

Based on the observations above, the developer thinks the out-of-spool issue was caused by stale stats on the big table, which led to a suboptimal execution plan. We reached out to the DBA, and here is the response from the DBA.

 

<Quote>

Note that since TD 14.10, Teradata has been increasingly opting for which stats should be re-collected when a COLLECT STATS statement is issued. This will, ultimately, fully replace the need to re-collect stats on target tables. For now, Teradata will not refresh stats on columns / tables which it considers to not have sufficiently changed content. HELP STATS is not a sufficient indicator of whether stats are stale or not. You will see "old" dates when content remains static - note the dates and times in the scr-cap below.

 

Teradata now performs random-AMP sampling of data-content while creating execution plans. If the new histogram does not match the existing histogram of pre-collected stats, it will use the information it received from the random-AMP sample instead, in the execution plan.

 

Keep in mind that some of our tables are HUGE and collecting stats on them takes an hour or two. Such tables have stats collected weekly. Please do not request that such tables should have their stats recollected, because you think that is a quick-fix solution for spool issues. The process is extremely resource intensive and can severely impact ETL or user SLAs. We had one such incident recently.

</Quote>

 

The DBA also mentioned that 

<Quote>

spool errors are always a result of badly designed SQL or attempting to work with extremely large datasets. Either of these two can result in skewed content which will always manifest as spool issues.

</Quote>

 

In a nutshell, the DBA's opinion is that out-of-spool errors are always caused by bad queries and have nothing to do with stale stats, especially with TD 15.10, where TD will intelligently collect stats automatically.

 

I want to hear your input on this.

Jim

2 REPLIES
Senior Apprentice

Re: Out of spool with TD version 15.10

Hi Jim,

 

Not sure I agree with everything that has been said above.

 

In general I have a real problem with any claim that something (good or bad) is always caused by 'x'. Or that something can always be fixed by 'y'.

 

Some of the specific comments that I'd like to address:

"spool errors are always a result of badly designed SQL" - sorry, but I don't agree. There are cases where spool issues are caused by (appallingly) bad SQL, but in my experience a lot of them are caused by missing or stale stats.

 

"attempting to work with extremely large datasets." - this is what Teradata was designed for, from day 1, from the ground up. At the end of the day, if the business requirement is to process 2 years of transactions then that is what the SQL has to do. Sure, using indexes and selection criteria or join conditions you want to avoid processing 5 years of data, but you still have to process 2 years.

 

"Either of these two can result in skewed content which will always manifest as spool issues." - more accurate. Skewed content often causes spool issues.

 

Statistics:

"Teradata now performs random-AMP sampling of data-content while creating execution plans. If the new histogram does not match the existing histogram of pre-collected stats, it will use the information it received from the random-AMP sample instead, in the execution plan." This has been happening since TD12, and I'm fairly certain that the random-AMP samples (RAS) are used to modify any collected statistics. Note that any 'stats' of this kind are not stored permanently; they are simply held in the stats cache and will eventually be purged.

 

" For now, Teradata will not refresh stats on columns / tables which it considers to not have sufficiently changed content." (Now to be really picky...) The dbms does not collect/refresh statistics automatically - except for the use of RAS as described above. In order for permanently stored stats to be refreshed a 'collect statistics' command must be run.

 

The closest you'll get to 'automatic collection of stats' is via the AutoStats feature and that is really an external application (Viewpoint) issuing commands against the dbms.

 

With TD14.x there is now a 'threshold' option which allows you to run the same 'collect stats' command multiple times; until the threshold is reached, detailed stats will not be collected, hence no overhead for this processing. Even if detailed stats are not collected, the summary stats are always collected (and I do mean 'always' :-)).
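As a sketch, the TD 14.x threshold syntax looks like this (table and column names are made up):

```sql
-- Recollect only when more than 10% of rows have changed, or the stats
-- are older than 7 days; otherwise the statement returns quickly and
-- only the summary stats are refreshed.
COLLECT STATISTICS
  USING THRESHOLD 10 PERCENT AND THRESHOLD 7 DAYS
  COLUMN (txn_date)
ON big_txn;
```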

 

"Keep in mind that some of our tables are HUGE and collecting stats on them takes an hour or two. ... The process is extremely resource intensive and can severely impact ETL or user SLAs" - agreed. It can be, which is why you do need to be careful how frequently/when you collect stats on the large tables. Also think about using the SAMPLE option on such tables. This can help to reduce the statistics 'overhead' significantly.

 

In your initial problem description you described a case where (presumably) the same query was run against the same data but generated two different plans, the difference being the presence of new/more recent stats on one or more of the tables.

    This is a classic case of when statistics should be collected/refreshed: to change/improve the structure of the plan. This is why you collect statistics in the first place. Unless statistics change the structure of the plan, they have no use and no purpose, and collecting them is a complete waste of time, effort and resources.

 

Does that help?

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Junior Contributor

Re: Out of spool with TD version 15.10

Full Ack.

In addition to Dave's excellent comments :-)

 

Note that since TD 14.10, Teradata has been increasingly opting for which stats should be re-collected when a COLLECT STATS statement is issued. This will, ultimately, fully replace the need to re-collect stats on target tables. For now, Teradata will not refresh stats on columns / tables which it considers to not have sufficiently changed content.

To be able to determine whether stats need to be refreshed, a COLLECT STATS must be submitted and DBQL USECOUNT logging for that database must be enabled.
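Enabling object use counts is done through DBQL, along these lines (the database name is hypothetical):

```sql
-- Enable USECOUNT logging for the database that owns the tables,
-- so the dbms can track how much their content has changed.
BEGIN QUERY LOGGING WITH USECOUNT ON MyDatabase;

-- To turn it off again:
-- END QUERY LOGGING WITH USECOUNT ON MyDatabase;
```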

 

HELP STATS is not a sufficient indicator of whether stats are stale or not. You will see "old" dates when content remains static

HELP CURRENT STATS returns the extrapolated stats, thus can be used to compare to the "old" stats (similar for SHOW CURRENT STATS).
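For example, to compare the stored stats against the extrapolated ones (the table name is made up):

```sql
-- Stored (possibly old) statistics:
HELP STATISTICS big_txn;

-- Extrapolated statistics, i.e. closer to what the optimizer actually uses:
HELP CURRENT STATISTICS big_txn;

-- Detailed histogram including extrapolation:
SHOW CURRENT STATISTICS VALUES ON big_txn;
```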

 

 

Teradata now performs random-AMP sampling of data-content while creating execution plans. If the new histogram does not match the existing histogram of pre-collected stats, it will use the information it received from the random-AMP sample instead, in the execution plan.

No, RAS doesn't sample data content, only row counts. When the sampled count doesn't match the number stored in the stats, the optimizer tries to extrapolate, which simply adjusts the estimated number of rows per value or extends the range of values (for date/timestamp columns) when rows have been added. When rows have been added, deleted, and updated, this extrapolation might be totally wrong. See SHOW CURRENT STATS for the actual estimated values.

 

Keep in mind that some of our tables are HUGE and collecting stats on them takes an hour or two. Such tables have stats collected weekly.

Columns which are known to cause problems should be collected more often; the optimizer will skip the recollection when there's not enough change. And there's the SAMPLE option, but better to use SAMPLE n PERCENT instead of SYSTEM SAMPLE. I like to force the sample because I'm smarter than the optimizer, at least sometimes :-)
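A sketch of forcing the sample size rather than letting the system choose (table and column names are hypothetical):

```sql
-- System-decided sample: the optimizer chooses the sample size,
-- and may decide to scan everything anyway.
COLLECT STATISTICS USING SYSTEM SAMPLE COLUMN (cust_id) ON big_txn;

-- Forced 10 percent sample: predictable cost, usually good enough
-- for high-cardinality columns on a very large table.
COLLECT STATISTICS USING SAMPLE 10 PERCENT COLUMN (cust_id) ON big_txn;
```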