We have a relatively large datawarehouse (V14.0 Viewpoint V14.10 SLES 10).
We have a problem with certain types of query and certain source tables that causes us tremendous problem with regards system delays.
The queries are characterised by the source tables being quite large in terms of rows (1 billion rows+) and an OLAP fuction being used in the query. The execution plan states that the estimated time qualifies the query to go into the SHORT workload but very quickly the CPU used causes the query to be reclassified first to MEDIUM and then LONG and sits there for over 40 minutes, effectively blocking legitimate LONG queries. Statistics are current and valid.
The other instance where very quick reclassification following a mis-classification happens is when a very wide table (2000 columns) is referenced. We are aware that the design of the table is poor but the same problem occurs.
So my question is. Is there any possible way we can implement an exception/throttle to protect the system from this behaviour?
We are due to move to TD 14.10 or 15.0 in the near future and that will include an upgrade to SLES 11 but in the meantime any sggestions would be welcome.
I was waiting on others to possibly provide some wisdom on this but it appears you have stumped the forum. So here's some general advice and considerations. First recommendation would be to submit a Teradata support incident to allow for true system diagnostics. I think the key here is understanding why the optimizer is declaring these queries as SHORT when in fact they are running much longer. Is there possibly an index issue here? Or is the system possibly saturated with work causing queries to run longer due to CPU exhaustion? You also mention this query then blocks other long running queries. Is that due to locking or workload management throttles? Looking at the Explain, is there a particular step where the estimates are much different than the actuals? Or is each step just taking longer?
Suspect wrong/stale stats and OLAP function is driving through wrong estimates.
an you share rough figure on number of AMP's and the piece of code using OLAP function. Also, the explain plan with "STATFN" key word. On large platforms, OLAP redistribution factor behaves different leading to worst execution plan.
Regarding stats, If you have recently migrated to 14x
1. Revisit all multi-column stats defined on the table
2. Validate stats usage using dbc.statsv and drop unused stats
3. Sometimes it helps to Drop & Collect stats to record right stats histogram
4. Set DBSControl parameter NoDot0Backdown=TRUE to leverage TD 14 new stats enhancements
Regarding the queries with OLAP functions, the classification might be due to the missing estimates for "Stat Function" steps (TD15 adds this). How's the SHORT workload actually defined?