When you start designing a statistics collection process topics include:
Both problems involve some kind of prioritization process, which finally leads to a sorted list of COLLECT STATS statement (See Marcio Moura's blog When is the right time to refresh statistics?)
Processing this list (using a BTEQ script or a cursor in a Stored Procedure) could lead to another problem:
Therefore you might have to stop the process before all those stats are collected.
But now you have to take care that the remaining stats will be collected during the next run (and probably with a higher priority).
Queue tables have been implemented in V2R6, but are still one of Teradata's hardly known features. Their main purpose is to allow asynchronous event processing in a FIFO and this perfectly matches with our requirements:
As an additional benefit ETL-jobs requiring some final COLLECT STATS steps might finish earlier, when those statements can run asynchronous via queue.
The attached script creates this queue table (CollectStatsQueue) plus two log tables (CollectStatsLog and CollectStatsLogDetails), a Stored Procedure (CollectStatsProc) to process the queue and a macro (StopQueue) to stop the processing. Syntax is based on a TD13.0 running in Teradata mode, older releases would require some modifications.
This is the main driver table for the collection process, a COLLECT STATS is triggered for each inserted row.
When your current process results in an ordered list this order of execution is maintained by setting the queue's "Query Insertion Timestamp" column to
|Optional, when supplied used instead of the column list|
Comma-delimited list of column names (for column and index level stats)
Collect on T (Table), C (Column), I (Index) level
|uses SAMPLE when set to any non-NULL value|
Records info about every run of the CollectStatsProc including the number of statistics collected and the number of errors during the collect.
Records info for every failing (optionally also successful) COLLECT STATS.
This Stored Procedure actually submits the COLLECT STATS for each row in the queue
Process will stop when a COLLECT finishes after MaxTS or all rows in queue are processed.
If MaxTS is set to NULL the process will run until the StopQueue macro is executed.
If set to 'Y' info about each processed row is written to CollectStatsLogDetails, otherwise only failed collects.
Update the CollectStatsLog table after x minutes with info about the number of statistics successfully collected or failed.
SQL Errorcode: when there's any error outside of the actual COLLECT statement the process stops
Result of the execution: Success/Warning/Failed plus additional info.
Macro to stop all SPs processing the queue: invoke
EXEC StopQueue; from any session.
CALL CollectStatsProc(CURRENT_TIMESTAMP(2)+INTERVAL '1' HOUR,'Y',5, rc, msg);
Run for up to one hour, log all rows in the CollectStatsLogDetails table and update the CollectStatsLog table every 5 minutes.
Can be stopped earlier by simply running
EXEC StopQueue; from any session.
CALL CollectStatsProc(NULL,'N',10, rc, msg);
Run until StopQueue is executed, log only failed collects in the CollectStatsLog table and update the CollectStatsLog table every 10 minutes.
As the CollectStatsProc will probably be CALLed in a BTEQ script it's easy to run multiple copies:
.SET SESSIONS 2
Some remarks on running multiple instances of CollectStatsProc:
The following example to recollect all existing stats within a database is based on my StatsInfo view:
INSERT INTO CollectStatsQueue
( QITS, DatabaseName, TableName, IndexName, ColumnName, CollectLevel, SampleSize)
+ (ROW_NUMBER() OVER (ORDER BY databasename, tablename) * INTERVAL '0.000001' SECOND)
,CASE WHEN CollectLevel = 'T' THEN '' ELSE IndexName END AS IndexName
,CASE WHEN CollectLevel = 'T' THEN '' ELSE ColumnName END AS ColumnName
,CASE -- collect on table level for small tables
WHEN COUNT(*) OVER (PARTITION BY TableId) > 1
AND (NumRows < 100000
OR SUM(CollectDuration) OVER (PARTITION BY TableId) < INTERVAL '0:10' MINUTE TO SECOND)
WHEN StatsType IN ('Col', 'MCol', 'Part') THEN 'C'
END AS CollectLevel
WHERE TableType <> 'TempTbl'
AND MissingStats = 'N'
AND DatabaseName = ...
) AS dt;
CALL CollectStatsProc(CURRENT_TIMESTAMP(2)+INTERVAL '10' MINUTE,'Y',1, rc, msg);
SELECT * FROM CollectStatsLog ORDER BY StartTS;
SELECT * FROM CollectStatsLogDetails ORDER BY StartTS;
Please provide any kind of feedback (especially about performance when running multiple instances).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.