Devising a comprehensive process for "Collect Stats" on the entire EDW

Database
N/A

Devising a comprehensive process for "Collect Stats" on the entire EDW

Hello Friends,

I am struck in a wonderful situation.

Suppose that I have my TD environment consisting of 720 tables. Of these tables, I have 420 tables that are populated on a periodic basis. I want to determine on the tables that I should collect statistics.

I am aware of the rules: 10% change, last stats collected date methods. This is some thing that is run on each table manually. I don't want to do manually as it is highly time-consuming.

What I am exactly looking is...some thing like a process that will run and give me a list of tables upon which I should collect stats.

Friends, I know I am asking for some thing that might not be a one-shot method. Please, suggest me the Key points to remember while doing for a big EDW. I will figure out.

Thanks in advance.
5 REPLIES

Re: Devising a comprehensive process for "Collect Stats" on the entire EDW

Hi,

There is one suggestion, I hope it helps u out.....
It goes like this...

1. Create a Table which captures daily the count, size for all the necessary tables
2. Then u can have the difference in percentage when u compare the difference of 2 days data.
3. you can create a collect stats script with the help of dbc involving this table
4. Schedule ur script for daily run

Regards
JK
Teradata Employee

Re: Devising a comprehensive process for "Collect Stats" on the entire EDW

You can use Statistics Wizard product, specify a workload, or a database, or a selection of tables, it will give you recommendations from which you can then take action.

Re: Devising a comprehensive process for "Collect Stats" on the entire EDW

There is a very standard Qry that you can use for getting the stats

locking row for access
SELECT /*** 64-bit ***/
DatabaseName,
TableName,
ColumnName,

/** Number of columns within multi-column or index stats **/
ColumnCount,

/** stats collected on:
'C' --> Column
'I' --> Index
'M' --> Multiple columns (V2R5+)
'D' --> Pseudo column PARTITION (V2R6.1+)
**/
StatsType,

/** collect stats date **/
CollectDate (DATE),

/** collect stats time **/
CollectTime (TIME(2)),

CollectTimestamp (TIMESTAMP(2)),

/** V2R5: sample size used for collect stats, NULL if not sampled **/
SampleSize,

/** Version
1: pre-V2R5
2: V2R5+
3: TD12
**/
StatsVersion,

/** TD12: Number of AMPs on the system **/
NumAMPs,

/** Number of intervals **/
NumIntervals,

/** TD12: All-AMPs average of the average number of rows per NUSI value
per individual AMP, Estimated WHEN Sampled **/
AvgAmpRPV,

/** Row Count, Estimated when Sampled **/
NumRows (DECIMAL(18,0)),

/** Distinct Values, Estimated when Sampled **/
NumValues (DECIMAL(18,0)),

/** Number of partly null and all null rows,
Estimated WHEN Sampled **/
NumNulls (DECIMAL(18,0)),

/** TD12: Number of all null rows in the column or index set,
Estimated WHEN Sampled **/
NumAllNulls (DECIMAL(18,0)),

/** Maximum number of rows / value, Estimated when Sampled **/
ModeFreq (DECIMAL(18,0))
FROM
(
SELECT
DatabaseName,
TableName,
ColumnName,
ColumnCount,
StatsType,
SampleSize,
StatsVersion,

(
(HASHBUCKET
(SUBSTR(CollectDate_, 2, 1) ||
SUBSTR(CollectDate_, 1, 1) (BYTE(4))
) / TD12 - 1900
) * 10000
+
(HASHBUCKET
('00'xb || SUBSTR(CollectDate_, 3, 1) (BYTE(4))
) / TD12
) * 100
+
(HASHBUCKET
(
'00'xb || SUBSTR(CollectDate_, 4, 1) (BYTE(4))
) / TD12
) (DATE, FORMAT 'yyyy-mm-ddB')
) AS CollectDate,

(
(HASHBUCKET
(CAST('00'xb || SUBSTR(CollectTime_, 1, 1) AS BYTE(4))
) / TD12 (FORMAT '99:')
) ||
(HASHBUCKET
(CAST('00'xb || SUBSTR(CollectTime_, 2, 1) AS BYTE(4))
) / TD12 (FORMAT '99:')
) ||
(HASHBUCKET
(CAST('00'xb || SUBSTR(CollectTime_, 3, 1) AS BYTE(4))
) / TD12 (FORMAT '99.')
) ||
(HASHBUCKET
(CAST('00'xb || SUBSTR(CollectTime_, 4, 1) AS BYTE(4))
) / TD12 (FORMAT '99')
) (TIME(2), FORMAT 'hh:mi:ss.s(2)')
) AS CollectTime,

(CollectDate || (CollectTime (CHAR(11))))
(TIMESTAMP(2), FORMAT 'yyyy-mm-ddBhh:mi:ss.s(2)') AS CollectTimestamp,

HASHBUCKET(SUBSTR(NumNulls_, 8, 1)
|| SUBSTR(NumNulls_, 7, 1) (BYTE(4))) / TD12 AS NumNullsw1,
HASHBUCKET(SUBSTR(NumNulls_, 6, 1)
|| SUBSTR(NumNulls_, 5, 1) (BYTE(4))) / TD12 AS NumNullsw2,
HASHBUCKET(SUBSTR(NumNulls_, 4, 1)
|| SUBSTR(NumNulls_, 3, 1) (BYTE(4))) / TD12 AS NumNullsw3,
HASHBUCKET(SUBSTR(NumNulls_, 2, 1)
|| SUBSTR(NumNulls_, 1, 1) (BYTE(4))) / TD12 AS NumNullsw4,

CASE WHEN NumNulls_ = '00'xb THEN 0
ELSE
(-1**(NumNullsw1 / 32768))
* (2**((NumNullsw1/16 MOD 2048) - 1023))
* (1 + ((NumNullsw1 MOD 16) * 2**-4) + (NumNullsw2 * 2**-20)
+ (NumNullsw3 * 2**-36) + (NumNullsw4 * 2**-52))
END AS NumNulls,

HASHBUCKET(SUBSTR(NumIntervals_, 2, 1)
|| SUBSTR(NumIntervals_, 1, 1) (BYTE(4))) / TD12 AS NumIntervals,

HASHBUCKET(SUBSTR(NumAllNulls_, 8, 1)
|| SUBSTR(NumAllNulls_, 7, 1) (BYTE(4))) / TD12 AS NumAllNullsw1,
HASHBUCKET(SUBSTR(NumAllNulls_, 6, 1)
|| SUBSTR(NumAllNulls_, 5, 1) (BYTE(4))) / TD12 AS NumAllNullsw2,
HASHBUCKET(SUBSTR(NumAllNulls_, 4, 1)
|| SUBSTR(NumAllNulls_, 3, 1) (BYTE(4))) / TD12 AS NumAllNullsw3,
HASHBUCKET(SUBSTR(NumAllNulls_, 2, 1)
|| SUBSTR(NumAllNulls_, 1, 1) (BYTE(4))) / TD12 AS NumAllNullsw4,

CASE WHEN NumAllNulls_ = '00'xb THEN 0
ELSE
(-1**(NumAllNullsw1 / 32768))
* (2**((NumAllNullsw1/16 MOD 2048) - 1023))
* (1 + ((NumAllNullsw1 MOD 16) * 2**-4) + (NumAllNullsw2 * 2**-20)
+ (NumAllNullsw3 * 2**-36) + (NumAllNullsw4 * 2**-52))
END AS NumAllNulls,

HASHBUCKET(SUBSTR(AvgAmpRPV_, 8, 1)
|| SUBSTR(AvgAmpRPV_, 7, 1) (BYTE(4))) / TD12 AS AvgAmpRPVw1,
HASHBUCKET(SUBSTR(AvgAmpRPV_, 6, 1)
|| SUBSTR(AvgAmpRPV_, 5, 1) (BYTE(4))) / TD12 AS AvgAmpRPVw2,
HASHBUCKET(SUBSTR(AvgAmpRPV_, 4, 1)
|| SUBSTR(AvgAmpRPV_, 3, 1) (BYTE(4))) / TD12 AS AvgAmpRPVw3,
HASHBUCKET(SUBSTR(AvgAmpRPV_, 2, 1)
|| SUBSTR(AvgAmpRPV_, 1, 1) (BYTE(4))) / TD12 AS AvgAmpRPVw4,

CASE WHEN AvgAmpRPV_ = '00'xb THEN 0
ELSE
(-1**(AvgAmpRPVw1 / 32768))
* (2**((AvgAmpRPVw1/16 MOD 2048) - 1023))
* (1 + ((AvgAmpRPVw1 MOD 16) * 2**-4) + (AvgAmpRPVw2 * 2**-20)
+ (AvgAmpRPVw3 * 2**-36) + (AvgAmpRPVw4 * 2**-52))
END AS AvgAmpRPV,

HASHBUCKET(SUBSTR(NumAMPs_, 2, 1)
|| SUBSTR(NumAMPs_, 1, 1) (BYTE(4))) / TD12
AS NumAMPs,

HASHBUCKET(SUBSTR(ModeFreq_, 8, 1)
|| SUBSTR(ModeFreq_, 7, 1) (BYTE(4))) / TD12 AS ModeFreqw1,
HASHBUCKET(SUBSTR(ModeFreq_, 6, 1)
|| SUBSTR(ModeFreq_, 5, 1) (BYTE(4))) / TD12 AS ModeFreqw2,
HASHBUCKET(SUBSTR(ModeFreq_, 4, 1)
|| SUBSTR(ModeFreq_, 3, 1) (BYTE(4))) / TD12 AS ModeFreqw3,
HASHBUCKET(SUBSTR(ModeFreq_, 2, 1)
|| SUBSTR(ModeFreq_, 1, 1) (BYTE(4))) / TD12 AS ModeFreqw4,

CASE WHEN ModeFreq_ = '00'xb THEN 0
ELSE
(-1**(ModeFreqw1 / 32768))
* (2**((ModeFreqw1/16 MOD 2048) - 1023))
* (1 + ((ModeFreqw1 MOD 16) * 2**-4) + (ModeFreqw2 * 2**-20)
+ (ModeFreqw3 * 2**-36) + (ModeFreqw4 * 2**-52))
END AS ModeFreq,

HASHBUCKET(SUBSTR(NumValues_, 8, 1)
|| SUBSTR(NumValues_, 7, 1) (BYTE(4))) / TD12 AS NumValuesw1,
HASHBUCKET(SUBSTR(NumValues_, 6, 1)
|| SUBSTR(NumValues_, 5, 1) (BYTE(4))) / TD12 AS NumValuesw2,
HASHBUCKET(SUBSTR(NumValues_, 4, 1)
|| SUBSTR(NumValues_, 3, 1) (BYTE(4))) / TD12 AS NumValuesw3,
HASHBUCKET(SUBSTR(NumValues_, 2, 1)
|| SUBSTR(NumValues_, 1, 1) (BYTE(4))) / TD12 AS NumValuesw4,

CASE WHEN NumValues_ = '00'xb THEN 0
ELSE
(-1**(NumValuesw1 / 32768))
* (2**((NumValuesw1/16 MOD 2048) - 1023))
* (1 + ((NumValuesw1 MOD 16) * 2**-4) + (NumValuesw2 * 2**-20)
+ (NumValuesw3 * 2**-36) + (NumValuesw4 * 2**-52))
END AS NumValues,

HASHBUCKET(SUBSTR(NumRows_, 8, 1)
|| SUBSTR(NumRows_, 7, 1) (BYTE(4))) / TD12 AS NumRowsw1,
HASHBUCKET(SUBSTR(NumRows_, 6, 1)
|| SUBSTR(NumRows_, 5, 1) (BYTE(4))) / TD12 AS NumRowsw2,
HASHBUCKET(SUBSTR(NumRows_, 4, 1)
|| SUBSTR(NumRows_, 3, 1) (BYTE(4))) / TD12 AS NumRowsw3,
HASHBUCKET(SUBSTR(NumRows_, 2, 1)
|| SUBSTR(NumRows_, 1, 1) (BYTE(4))) / TD12 AS NumRowsw4,

CASE WHEN NumRows_ = '00'xb THEN 0
ELSE
(-1**(NumRowsw1 / 32768))
* (2**((NumRowsw1/16 MOD 2048) - 1023))
* (1 + ((NumRowsw1 MOD 16) * 2**-4) + (NumRowsw2 * 2**-20)
+ (NumRowsw3 * 2**-36) + (NumRowsw4 * 2**-52))
END AS NumRows

FROM
(
SELECT
DatabaseName,
TableName,
ColumnName,
ColumnCount,
StatsType,

/** TD12 changed the HASHBUCKET function (16 bit vs. 20 bit),
on TD12 (using 20 bits for HashBuckets) the result must be divided by 16 **/
((HASHBUCKET()+1)/65536) AS TD12,

SUBSTR(STATS, 1, 4) AS CollectDate_,

SUBSTR(STATS, 5, 4) AS CollectTime_,

POSITION(SUBSTR(STATS, 9, 1) IN '010203'xb) AS StatsVersion,

CASE
WHEN HASHBUCKET ('00'xb || SUBSTR(STATS, 11, 1) (BYTE(4))) / TD12 = 1
THEN HASHBUCKET ('00'xb || SUBSTR(STATS, 12, 1) (BYTE(4))) / TD12
END AS SampleSize,

SUBSTR(STATS, 13 + 4, 8) AS NumNulls_,

SUBSTR(STATS, 21 + 4, 2) AS NumIntervals_,

CASE
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 25 + 8, 8)
END AS NumAllNulls_,

CASE
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 33 + 8 , 8)
END AS AvgAmpRPV_,

CASE
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 57 + 8, 2)
END AS NumAMPs_,

CASE
WHEN SUBSTR(STATS, 23 + 4, 1) = '01'xb THEN 16
ELSE 32
END AS Offset,

CASE
WHEN StatsVersion < 3 THEN SUBSTR(STATS, 33 + Offset, 8)
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 73 + Offset, 8)
END AS ModeFreq_,

CASE
WHEN StatsVersion < 3 THEN SUBSTR(STATS, 33 + Offset + 8, 8)
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 73 + Offset + 8, 8)
END AS NumValues_,

CASE
WHEN StatsVersion < 3 THEN SUBSTR(STATS, 33 + Offset + 16, 8)
WHEN StatsVersion = 3 THEN SUBSTR(STATS, 73 + Offset + 16, 8)
END AS NumRows_

FROM
(
SELECT
DatabaseName,
TableName,

MAX(CASE WHEN ColumnPosition = 1 THEN TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 2 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 3 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 4 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 5 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 6 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 7 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 8 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 9 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 10 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 11 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 12 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 13 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 14 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 15 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 16 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition > 16 THEN ',...' ELSE '' END) AS ColumnName,

COUNT(*) AS ColumnCount,

'I' AS StatsType,

MAX(SUBSTR(IndexStatistics, 1, 128)) AS STATS

FROM
dbc.IndexStats
GROUP BY
DatabaseName,
TableName,
StatsType,
IndexNumber
HAVING STATS IS NOT NULL

/** Remove for pre-V2R5 --> **/
UNION ALL

SELECT
DatabaseName,
TableName,
MAX(CASE WHEN ColumnPosition = 1 THEN TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 2 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 3 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 4 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 5 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 6 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 7 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 8 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 9 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 10 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 11 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 12 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 13 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 14 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 15 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition = 16 THEN ',' || TRIM(ColumnName) ELSE '' END) ||
MAX(CASE WHEN ColumnPosition > 16 THEN ',...' ELSE '' END) AS ColumnName,

COUNT(*) AS ColumnCount,

MAX(CASE WHEN StatisticsId = 129 AND ColumnName = 'PARTITION' THEN 'D' ELSE 'M' END) AS StatsType,

MAX(SUBSTR(ColumnsStatistics, 1, 128)) AS STATS

FROM
dbc.MultiColumnStats
GROUP BY
DatabaseName,
TableName,
StatisticsID
HAVING STATS IS NOT NULL
/** <-- Remove for pre-V2R5 **/

UNION ALL

SELECT
DatabaseName,
TableName,
ColumnName,
1 AS ColumnCount,
'C' AS StatsType,
SUBSTR(fieldstatistics, 1, 128) AS STATS

FROM
dbc.ColumnStats
WHERE STATS IS NOT NULL
) dt
) dt
) dt
;

Use this Qry & check for the number of records. Then do a count(*) on the table check the numbers. That may give you a good idea on which you need to collect stats.
Teradata Employee

Re: Devising a comprehensive process for "Collect Stats" on the entire EDW

Has someone updated these queries for Teradata 13.10?

Giovanni

N/A

Re: Devising a comprehensive process for "Collect Stats" on the entire EDW