New StatsInfo query for TD14

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
N/A

Beginning with TD14 statistics are no longer stored in dbc.TVFields and dbc.Indexes, they have been moved into dbc.StatsTbl to facilitate several enhancements. A new view dbc.StatsV returns much of the information previously extracted in my StatsInfo query.

But of course this is still not enough information, at least not for me ;-)

 dbc.StatsV vs. StatsInfo TD13/14

I tried to get the best of both worlds and so i wrote a new version for TD14 to extract as much additional data as possible. I had to remove a few columns (they're no longer needed or it's no longer possible to get that info) and i renamed some to match the new names in dbc.StatsV. Most of the new columns were simply not available before TD14.

Following table describes the new StatsInfo view and details the differences to the previous version and dbc.StatsV:

dbc.StatsV TD14

New StatsInfo TD14

StatsInfo TD13.10

Remarks

Description

 

(Column added/removed/changed)

 

DatabaseName

DatabaseName

DatabaseName

TableName

TableName

TableName

   

ColumnName

ColumnName

ColumnName

StatsId <> 0

List of comma-separated column names

FieldIdList

FieldIdList

FieldId

StatsId <> 0

List of semicolon-separated field ids

StatsName

StatsName

StatsName

StatsId <> 0

Alias name of the statistics (if assigned)

IndexName

IndexName

StatsId <> 0

Name of the index (if assigned)

DateIncluded

DateIncluded

StatsId <> 0

DATE or TIMESTAMP column included, Y/N

PartitionColumn

PartitionColumn

StatsId <> 0

Column included which is used in the table's partitioning expression: Y/N

PartitionLevels

PartitionLevels

Number of levels in the table's partitioning expression, zero means not partitioned

ColumnPartitioningLevel

Level number for the column partitioning level, > 0 indicates columnar table

PartitionsDefined

PartitionsDefined

The number of partitions defined

ExpressionCount

ExpressionCount

ColumnCount

StatsId <> 0

The number of columns in the statistics

StatsId

StatsId

StatsId = 0 → Summary Stats

StatsType

StatsType

Statistics collected on:
T → Table
I → Join Index
N → Hash Index
V → View 14.10?
Q → Query 14.10?
L → Link Row 14.10?

StatsTypeOld

StatsType

Statistics collected on:
Summ → Summary Statistic
UPI → Unique Primary Index
NUPI → Non-Unique Primary Index
USI → Unique Secundary Index
NUSI → Non-Unique Secondary Index
VOSI → Value Ordered NUSI
Part → Pseudo column PARTITION
Col → Single Column
MCol → Multiple columns

TableType

TableType

TempTbl → Global Temporary Table
Tbl → Table
JoinIdx → Join Index
HashIdx → Hash Index
NoPITbl → No Primary Index Table

StatsSource

StatsSource

The method this statistic is acquired:
I → Internally generated
S → User collected with COLLECT STATS (system built)
U → User collected with COLLECT STATS VALUES clause
C → Copied from other sources
T → Transferred with CREATE TABLE...AS statement

ValidStats

ValidStats

TD14.10: Indicates whether the statistics are valid or not: Y/N

DBSVersion

DBSVersion

Database version statistics collected on

SampleSizePct

SampleSizePct

SampleSize

StatsId <> 0

Sample size used for collect stats, NULL if not sampled

SampleSignature

SampleSignature

StatsId <> 0

Sample option encoded as a 10 character signature
USPnone → collected using NO SAMPLE
USP00nn.00 → collected using SAMPLE nn PERCENT
SDPxxxx.xx → sample size determined by system

ThresholdSignature

ThresholdSignature

StatsId <> 0

THRESHOLD options encoded as a 17 character signature (not used before TD14.10)
Characters 1 to 10 → THRESHOLD PERCENT
SCTxxxx.xx → System defined
UCT005.00 → User defined 5 percent
UCTnone → User defined no threshold
Characters 11 to 17 → THRESHOLD DAYS
STTxxxx → System defined
UTT0010 → User defined 10 days
UTTnone → User defined no threshold

MaxIntervals

MaxIntervals

StatsId <> 0

User-specified maximum number of intervals

StatsSkipCount

StatsSkipCount

StatsId <> 0

TD14.10 only: How many times the statistis collection has been skipped based on the THRESHOLD

MaxValueLength

MaxValueLength

StatsId <> 0

User-specified maximum value length

LastCollectTimestamp

LastCollectTimestamp

CollectTimestamp

Date and time when statistics were last collected

LastCollectDate

CollectDate

LastCollectTime

CollectTime

RowCount

RowCount

NumRows

The cardinality of the table, i.e. the number of rows

UniqueValueCount

UniqueValueCount

NumValues

StatsId <> 0

Distinct Values. Estimated when sampled

PNullUniqueValueCount

PNULLUniqueValueCount

StatsId <> 0

Number of unique values from rows with partial NULLs (multicolumn stats)
Estimated when sampled

NullCount

NULLCount

NumNULLs

StatsId <> 0

Number of partly NULL and all NULL rows, estimated when sampled

AllNullCount

AllNULLCount

NumAllNULLs

StatsId <> 0

Number of all NULL rows (multicolumn stats), estimated when sampled

HighModeFreq

HighModeFreq

ModeFreq

StatsId <> 0

Frequency of the most common value, estimated when sampled

PNullHighModeFreq

PNULLHighModeFreq

StatsId <> 0

Highest frequency of values having partial NULLs (for multicolumn stats), stimated when sampled

AvgAmpRPV

AvgAmpRPV

AvgAmpRPV

StatsId <> 0

Overall average of the average rows per value from each AMP, only for NUSIs, otherwise zero

MinValue

MinValue

StatsId <> 0

Minimum data value (only for single column numeric or datetime stats)

ModalValue

ModalValue

StatsId <> 0

Most common data value (only for single column numeric or datetime stats)

MaxValue

MaxValue

StatsId <> 0

Maximum data value (only for single column numeric or datetime stats)

OneAMPSampleEst

OneAMPSampleEst

StatsId = 0

Estimated cardinality based on a single-AMP sample

AllAmpSampleEst

AllAmpSampleEst

StatsId = 0

Estimated cardinality based on an all-AMP sample

DelRowCount

DelRowCount

StatsId = 0

Deleted rows count??? used in 14.10 ???

PhyRowCount

PhyRowCount

StatsId = 0

Seems to be the same as AllAMPSampleEst – used in 14.10 ???

AvgRowsPerBlock

AvgRowsPerBlock

StatsId = 0

Average number of rows per datablock???

AvgBlockSize

AvgBlockSize

StatsId = 0

Average datablock size???

BLCPctCompressed

BLCPctCompressed

StatsId = 0

Blockcompression in percent??? used in 14.10 ???

BLCBlkUcpuCost

BLCBlkUcpuCost

StatsId = 0

CPU cost for Blockcompression??? used in 14.10 ???

BLCBlkURatio

BLCBlkURatio

StatsId = 0

??? used in 14.10 ???

AvgRowSize

AvgRowSize

StatsId = 0

Average record size???

Temperature

Temperature

StatsId = 0

populated in 14.10???

NumOfAMPs

NumOfAMPs

NumAMPs

The number of AMPs from which statistics were collected, usually the number of AMPs in the system, 1 for an empty table

CreateTimeStamp

CreateTimeStamp

Statistics creation timestamp

LastAlterTimeStamp

LastAlterTimeStamp

LastAlterTimeStamp

Different meaningLast user updated timestamp, i.e. Collect stats was submitted but skipped by optimizer due to threshold not reached

LastAccessTimestamp

LastAccessTimestamp

The last time this column/index was used in queries, the same info is found in dbc.TablesV and dbc.IndicesV

AccessCount

AccessCount

How often this column/index was used in queries, the same info is found in dbc.TablesV and dbc.IndicesV

TableId

TableId

To facilitate additional joins to other system tables

IndexNumber

IndexNumber

StatsId <> 0

Index number of the index on which statistics are collected

FieldType

FieldType

Single column stats: dbc.TVFields.FieldType, NULL for multi-column

Version

StatsVersion

Internal version of statistics:
5 → TD14
6 → TD14.10

OriginalVersion

OriginalVersion

StatsId <> 0

Probably version when stats were migrated from older releases, but not yet recollected
4: pre-TD14
5: TD14.00
6: TD14.10

NumOfBiasedValues

NumOfBiasedValues

StatsId <> 0

Number of biased values in the histogram

NumOfEHIntervals

NumOfEHIntervals

StatsId <> 0

Number of equal height intervals in the histogram

NumOfRecords

NumOfRecords

Number of history records in the histogram

CollectStatement

CollectStatement

COLLECT STATS statement to collect the stats.
Two versions with or without double-quoted object names

ShowStatement

HelpStatement

SHOW STATS VALUES statement to get the stats details.
Two versions with or without double-quoted object names.

MissingStats

Was a side-product of the old query, too much overhead to add

NumIntervals

Replaced by NumOfBiasedValues & NumOfEHIntervals

CollectDuration

Not (yet) possible, i don't know if this is stored somewhere

NumericStats

No longer neccessary

DataSize

Too much overhead to calculate, not really needed as the limitation of 16 bytes is removed in TD14

Please report any issues or obviously wrong output to dnoeth@gmx.de.

Attached files: 

StatsInfo_vs_StatsV.pdf

Describes the new StatsInfo view and details the differences to the previous version and dbc.StatsV - added, modified and removed columns (same as above table)

Teradata Statistics TD14.pdf

Partial description of the new internal stats format based on some reverse engeneering of the binary data stored in a BLOB in dbc.StatsTbl.Histogram. Luckily the internal storage maps almost 1:1 to the output of a SHOW STATISTICS VALUES :-)

stats_td14_yyyymmdd.sql

StatsInfo source code. To keep the code clean it's based on SQL-UDFs

ReverseBytes.sql ReverseBytes.c

Can be used to replace the ReverseBytes SQL-UDF with a C-UDF which uses way less CPU (but most DBA's don't like C).
Note: I'm not a C programmer, but this was so basic even I could write it.

Uploaded stats_td14_20140605.txt, fixing two bugs:

2014-02-03 dn fixed a bug resulting in a Syntax error for a „SELECT * FROM StatsInfo14;" in ANSI mode 

2014-06-05 dn fixed a bug returning duplicate rows for tables with additional column constraints

69 Comments

I am on TD 15.0

N/A

@arpitsinha:

What's your Teradata release?

Did you try the latest release of my StatsInfo?

I had this error for ANSI mode, but it was fixed at 2014-02-03.

The CREATE for the reverse_bytes UDF is in "ReverseBytes.sql"

@rs-reply:

Sorry for the late reply, I didn't notice your comment.

This is a strange error, I successfully installed it on multiple 14/14.10/15 systems. 

Can you check if there's another BITAND function on your system? Should be in TD_SYSFNLIB, but there might be an old one in SYSLIB, which comes forst in he search path...

Hi Dieter,

I am on TD 15.00.01_Sles10 VMware.

I have created all the UDF and View but while running select on the view it is showing above mentioned error.

It is in ANSI mode.

N/A

Hi Arpit,

if this is a TD Express you should download a newer release

N/A

Is or will there be a TD 15.0 version of StatsInfo query?

N/A

It's still working on TD15 (didn't check on 15.10) and I didn't notice any new info.

There's only one thing: the AccessCount no longer works (always NULL) since TD14.10, because of the new Object Use Count feature, but adding the new columns would be too complicated (and you might join to those some of those ...UseCountV views).

Following queries return the bytes I'm not decoding, because they used to be binary zeroes up to 15.0.3.

If you get any other values please contact me (you know my mail :-)

select distinct
substr(histogram,9,2*8) (byte(16)),
substr(histogram,129,3*8) (byte(24))
from dbc.statstbl
where statsid = 0;

select distinct
substr(histogram,21,4) (byte(4)),
substr(histogram,49,8) (byte(8)),
substr(histogram,105,3*8) (byte(24))
from dbc.statstbl
where statsid > 0;

Hi Dieter,  I am currently working at a Teradata customer that is running TD 13.10. Where can I find your Statsinfo scripts for that release.

Thanks,

Chet

N/A

Hi Chet,

my old Statsinfo query can be found here.

The link is in the 1st paragraph :-)

Thanks for the quick response Dieter.  Sorry I missed that 1st link reference.

Chet

Teradata Employee

Dieter,

Any update on your 15.10 version - if any is needed?

Thanks - JK

N/A

Hi JK,

it's still working in TD15.10.

There's a new StatsVersion 6, but when I checked those reserved/unused bytes still contain only zero. 

Hi Dieter,

Is there any way to fetch maxval for multicolumn stats,using DBC.StatsTbl.I want to find all occurences where stats value  is reported as 'Truncated' , so that I can increase the MAXLENGTHVALUE.

Hello all,

Just a couple of quetions on this. Does this work with td 15.10? And if so, can someone post the canned queries (e.g. show all tables with oudated stats? Show all tables with no stats defined?)

Thanks

N/A

This still works in TD15.10, there seems to be no additonal info.

Outdated stats can be found by adding a WHERE-condition on LastAlterTimeStamp/LastCollectTimeStamp and table without stats using NOT EXISTS. But you can easily do that using dbc.StatsV, no need to use my query. 

Greetings Dieter,

Not sure if this forum is the right place for my question, but perhaps you can steer me in the right direction.

I'm trying to automate STATS COLLECTION on truncate and reload tables using my custom Stored Procedures.

I'm collecting statistics on PRIMARY and Secondary indices.  I'm using format that you suggested in your earlier posts:

COLLECT STATISTICS ON <db>.<table_name> INDEX IndexName;

This approach works well for the Secondary Indexes.  However Primary index does not have Index Name in dbc.Indices view it is set to null and without Index Name I can not get syntax above to work.

I tried "COLLECT STATISTICS ON <db>.<table_name> index primary;"  but obviously it is not a correct syntax.

Yes I can rebuild columns that comprise a specific primary index, however I would prefer to do it in the single step similar to collecting stats on secondary index (calling index by its name).  If there is a way to do it, let me know.  Otherwise I'll just rebuild the column list that comprises the primary index.

Secondly, thank you for your contributions to Teradata knowledge base.  I learned a lot from your posts.  Really appreciate your efforts

Thank you,

Dmitry T

Just found a work around to my question.  It seems that a PI can be renamed or given a name without copying data and dropping original table.  As long as index composition is the same.

 alter table <Table_Name> modify

primary index Index_Name(Inde_Col1, ..Index_columnn);

Thanks again.

DT

N/A

Hi Dmitry,

you mean the first stats collection when there's no row in dbc.StatsV, yet?

I don't think there's a way to avoid using a column list, there's no guarantee that PIs or SIs are named.

Hi Dieter,

 

I'm new to Teradata and I have a task to figure out tables with no statistics. Can i use this new table/views to identify them? 

Can I combine dbc.statsv and dbc.tables to list tables w/o statistics? 

Is it safe to say that tables that do not listed on this views probably do not have statistics?

 

Thank you.

N/A

@mavistal8:

Yep, this will return all tables without stats.

select *
from dbc.TablesV as t
where TableKind = 'T'
  and not exists (
select * from dbc.StatsV as s   where s.databasename = t.databasename     and s.tablename = t.tablename  )