Find out Disk Space consumed by each column in a Teradata Table

Database
Highlighted
Teradata Employee

Find out Disk Space consumed by each column in a Teradata Table

Hello everyone,

 

I am trying to find the total disk space each column consumes for a particular table. I know i will not be able to get an exact figure of how much space each column will take because a varchar(500) might contain values as 'a1' or 'raza' and a decimal(15,2) might cointan the value '0.00'. So this is what i did.

 

Check the table space using the query (i know a dynamic query is not needed for this):

SELECT
Trim(DATABASENAME),
Trim(TableName),
'
SELECT
'''||TRIM(TABLENAME)||''' (TITLE '''||'TABLENAME'||'''),
SUM(CURRENTPERM)/1024/1024/1024 (TITLE '''||'CURRENT PERM (GB)'||''')
FROM
DBC.ALLSPACE
WHERE
DATABASENAME = '''||TRIM(DATABASENAME)||'''
AND
TABLENAME = '''||TRIM(TABLENAME)||'''
GROUP BY 1
ORDER BY 2 DESC;
'
FROM
DBC.TABLESV
WHERE
TRIM(DATABASENAME) = 'STG_OFSLL_T_350'
AND
TRIM(TABLENAME) LIKE 'L_ACCOUNTS%'
AND
TRIM(TABLEKIND) = 'T'
ORDER BY 1;

 

Check the row count of the table (i know a dynamic query is not needed for this):

SELECT
Trim(DATABASENAME),
Trim(TableName),
'SELECT CAST(COUNT(*) AS BIGINT) FROM ' || Trim(DATABASENAME) || '.' || Trim(TableName) || ';'
FROM
DBC.TABLESV
WHERE
Trim(DATABASENAME) = 'STG_OFSLL_T_350'
AND
Trim(TableName) = 'L_ACCOUNTS'
AND
Trim(TABLEKIND) = 'T';

 

Check the data type of each column and see how much space it is takes and multiple by the row count:

 

SELECT
Trim(COL.DATABASENAME),
Trim(COL.TableName),
Trim(COL.COLUMNNAME),

CASE
WHEN Trim(COL.COLUMNTYPE) = 'A1' THEN 'ARRAY'
WHEN Trim(COL.COLUMNTYPE) = 'AN' THEN 'MULTI-DIMENSIONAL ARRAY'
WHEN Trim(COL.COLUMNTYPE) = 'AT' THEN 'TIME'
WHEN Trim(COL.COLUMNTYPE) = 'BF' THEN 'BYTE'
WHEN Trim(COL.COLUMNTYPE) = 'BO' THEN 'BLOB'
WHEN Trim(COL.COLUMNTYPE) = 'BV' THEN 'VARBYTE'
WHEN Trim(COL.COLUMNTYPE) = 'CF' THEN 'CHARACTER'
WHEN Trim(COL.COLUMNTYPE) = 'CO' THEN 'CLOB'
WHEN Trim(COL.COLUMNTYPE) = 'CV' THEN 'VARCHAR'
WHEN Trim(COL.COLUMNTYPE) = 'D' THEN 'DECIMAL'
WHEN Trim(COL.COLUMNTYPE) = 'DA' THEN 'DATE'
WHEN Trim(COL.COLUMNTYPE) = 'DH' THEN 'INTERVAL DAY TO HOUR'
WHEN Trim(COL.COLUMNTYPE) = 'DM' THEN 'INTERVAL DAY TO MINUTE'
WHEN Trim(COL.COLUMNTYPE) = 'DS' THEN 'INTERVAL DAY TO SECOND'
WHEN Trim(COL.COLUMNTYPE) = 'DY' THEN 'INTERVAL DAY'
WHEN Trim(COL.COLUMNTYPE) = 'F' THEN 'FLOAT'
WHEN Trim(COL.COLUMNTYPE) = 'HM' THEN 'INTERVAL HOUR TO MINUTE'
WHEN Trim(COL.COLUMNTYPE) = 'HS' THEN 'INTERVAL HOUR TO SECOND'
WHEN Trim(COL.COLUMNTYPE) = 'HR' THEN 'INTERVAL HOUR'
WHEN Trim(COL.COLUMNTYPE) = 'I' THEN 'INTEGER'
WHEN Trim(COL.COLUMNTYPE) = 'I1' THEN 'BYTEINT'
WHEN Trim(COL.COLUMNTYPE) = 'I2' THEN 'SMALLINT'
WHEN Trim(COL.COLUMNTYPE) = 'I8' THEN 'BIGINT'
WHEN Trim(COL.COLUMNTYPE) = 'JN' THEN 'JSON'
WHEN Trim(COL.COLUMNTYPE) = 'MI' THEN 'INTERVAL MINUTE'
WHEN Trim(COL.COLUMNTYPE) = 'MO' THEN 'INTERVAL MONTH'
WHEN Trim(COL.COLUMNTYPE) = 'MS' THEN 'INTERVAL MINUTE TO SECOND'
WHEN Trim(COL.COLUMNTYPE) = 'N' THEN 'NUMBER'
WHEN Trim(COL.COLUMNTYPE) = 'PD' THEN 'PERIOD(DATE)'
WHEN Trim(COL.COLUMNTYPE) = 'PM' THEN 'PERIOD(TIMESTAMP WITH TIME ZONE)'
WHEN Trim(COL.COLUMNTYPE) = 'PS' THEN 'PERIOD(TIMESTAMP)'
WHEN Trim(COL.COLUMNTYPE) = 'PT' THEN 'PERIOD(TIME)'
WHEN Trim(COL.COLUMNTYPE) = 'PZ' THEN 'PERIOD(TIME WITH TIME ZONE)'
WHEN Trim(COL.COLUMNTYPE) = 'SC' THEN 'INTERVAL SECOND'
WHEN Trim(COL.COLUMNTYPE) = 'SZ' THEN 'TIMESTAMP WITH TIME ZONE'
WHEN Trim(COL.COLUMNTYPE) = 'TS' THEN 'TIMESTAMP'
WHEN Trim(COL.COLUMNTYPE) = 'TZ' THEN 'TIME WITH TIME ZONE'
WHEN Trim(COL.COLUMNTYPE) = 'UT' THEN 'UDT TYPE'
WHEN Trim(COL.COLUMNTYPE) = 'XM' THEN 'XML'
WHEN Trim(COL.COLUMNTYPE) = 'YM' THEN 'INTERVAL YEAR TO MONTH'
WHEN Trim(COL.COLUMNTYPE) = 'YR' THEN 'INTERVAL YEAR'
ELSE
''
END AS COLUMNTYPE,

CASE
WHEN Trim(COL.COLUMNTYPE) = 'D' THEN Trim(COL.DecimalTotalDigits) || ',' || Trim(COL.DecimalFractionalDigits)
ELSE
Trim(COL.COLUMNLENGTH)
END AS COLUMN_LENGTH,

Cast(
CASE
WHEN Trim(COL.COLUMNTYPE) = 'I1' THEN 1
WHEN Trim(COL.COLUMNTYPE) = 'BF' THEN Trim(COLUMNLENGTH)
WHEN Trim(COL.COLUMNTYPE) = 'I2' THEN 2
WHEN Trim(COL.COLUMNTYPE) = 'I' THEN 4
WHEN Trim(COL.COLUMNTYPE) = 'I8' THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'F' THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'D' AND DecimalTotalDigits BETWEEN 1 AND 2 THEN 1
WHEN Trim(COL.COLUMNTYPE) = 'D' AND DecimalTotalDigits BETWEEN 3 AND 4 THEN 2
WHEN Trim(COL.COLUMNTYPE) = 'D' AND DecimalTotalDigits BETWEEN 5 AND 9 THEN 4
WHEN Trim(COL.COLUMNTYPE) = 'D' AND DecimalTotalDigits BETWEEN 10 AND 18 THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'D' AND DecimalTotalDigits BETWEEN 19 AND 38 THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'CV' THEN Trim(COLUMNLENGTH)
WHEN Trim(COL.COLUMNTYPE) = 'CF' THEN Trim(COLUMNLENGTH)
WHEN Trim(COL.COLUMNTYPE) = 'DA' THEN 4
WHEN Trim(COL.COLUMNTYPE) = 'AT' THEN 6
WHEN Trim(COL.COLUMNTYPE) = 'PT' THEN 6
WHEN Trim(COL.COLUMNTYPE) = 'PZ' THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'TZ' THEN 8
WHEN Trim(COL.COLUMNTYPE) = 'PM' THEN 12
WHEN Trim(COL.COLUMNTYPE) = 'PS' THEN 10
WHEN Trim(COL.COLUMNTYPE) = 'SZ' THEN 12
WHEN Trim(COL.COLUMNTYPE) = 'TS' THEN 10
WHEN Trim(COL.COLUMNTYPE) = 'N' THEN 18
WHEN Trim(COL.COLUMNTYPE) = 'BV' THEN Trim(COLUMNLENGTH)
ELSE
''
END AS INT) AS COLUMN_MAX_SIZE_BYTE,

CAST((COLUMN_MAX_SIZE_BYTE * <Insert_table_rowcount_here>)/ (1024 * 1024 * 1024) AS DECIMAL(28,12)) AS SIZE_IN_GB

FROM
DBC.COLUMNSV AS COL
LEFT OUTER JOIN
(
SELECT
*
FROM
DBC.TABLESV
) AS TAB
ON
Trim(COL.DATABASENAME) = Trim(TAB.DATABASENAME)
AND
Trim(COL.TableName) = Trim(TAB.TableName)

WHERE
Trim(COL.DATABASENAME) = 'STG_OFSLL_T_350'
AND
Trim(COL.TableName) = 'L_ACCOUNTS'
AND
Trim(TAB.TABLEKIND) = 'T'
ORDER BY
COL.DATABASENAME,
COL.TableName,
COL.COLUMNID;

 

This will give me the max size each column can take given the current row count of the table.

 

I know i can do a max(chars(coulumn_name)) for a varchar field to get a more accurate space estimate.

 

Is there a better way to get a more accurate answer of how much size the column would take?

2 REPLIES
Senior Apprentice

Re: Find out Disk Space consumed by each column in a Teradata Table

Hi,

 

That is probably as close as you'll get it. A couple of things to think about:

 

1) "I know i can do a max(chars(coulumn_name)) for a varchar field to get a more accurate space estimate." Why not use "AVG(chars(column-name))"? I realise that your code is documented as 'max size of column' but if you have long varchars then MAX could really skew the results.

 

2) In your select that calculates the length based on data type you could remove a lot of it and just use COLUMNLENGTH. You'll still need to handle things like TS columns because the value provided is not the stored length. Up to you whether you change it.

 

3) Recommend that you use the 'V' views (i.e. dbc.allspacev) instead of the compatibility views. The rest of your code won't change.

 

4) Your code doesn't take into account any USI's or NUSI's that are defined on the table. Their space will be included in the dbc.allspacev figures. I usually approximate SI sizes as:

- NUSI: 8 bytes per row in the table (unless the table is PPI then make it either 10 or 16)

- USI: (8 + size of data value) bytes per row in the table (unless the table is PPI then make it either 10+ or 16+)

 

There is also a 'collect demographics' command (I think that's it, I'll need to get back to you) which will give you this info more accurately.

 

5) You'll need to divide the space values by 2 for Fallback tables.

 

Note that your code doesn't take into account whether the table is compressed or not. IMHO this can be really difficult to do.

 

HTH
Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Teradata Employee

Re: Find out Disk Space consumed by each column in a Teradata Table

If you have teradata ps at your account, have them look into using this asset, which basically does this type of analysis for you and recommends data type changes.

 

Teradata Performance - Data Type Analysis Process v1.7

https://www.my.teradata.com/redir.html?assetID=DA006788

 

thanks

 

Dave