MVC and Presence Bits

Database
Enthusiast

MVC and Presence Bits

Hi,

I have a problem with understanding of Presence Bits, can someone help me?

 

Lets say I have got table:

 

CREATE SET TABLE DataBaseName.TableName,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
  COL1 VARCHAR(60) CHARACTER SET LATIN CASESPECIFIC,
  COL2 DECIMAL(10,0),
  COL3 TIMESTAMP(0),
  COL4 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC,
  COL5 VARCHAR(12) CHARACTER SET LATIN CASESPECIFIC,
  COL6 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL7 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL8 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL9 VARCHAR(45) CHARACTER SET LATIN CASESPECIFIC,
  COL10 TIMESTAMP(0),
  COL11 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC,
  COL12 TIMESTAMP(0),
  COL13 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC,
  COL14 DECIMAL(15,0),
  COL15 TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( COL1,COL2);

 

RecordCount = 250 millions

TableSize = 91847862272 Bytes ~ 85 GB

 

As I understand in this scenario I should have 2 full octets:

  • first bit is occupied by the system
  • 15 bits (7 bits in frist octet and 8 bits in second octet) for nullable values for each collumn (15 columns)

 

So adding even one value for MVC for any column would add 1 additional byte to row header, because another octet will be added. Am I right?

 

Going further, I added 255 pointless (that do not exists in those fields) compression values to 2 columns (to be sure that 2 additional octets will be added) and yet, nothing changed, why?

 

I was sure that table size would be increased -> Table header (because of the small size, let's skip it) and row header = Cardinality * 2 bytes (for each octet) -> 500000000 bytes ~0.5 GB

 

But nothing changed, why? Even less understandable to me is that after adding compression values, the table size minimally decreased. (Those values do not exists in columns or even table) Ye, I got it that null values ​​were compressed, but gradually increasing the compression values (1 value, 3, 7, 15, 31, 63, 127, 255) table size was decreasing f.e

1 value   - 91847501312 bytes

3 values - 91847265792 bytes

7 values - 91846960640 bytes

 

After compression table looks like that:


ALTER TABLE DataBaseName.TableName ADD COL4 COMPRESS('word1', 'word2'...,'word255');
ALTER TABLE DataBaseName.TableName ADD COL5 COMPRESS('difword1', 'difword2'...,'difword255');

 

CREATE SET TABLE DataBaseName.TableName,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
  COL1 VARCHAR(60) CHARACTER SET LATIN CASESPECIFIC,
  COL2 DECIMAL(10,0),
  COL3 TIMESTAMP(0),
  COL4 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC COMPRESS('word1', 'word2'...,'word255'),
  COL5 VARCHAR(12) CHARACTER SET LATIN CASESPECIFIC COMPRESS('difword1', 'difword2'...,'difword255'),
  COL6 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL7 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL8 VARCHAR(762) CHARACTER SET UNICODE CASESPECIFIC,
  COL9 VARCHAR(45) CHARACTER SET LATIN CASESPECIFIC,
  COL10 TIMESTAMP(0),
  COL11 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC,
  COL12 TIMESTAMP(0),
  COL13 VARCHAR(90) CHARACTER SET UNICODE CASESPECIFIC,
  COL14 DECIMAL(15,0),
  COL15 TIMESTAMP(0))
UNIQUE PRIMARY INDEX ( COL1,COL2);

 

CurrentSize =  91847035392 Bytes

BeforeSize =   91847862272 Bytes


Accepted Solutions
Highlighted
Teradata Employee

Re: MVC and Presence Bits

The 2 "reserved" bytes and the 2 bytes per column in the "variable offsets" array for variable-length columns without compression are fixed. But when you compress a variable-length column, the length for any non-compressed value is stored as the first 1 (if max length <=255) or 2 bytes of the data within the "compressed columns" array - similar to what happens for ALC. This is described in the Database Design manual section on compressing variable-length columns.

 

And even if you know exact row sizes, you can't really calculate the exact final PermSpace except perhaps when you load into an empty table, since data blocks are variable length so there typically will be varying amounts of free space at the ends of blocks. We just focus on whether the average row size can be expected to decrease. 

 

1 ACCEPTED SOLUTION
9 REPLIES 9
Teradata Employee

Re: MVC and Presence Bits

Does this table have BLC as well?

Adding a consistently repeating bit pattern to the row header may slightly improve effectiveness of block-level compression.

Enthusiast

Re: MVC and Presence Bits

No, we can call it brand new table created for testing purpose. There isn't any ALC, BLC or MVC (except those 2 ).

Is my base understanding (in previous message) correct?

Teradata Employee

Re: MVC and Presence Bits

Many systems have BLC enabled as the default for all new tables.

 

But in any case, compressing more values takes more presence bits and makes the row headers longer. That part is correct.

Expecting the allocated PermSpace to closely follow a small change in row size is oversimplifying, though. It does not necessarily translate into more disk sectors. For example, there could have been some unused free space within blocks, so additional space allocation is not needed. And data blocks can be expanded, merged, or split and the result might be less total embedded free space (or it could sometimes result in more free space).

Enthusiast

Re: MVC and Presence Bits

@FredThanks for responses.

 

So is there any method to control space allocation and presence bits in particular?

 

I'm trying to create fully automatic procedure for MVC, but without full knowledge about presence bits it seems impossible.

 

Of course it's possible to create procedure/macro that calculate top x most space consuming values etc., but you can't estimate the "ideal" values.

 

Without full knowledge about it, you can't just simply do it like I did it:

  • calculate actual octets and the "ideal" value of additional octets
  • calculate "free" bits from actual - last octet and sum up with bits from additional octets
  • calculate the "best" bits allocation between columns
  • select top x space consuming values for column for allocated bits

Am I right?

 

Coming back to BLC I checked it using Ferret (scope table - Compression Status = 'U')  and command

show stats values on  /* BLCPctCompressed */ 0.00,

Teradata Employee

Re: MVC and Presence Bits

This is why the processes that do this best are executable programs that have the logic to run the math through multiple "scenarios".

 

Why are you not using one of these, like WinMVC that already exist?

 

thanks

 

dave

Enthusiast

Re: MVC and Presence Bits

@David_Roththanks for response.

 

Tbh I have never heard about WinMVC and cant find it aswell. It may be a bit a cheap, but can u link a download?

 

I have heard about PRISE, but never tried it out. Is it worth giving it a shot?

 

Greetings

Teradata Employee

Re: MVC and Presence Bits

It occurred to me that I was overlooking some additional factors at play here. In particular:

VARCHAR columns normally have a 2-byte length, but compressible variable-length columns with max length <= 255 will use only 1 length byte for the non-compressed values. That provides some immediate offset to the increased header length.

 

There can also be "pad" bytes for boundary alignment; a difference of one byte may not change the storage required.

 

You are correct that finding "ideal" compression could be quite difficult. It's generally sufficient to focus on things that will yield obvious improvement.

 

 

Enthusiast

Re: MVC and Presence Bits

@Fredthanks again for response.

 

Am I understanding it correctly, one of those 2 "reserved" bytes for variable row length may be used for other things?

I'm aware that length of the most values in column is lower than 255 and do not require 2 bytes to store length value, but I thought second byte would be "blocked" anyway - just in case. This knowledge helps a lot, thank you.

 

Boundary alignment is the reason why I added 2 addional octets - one by one.

 

If im not wrong, if table contains only fixed columns it's possible to accurately define if row header length is even or odd:

For non PPI tables it would be:

2 (bytes reference array pointer) + 2 (bytes for row length) + 8 (bytes for rowid) + Columns Size + Presence Bytes (1 bit occupied by the system, 1 bit per nullable column + compression bits ).

 

It become harder, when table contains variable column/columns. You would have to calculate row length for each record and then calculate percentage of even and odd rows to finally decide is it worth adding additional byte for boundary alignment.

 

I am really grateful for your professional help @Fred , thanks.

 

Highlighted
Teradata Employee

Re: MVC and Presence Bits

The 2 "reserved" bytes and the 2 bytes per column in the "variable offsets" array for variable-length columns without compression are fixed. But when you compress a variable-length column, the length for any non-compressed value is stored as the first 1 (if max length <=255) or 2 bytes of the data within the "compressed columns" array - similar to what happens for ALC. This is described in the Database Design manual section on compressing variable-length columns.

 

And even if you know exact row sizes, you can't really calculate the exact final PermSpace except perhaps when you load into an empty table, since data blocks are variable length so there typically will be varying amounts of free space at the ends of blocks. We just focus on whether the average row size can be expected to decrease.