Explain plan for set table

Enthusiast

Hi,

I have a requirement to convert around 50 tables from set to multiset. I am trying to weigh the trade-off of keeping a table as set against the performance of inserting records into it. I did an explain plan on insert statements for a table defined as set and then altered as set. Both inserts give me the same explain plan. I was expecting some additional steps in the plan for the set table because of the system-level duplicate check.

So my question is: does the explain plan show the additional steps involved in the system-level duplicate check performed for a set table?
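For reference, this is roughly what I compared (the table name and values are only illustrative):

    EXPLAIN INSERT INTO my_table VALUES (1, 'a');   -- while the table was set
    /* ... after converting the same table to multiset ... */
    EXPLAIN INSERT INTO my_table VALUES (1, 'a');   -- identical plan text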

Abin

Teradata Employee

Re: Explain plan for set table

Hello Abin,

You have really provided the answer in your first paragraph: there are no additional steps. The phrase "defined as set and then altered as set" mentions "set" twice, but I guess you mean one "set" and one "multiset".

The reason is that the duplicate check is performed not before and not after the insert, but during the insert, and only for the rows being inserted. The check is performed at the AMP level, not at the system level. Why? Remember that rows in regular Teradata tables are stored in PI rowhash sequence (PPI and NoPI are a bit different). When new rows are added, they are not appended at the end of the table but placed into the proper blocks to preserve the rowhash sequence. Therefore it is convenient to do the duplicate check during the insert: for a given row, the AMP locates the data block where the row should be inserted, and if a duplicate row already exists, it must be in that same data block (a duplicate row has the same PI value and hence the same rowhash value). So it is convenient to check for duplicates right during the insert operation.
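As a side note, the two table kinds also behave differently when a duplicate actually arrives. A small sketch (the names are invented, and the behavior is as I recall it, so please verify on your release):

    CREATE SET TABLE demo_set (id INTEGER, val INTEGER) PRIMARY INDEX (id);
    CREATE MULTISET TABLE demo_multi (id INTEGER, val INTEGER) PRIMARY INDEX (id);

    INSERT INTO demo_set VALUES (1, 10);
    INSERT INTO demo_set VALUES (1, 10);    -- rejected with a duplicate row error (2802, if I recall correctly)
    INSERT INTO demo_multi VALUES (1, 10);
    INSERT INTO demo_multi VALUES (1, 10);  -- both rows are stored

    /* With INSERT ... SELECT into a set table, duplicate rows are
       silently discarded instead of raising an error. */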

Regards,

Vlad.

Enthusiast

Re: Explain plan for set table

I agree with Vlad; the number of steps will be the same for set and multiset tables, but the duplicate row check will be made at the block level, byte by byte.

Khurram
Enthusiast

Re: Explain plan for set table

Hi,

What is your data volume?
I feel that as the amount of data grows huge, you will see the difference between set and multiset because of the duplicate check for set tables.

Cheers,
Raja
Teradata Employee

Re: Explain plan for set table

Hi Khurram,

Thank you for agreeing with me. However, I believe that there is no need to dedicate an additional post to say only that. :)

Regards,

Vlad.

Enthusiast

Re: Explain plan for set table

Hi Vlad,

you are welcome, but this time I do not agree with you :)

Khurram
Enthusiast

Re: Explain plan for set table

Thanks all for the responses.

Vlad,

Yes, the statement has a typo; it should read: "I did an explain plan on insert statements for a table defined as set and then altered it to multiset."

Raja,

The volume of data differs; I have around 60 tables to consider.

So, all in all, I am going to look at the data volume I am expecting for each table. If a table's volume is expected to stay small even after a long period, I am planning to keep it as set.

For larger tables (in row count), I will have to check the trade-off between keeping the table as set and doing a GROUP BY on all fields after changing the table to multiset. This is where I am stuck, because the addition of the GROUP BY shows up in the plan with additional time, and I cannot tell whether that extra time is better or worse than the AMP-level duplicate check for a set table.
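For clarity, the group-by approach I mean is something like this (all names are just placeholders):

    INSERT INTO target_multiset (col1, col2, col3)
    SELECT col1, col2, col3
    FROM   staging_src
    GROUP  BY col1, col2, col3;   -- GROUP BY over all columns removes duplicate rows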

Abin.

Teradata Employee

Re: Explain plan for set table

Hi Abin,

it is possible to measure. Create two similar tables, one set and one multiset, with the same data. Insert the same new data into both tables. And then measure CPU consumption in the DBQL journal, in the AMPCPUTime field. Even though the explain plans are the same, the CPU consumption will probably differ because of the duplicate checks.
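For example, something along these lines, assuming query logging is enabled (the view and column names follow the standard DBQL objects; please verify on your release):

    SELECT QueryID, StartTime, AMPCPUTime, TotalIOCount, QueryText
    FROM   DBC.QryLogV
    WHERE  UserName = USER
      AND  QueryText LIKE 'INSERT INTO demo%'
    ORDER  BY StartTime DESC;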

Because of the AMP-level mechanics mentioned above, the duplicate check does not directly depend on how big the table is. Rather, it depends on the demographics of the Primary Index. If the PI is almost unique, then the duplicate checks are cheap, because you have to compare only the rows with the same PI value. However, if there are PI values with high numbers of rows per value, then the insertion of a new row forces a comparison against many rows with the same PI. That is CPU intensive.
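A quick way to look at those demographics (table and column names are placeholders):

    SELECT pi_col, COUNT(*) AS rows_per_value
    FROM   my_table
    GROUP  BY pi_col
    ORDER  BY rows_per_value DESC;   -- high rows_per_value means many byte-by-byte comparisons per insert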

Should you have DBQL results that would prove otherwise, please do not hesitate to share.

Thank you,

Regards,

Vlad.

Enthusiast

Re: Explain plan for set table

Thanks, Vlad, for the suggestion. I will check this and get back in case of any more questions.

Abin