Performance comparsion MERGE INTO vs classic UPSERT

Teradata Employee

Performance comparsion MERGE INTO vs classic UPSERT


is there any study/information about execution speed of MERGE INTO compared to INSERT/UPDATEstatement? Which will generally perform better?

Lets assume following:
I have working (W) and target (T) table of same structure and of unspecified size with 0-3 Join Indexes on T. I need to update / insert rows from W into T using SQL only.

I have 3 options:
b) DELETE rows from T which are in W and INSERT all rows from W into T
c) INSERT rows from W which are not in T, UPDATE rows in T with rows from W

I took some tests on 2550 system and it seems , that generally a) is fastest option with no JI on T table and c) is doing better on table with 1-3 JIs.

Anyone did similar tests? Can someone post comments on this topic?

Teradata Employee

Re: Performance comparsion MERGE INTO vs classic UPSERT

Insert/Select to a Target table with join indexes is usually the fastest option,however the update operation may cause trouble in two cases:
if the updated column is part of the join indexes and there is a skew on the column in relation of the NUPI of the join index.
if updated column(s) in the base table is (are) defined as NUPI for the join indexes.
Teradata Employee

Re: Performance comparsion MERGE INTO vs classic UPSERT

Thanks for quick reply,

i believe that we can make an theoretical conclusion, that generally speaking - PERFORMANCE wise is fastest option (and best solution for above mentoined system):

for tables with no JI: MERGE INTO
for tables with JI: INSERT/UPDATE
for tables with one or more complex JI or for NUPIs of JI being updated in T table: drop JI , MERGE INTO table T, recreate JI

Sounds good?

Teradata Employee

Re: Performance comparsion MERGE INTO vs classic UPSERT

Yes generally speaking. However, the third solution may be very time consuming if you have too much or big indexes on your base table.
Teradata Employee

Re: Performance comparsion MERGE INTO vs classic UPSERT

Merge-into uses a new step called Merge Row Multiple Operations (MRM) that can do both updates and inserts in 1 step. It uses optimizations similar to those used by MultiLoad to replace an entire block at a time, which can reduce I/O. Merge-into can be more efficient for join updates compared to an insert/select that has conditional logic, because it only reads the target table once during the MRM step. An insert/select with conditional logic will have to access the target table two times.

There is additional information about the differences between merge-into and insert/select in the orange book titled: "Exploring the Benefits of Teradata 12.0 ANSI MERGE Enhancements."

When it comes to join indexes, here is how the two approaches line up:

If a join index exists on the base table and does NOT include the rowID of the base table:

-- Merge-Into will use two spool files for source rows, one for base table rows to be inserted or updated, and one additional spool for join index maintenance rows.

-- Insert/Select will only require one spool for both base table and join index inserts and updates.

If the join index DOES include the rowID of the base table:

-- Merge-Into will use the same two spool files mentioned above.

-- Insert/Select builds a new spool file during the merge step that will be used for the final join index maintenance.

So, theoretically, you could say insert/select could be faster with join indexes, particularly when it does not carry the row ID, because it only uses 1 spool file for both base and join index maintenance rows. However, if there is a a high percentage of inserts/updates, such that a single data block is updated many times, then merge-into could outperform insert/select with join indexes. So performance will depend on demographics. For inserts into an empty or near-empty table, it is likely one will work as well as the other.

If you are using merge-into with inserts only and no updates, then you will get slightly better performance if you omit the UPDATE clause of the merge-into syntax. This will prevent the merge-into logic from generating the JI update plan and trying to build a spool file for the join index update rows. The optimizer does not know that there are no updates at the time of optimization, so leave off the UPDATE clause if you don't need it in order to achieve some additional efficiency.

Thansk, - Carrie