Large-scale update question

I have a table with 700m rows, evenly distributed: a 1-column integer NUPI plus a 3-column USI (one of the USI columns is also the PI).  Queries run fine, but when we push updates or deletes to this table in ETL jobs, we have problems.  We use a stage table with a structure identical to the 700m-row target table, and use a joined UPDATE statement to update the corresponding records in the target table each night.  There are on average 1-3m rows to apply from the stage table.  The single UPDATE statement runs 3+ hours despite Explain estimating 10-12 minutes.  We have looked at partitioning, but the wide variety of the incoming 1-3m rows prevents partition elimination.

From a conceptual point of view, what would be a good approach for pushing millions of updates/deletes to a 700m row table like this?  

The incoming stage and target tables have the same structure:

CREATE MULTISET TABLE deck_prod_tbls.inventory_Event_usage ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
Event_Gnbr DECIMAL(15,0) NOT NULL,
Inv_Gnbr DECIMAL(15,0) NOT NULL,
Usage_Parm_Code VARCHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
...(other cols)...
)
PRIMARY INDEX XPI_Inventory_Event_Usage ( Inv_Gnbr )
UNIQUE INDEX ( Event_Gnbr ,Inv_Gnbr ,Usage_Parm_Code )
INDEX ( Source_Data_Samp_Ord ,Source_Data_Samp_Part_Ord ,Source_Event_Db_Id ,
Source_Event_Id ,Source_Name );

The stage table (1-3m rows) is STAGEDB.INV_EVENT_USAGE_CH; the 700m-row target table is TABLESDB.INVENTORY_EVENT_USAGE.  We have a similar issue with deletes.

update t
from TABLESDB.INVENTORY_EVENT_USAGE t,
STAGEDB.INV_EVENT_USAGE_CH c
set
Incremental_Usage_Qty = c.Incremental_Usage_Qty,
Rec_Status_Code = c.Rec_Status_Code,
Source_Create_Dt = c.Source_Create_Dt,
Source_Data_Samp_Ord = c.source_data_samp_ord,
Source_Data_Samp_Part_Ord = c.source_data_samp_part_ord,
Source_Data_Type_DB_ID = c.source_data_type_db_id,
Source_Data_Type_ID = c.source_data_type_id,
Source_Event_DB_ID = c.source_event_db_id,
Source_Event_Id = c.source_event_id,
Source_Event_Inv_Id = c.source_event_inv_id,
Source_Inv_DB_ID = c.source_inv_db_id,
Source_Inv_Id = c.source_inv_id,
Source_Rev_DB_ID = c.source_rev_db_id,
Source_Rev_Dt = c.source_rev_dt,
Usage_Since_New_Qty = c.usage_since_new_qty,
Usage_Since_Overhaul_Qty = c.usage_since_overhaul_qty,
Source_Name = c.source_name,
Last_Alter_Ts = c.last_alter_ts
where t.event_gnbr = c.event_gnbr
and t.inv_gnbr = c.inv_gnbr
and t.usage_parm_code = c.usage_parm_code;

--
-- Delete
--
delete TABLESDB.INVENTORY_EVENT_USAGE
where TABLESDB.INVENTORY_EVENT_USAGE.event_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.event_gnbr
and TABLESDB.INVENTORY_EVENT_USAGE.inv_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.inv_gnbr
and TABLESDB.INVENTORY_EVENT_USAGE.usage_parm_code = STAGEDB.INV_EVENT_USAGE_DEL.usage_parm_code;

Each of the two statements above runs 3+ hours when about 2m rows are affected... insanely slow.

Thanks in advance,

Rich

1 REPLY

Re: Large-scale update question

No responses, but I found a solution.  It appears that joined updates and joined deletes are not viable performance-wise unless both tables have the same primary index AND that primary index exactly matches the WHERE clause.  I dropped execution time from 4 hours to 20 minutes by doing this:

  1. Create a temp table containing all unique INV_GNBRs (the NUPI column) I needed to process.
  2. Create a load table with all target records from the 700m-row table matching those NUPI values.  This pulled about 10m rows even though I only needed to update/delete about 600k rows.
  3. Apply the joined updates and deletes against the 10m-row load table.  About 600k rows touched; elapsed time was about 4 minutes.
  4. Run a joined delete on the 700m-row target table using the temp keys table.  Both tables have the same PI and the WHERE clause was *only* the PI.  It took about 10 minutes to delete the 10m rows.
  5. Insert the ~10m rows (slightly fewer, due to the deletes applied in step 3) from the load table back into the 700m-row target table.  Elapsed time was about 6 minutes.
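
Condensed, the SQL looks roughly like this.  The key-table and load-table names (keys_to_process, WORKDB.INV_EVENT_USAGE_LOAD) and the WORKDB database are placeholders I made up for this post, and the SET list is abbreviated; the real code repeats every column from the original UPDATE above:

-- Step 1: volatile table of the distinct PI values that will be touched
-- (assuming the keys come from both the change and the delete stage tables).
CREATE VOLATILE TABLE keys_to_process AS
( SELECT inv_gnbr FROM STAGEDB.INV_EVENT_USAGE_CH
  UNION
  SELECT inv_gnbr FROM STAGEDB.INV_EVENT_USAGE_DEL
) WITH DATA
PRIMARY INDEX ( inv_gnbr )
ON COMMIT PRESERVE ROWS;

-- Step 2: pull every target row for those PI values into a load table
-- with the same PI as the target (about 10m rows in my case).
CREATE TABLE WORKDB.INV_EVENT_USAGE_LOAD AS
( SELECT t.*
  FROM TABLESDB.INVENTORY_EVENT_USAGE t
  JOIN keys_to_process k
  ON t.inv_gnbr = k.inv_gnbr
) WITH DATA
PRIMARY INDEX ( inv_gnbr );

-- Step 3: apply the original joined update and delete to the small load
-- table instead of the 700m-row target.
update l
from WORKDB.INV_EVENT_USAGE_LOAD l,
STAGEDB.INV_EVENT_USAGE_CH c
set
Incremental_Usage_Qty = c.Incremental_Usage_Qty
-- ...same remaining SET columns as the original update...
where l.event_gnbr = c.event_gnbr
and l.inv_gnbr = c.inv_gnbr
and l.usage_parm_code = c.usage_parm_code;

delete WORKDB.INV_EVENT_USAGE_LOAD
where WORKDB.INV_EVENT_USAGE_LOAD.event_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.event_gnbr
and WORKDB.INV_EVENT_USAGE_LOAD.inv_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.inv_gnbr
and WORKDB.INV_EVENT_USAGE_LOAD.usage_parm_code = STAGEDB.INV_EVENT_USAGE_DEL.usage_parm_code;

-- Step 4: joined delete on the target, keyed only on the shared PI.
delete TABLESDB.INVENTORY_EVENT_USAGE
where TABLESDB.INVENTORY_EVENT_USAGE.inv_gnbr = keys_to_process.inv_gnbr;

-- Step 5: put the updated (and trimmed) rows back into the target.
INSERT INTO TABLESDB.INVENTORY_EVENT_USAGE
SELECT * FROM WORKDB.INV_EVENT_USAGE_LOAD;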

In the end, I replaced 2 SQL statements (Update and Delete) with 4 pages of formatted code, but I got the performance I needed.