We are using TD 13.10 and have an archive table containing a large number of records (3.9+ billion, about 1.1 TB of space).
The table definition looks something like this (abridged):
CREATE MULTISET TABLE P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE , NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO
SGCLIENTID VARCHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC,
STORELOCATIONID VARCHAR(6) CHARACTER SET LATIN NOT CASESPECIFIC,
DELETIONLINEIND VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC COMPRESS ('false','true'),
PRIMARY INDEX ( SGCLIENTID ,POSREGISTERID ,TRANSACTIONID ,LINENUM )
PARTITION BY (SYS_BATCH_NUM MOD 60000 )+ 1
INDEX IX_TL_DTTM ( DTTM )
INDEX IX_TRSLN_DELIND ( DELETIONLINEIND )
INDEX IX_TL_SYS_ARC_DTTM ( SYS_ARC_DTTM );
We have decided to clean out some of the old partitions, but deleting even a single partition takes a very long time: a partition containing 20 million records takes more than an hour to delete.
We have tried different approaches: simple DELETE statements like
DELETE FROM table WHERE SYS_BATCH_NUM = XXX;
MultiLoad scripts; and, currently, a temp table containing the partitioning column and the primary index columns, which we load one partition at a time and then use to delete from the main table. But even this takes 25+ minutes.
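For illustration, the single-partition delete can also be written against Teradata's built-in PARTITION pseudo-column, which constrains the scan to the one target partition (a sketch; XXX is a placeholder batch number as above):

```sql
-- Sketch: delete one partition via the PARTITION pseudo-column.
-- With PARTITION BY (SYS_BATCH_NUM MOD 60000) + 1, batch XXX maps
-- to partition number (XXX MOD 60000) + 1.
DELETE FROM P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE
WHERE PARTITION = (XXX MOD 60000) + 1;
```

Note that this only limits the scan; the delete still maintains the secondary indexes and the Transient Journal, so it is not a fast-path delete.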
Statistics have been collected on the table.
We are looking at deleting about 2.5+ billion records, and at the current speed this will take more than two days of runtime.
Does anyone have a suggestion on how to approach this deletion task?
PARTITION BY (SYS_BATCH_NUM MOD 60000) + 1 will never result in a fast-path delete, so it's always using the Transient Journal.
Plus, your table has multiple secondary indexes, which are another reason for slow deletes.
Are those indexes ever used? E.g. INDEX IX_TRSLN_DELIND ( DELETIONLINEIND ) is on a column where you COMPRESS ('false','true'), i.e. a two-value column that can hardly be selective.
Simply create an empty copy of the table and then Insert/Select the data you want to keep. Finally recreate the SIs (if they are actually used) and Drop/Rename.
And your PI doesn't seem to be useful; do you ever use all of its columns, including LINENUM, in a WHERE or JOIN?
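A sketch of that copy-and-swap approach (the _NEW suffix, the SYS_BATCH_NUM cutoff, and the single recreated index are illustrative, not from the original post):

```sql
-- 1. Empty copy with the same column definitions and PI/partitioning.
--    (Check on your release whether the SIs are carried over; if so,
--    drop them before loading.)
CREATE TABLE P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE_NEW AS
  P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE WITH NO DATA;

-- 2. Copy only the rows to keep; an INSERT/SELECT into an empty
--    table avoids the per-row Transient Journal cost of a huge DELETE.
INSERT INTO P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE_NEW
SELECT * FROM P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE
WHERE SYS_BATCH_NUM >= 12345;  -- illustrative cutoff

-- 3. Recreate the SIs, but only those that are actually used, e.g.:
CREATE INDEX IX_TL_DTTM (DTTM)
  ON P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE_NEW;

-- 4. Swap the tables.
DROP TABLE P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE;
RENAME TABLE P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE_NEW
  TO P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE;
```

The main cost is the temporary double space for the kept rows, which is exactly the concern raised in the reply below.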
Thanks Dieter for the quick response.
We have just recently been discussing whether it would actually be better/faster to Create/Insert/Drop/Rename instead of Delete, but have been a little hesitant due to the number of records and the space we would need. As you can probably imagine, there is a reason why we need to clean up some old data ;-)
The PI is actually used frequently by our ETL tool, and the SIs are used by "reduction" views that are also used by our ETL tool.
Once again, thanks for the response; not the one I was hoping for...
Some thoughts on how to approach this:
1. Since the table is an archive table, have you set MVC (multi-value compression) on all possible columns?
2. Have you done a PoC on adding block-level compression (on part of the data, in order to run some tests)?
3. As Dieter mentioned, DML on a huge amount of data with indexes in place is not the best choice performance-wise; better to drop the indexes, delete, then recreate them.
4. Does this table participate in any join index?
5. Have you tried ALTER TABLE to drop old partitions and add new ones? Please provide an EXPLAIN and metrics for the delete statement. (What does the partitioning syntax look like? Does it have a NO RANGE clause?)
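On the last point: ALTER TABLE ... DROP RANGE only works with RANGE_N partitioning, so the current (SYS_BATCH_NUM MOD 60000) + 1 expression would first have to be replaced. A hedged sketch of how that could look (the RANGE_N expression and the dropped range are illustrative):

```sql
-- Hypothetical RANGE_N partitioning instead of the MOD expression:
--   PARTITION BY RANGE_N(SYS_BATCH_NUM BETWEEN 1 AND 60000 EACH 1,
--                        NO RANGE)
-- With that in place, whole ranges at the ends of the partitioning
-- scheme can be dropped, deleting their rows without row-by-row
-- journaling:
ALTER TABLE P01_EDW_ARCHIVE.ARC_TRANSACTION_LINE
MODIFY PRIMARY INDEX
DROP RANGE BETWEEN 1 AND 1000
WITH DELETE;
```

Repartitioning a 1.1 TB table is itself a large operation, so this is really only attractive combined with the copy/rename approach: build the new table with RANGE_N from the start so future cleanups become cheap.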