I am trying to make a copy of a big table to test out some performance improvements on it. The table has 4,000,000,000 rows.
I used CREATE TABLE AS:
CREATE TABLE DATABASE2.COPY_OF_TABLE AS DATABASE1.ORIGINAL_TABLE WITH DATA AND STATS;
I had to cancel the query after 2 hours because the DBC.Dbase lock was blocking other important queries.
The table is 280 GB with a skew of 1%.
CREATE MULTISET TABLE DATABASE1.ORIGINAL_TABLE ,NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     FREESPACE = 10 PERCENT,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      AR_ID INTEGER TITLE 'Arrangement Id' NOT NULL,
      EFF_DT DATE FORMAT 'YYYY-MM-DD' TITLE 'Effective Date' NOT NULL COMPRESS (DATE '1900-01-01'),
      END_DT DATE FORMAT 'YYYY-MM-DD' TITLE 'End Date' NOT NULL COMPRESS (DATE '2999-12-31'),
      INT_TP_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Type Id' NOT NULL,
      LDB_ID SMALLINT TITLE 'LDB Id' NOT NULL,
      SEQ_ID BYTEINT NOT NULL COMPRESS (1, 2, 3),
      INT_BAL_MX_AMT DECIMAL(15,2) TITLE 'Interest Balance Maximum Amount' NOT NULL COMPRESS 9999999999999.99,
      INT_APL_TP_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Calculation Basis Id' NOT NULL,
      EFF_RATE_PCT DECIMAL(9,5) TITLE 'Effective Rate Percentage' COMPRESS (2.00000, 1.00000, 3.00000),
      SPRD_RATE_PCT DECIMAL(9,5) TITLE 'Spread Rate Percentage' COMPRESS (2.00000, 1.00000, 3.00000),
      CR_DB_IND_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Credit Debit Indicator' NOT NULL,
      INT_RATE_CSZ_F CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Rate Customization Flag' NOT NULL COMPRESS (' ','N','Y'),
      PPN_EV_ID INTEGER TITLE 'Population Event ID' NOT NULL,
      OLD_END_DT DATE FORMAT 'YYYY-MM-DD' NOT NULL DEFAULT DATE '1900-01-01' COMPRESS (DATE '1900-01-01', DATE '2999-12-31'),
      UPD_PPN_EV_ID INTEGER NOT NULL DEFAULT 0 COMPRESS 0
     )
PRIMARY INDEX ( AR_ID )
PARTITION BY LDB_ID
UNIQUE INDEX USI1 ( AR_ID, EFF_DT, INT_TP_CL, LDB_ID, SEQ_ID );
INSERTING and UPDATING against the table is really slow as well.
Best Regards, Klaus
To prevent the DBC.Dbase lock from causing problems, you can:
create table t1_copy as t1 WITH NO DATA;
insert into t1_copy select * from t1;
Run the above as separate requests. The lock on DBC.Dbase taken by the 'create table as' request will then only be held for a short time.
You might also want to try removing the USI from the new table until after all of the data has been copied across, then recreate it with a separate request afterwards.
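For illustration, recreating the index after the load might look something like this (index name and columns taken from the DDL above; adjust the target table name to match your copy):

create unique index USI1 (AR_ID, EFF_DT, INT_TP_CL, LDB_ID, SEQ_ID)
on DATABASE2.COPY_OF_TABLE;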
Check that the target table is defined as MULTISET - yes it should be, but...
You say that your source table is 280 GB; that isn't that big.
How many AMPs on the system? Even if it is only ~150 then this is less than 2GB per AMP.
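If you're not sure of the AMP count, Teradata's HASHAMP function can tell you (HASHAMP() with no argument returns the highest AMP number, so add 1 for the count):

select hashamp() + 1 as amp_count;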
If it is too slow then you can copy the data 'bit by bit'.
Run multiple INSERT/SELECT requests, but ensure that you copy entire partitions at a time (i.e. complete sets of values for column LDB_ID), so something like:
insert into t1_copy select * from t1 where ldb_id between 1 and 10;
insert into t1_copy select * from t1 where ldb_id between 11 and 20;
And so on until you have all of the data.
A couple of thoughts.
I think Dave's solution could be helpful.
We have tried this and it helped us to improve the performance.
Let us know whether it works for you.
I did a lot of testing but just couldn't get the performance I wanted (or had hoped for).
Created the new empty table without the extra USI and then ran the INSERT, but performance was still very slow.
Tried a bit of looping, inserting chunks of data, but performance was still not there.
I ended up restoring the table from our backup with COPY. Doable, but not ideal.
Thanks for the suggestions. I'm still not sure what to do with the big tables. Maybe our 96 GB of memory leads to some swapping issues.
Thanks for the update, I'm glad that you were able to get this done even if not in the intended/expected way.
I wouldn't have thought you'd have to take the approach that you did, maybe you could tell me:
- How many AMPs on your system?
- How many Nodes on your system?
- What Node type are you using (I'm guessing an older type if you only have 96 GB of memory per node)?
- What is your expectation/hope in terms of performance for this processing?
- When you say "performance was still slow" - what were you measuring and how slow was it?
It may be something can be done for future 'big tables'.
What is the distribution of values in LDB_ID? I'm wondering whether this table has a huge number of very small partitions or a huge number of empty partitions. (Without specifying any partitioning granularity, the entire range of SMALLINT is used at a granularity of 1, making 65,536 partitions.)
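A quick way to check would be a per-partition row count on the source table (using the table name from the original post; substitute yours as needed):

select ldb_id, count(*) as row_cnt
from DATABASE1.ORIGINAL_TABLE
group by ldb_id
order by ldb_id;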
Just to make sure something non-obvious isn't happening, could we see the Explain for the INSERT SELECT statement?
What release of Teradata is running?
Created the target table SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV as the input table WITH NO DATA.
Dropped the USI on the Target table.
INSERT/SELECT Explain (14 hours estimate)
Explain INSERT INTO SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV SELECT * FROM EDWSoR_P.AR_INT_RATE_ITRV;

  1) First, we lock a distinct SDC_SPACERES."pseudo table" for write on a RowHash to prevent global deadlock for SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV.
  2) Next, we lock a distinct EDWSoR_P."pseudo table" for read on a RowHash to prevent global deadlock for EDWSoR_P.AR_INT_RATE_ITRV.
  3) We lock SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV for write, and we lock EDWSoR_P.AR_INT_RATE_ITRV for read.
  4) We do an all-AMPs MERGE into SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV from EDWSoR_P.AR_INT_RATE_ITRV. The size is estimated with no confidence to be 4,876,420,608 rows. The estimated time for this step is 14 hours and 2 minutes.
  5) We spoil the parser's dictionary cache for the table.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
  -> No rows are returned to the user as the result of statement 1.