Copying giant table is very slow

Database
Enthusiast

Copying giant table is very slow

Hi

I am trying to take a copy of a big table to test out some performance improvements on it. The table has 4,000,000,000 rows.

I did CREATE TABLE AS

CREATE TABLE DATABASE2.COPY_OF_TABLE AS DATABASE1.ORIGINAL_TABLE WITH DATA AND STATS;

 I had to cancel the query after 2 hours at

Request I/O: 33,433,324

because the DBC.Dbase lock was blocking other important queries.

The table is 280 GB with a skew of 1%.

 

CREATE MULTISET TABLE DATABASE1.ORIGINAL_TABLE ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     FREESPACE = 10 PERCENT,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      AR_ID INTEGER TITLE 'Arrangement Id' NOT NULL,
      EFF_DT DATE FORMAT 'YYYY-MM-DD' TITLE 'Effective Date' NOT NULL COMPRESS (DATE '1900-01-01'),
      END_DT DATE FORMAT 'YYYY-MM-DD' TITLE 'End Date' NOT NULL COMPRESS (DATE '2999-12-31'),
      INT_TP_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Type Id' NOT NULL,
      LDB_ID SMALLINT TITLE 'LDB Id' NOT NULL,
      SEQ_ID BYTEINT NOT NULL COMPRESS (1 ,2 ,3 ),
      INT_BAL_MX_AMT DECIMAL(15,2) TITLE 'Interest Balance Maximum Amount' NOT NULL COMPRESS 9999999999999.99 ,
      INT_APL_TP_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Calculation Basis Id' NOT NULL,
      EFF_RATE_PCT DECIMAL(9,5) TITLE 'Effective Rate Percentage' COMPRESS (2.00000 ,1.00000 ,3.00000 ),
      SPRD_RATE_PCT DECIMAL(9,5) TITLE 'Spread Rate Percentage' COMPRESS (2.00000 ,1.00000 ,3.00000 ),
      CR_DB_IND_CL CHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Credit Debit Indicator' NOT NULL,
      INT_RATE_CSZ_F CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'Interest Rate Customization Flag' NOT NULL COMPRESS (' ','N','Y'),
      PPN_EV_ID INTEGER TITLE 'Population Event ID' NOT NULL,
      OLD_END_DT DATE FORMAT 'YYYY-MM-DD' NOT NULL DEFAULT DATE '1900-01-01' COMPRESS (DATE '1900-01-01',DATE '2999-12-31'),
      UPD_PPN_EV_ID INTEGER NOT NULL DEFAULT 0  COMPRESS 0 )
PRIMARY INDEX ( AR_ID )
PARTITION BY LDB_ID 
UNIQUE INDEX USI1 ( AR_ID ,EFF_DT ,INT_TP_CL ,LDB_ID ,SEQ_ID );

INSERTING and UPDATING against the table are really slow as well.

Any ideas?

 

Best Regards, Klaus

21 REPLIES
Senior Supporter

Re: Copying giant table is very slow

Did you check the skew of the table?

Senior Supporter

Re: Copying giant table is very slow

Sorry, just saw that you already posted it.

Senior Apprentice

Re: Copying giant table is very slow

Hi,

 

To prevent the dbc.dbase lock causing problems you can:

create table t1_copy as t1 WITH NO DATA;
insert into t1_copy select * from t1;

Run the above as separate requests. The lock on dbc.dbase in the 'create table as' request will only be held for a short time.

 

You might also want to omit the USI on the new table until after all of the data is copied across; create it with a separate request afterwards.
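As a sketch of that last step (using the `t1_copy` example name from above and the index definition from the original DDL in the question):

```sql
-- Sketch: rebuild the USI only after the bulk INSERT/SELECT has finished,
-- so the copy doesn't pay the per-row index maintenance cost.
CREATE UNIQUE INDEX USI1 (AR_ID, EFF_DT, INT_TP_CL, LDB_ID, SEQ_ID)
ON t1_copy;
```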

Check that the target table is defined as MULTISET - yes it should be, but...
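A quick way to verify that (again using the `t1_copy` example name):

```sql
-- Sketch: SHOW TABLE returns the full DDL of the copy,
-- including the SET/MULTISET keyword, so you can confirm
-- the table kind matches the source.
SHOW TABLE t1_copy;
```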

 

You say that your source table is 280 GB; that isn't that big.

How many AMPs are on the system? Even if it is only ~150, that is less than 2 GB per AMP.

 

If it is too slow then you can copy the data 'bit by bit'.

Run multiple INSERT/SELECT requests, but ensure that you copy entire partitions at a time (i.e. values of column LDB_ID), so something like:

insert into t1_copy
select *
from t1
where ldb_id between 1 and 10;

Next request:

insert into t1_copy
select *
from t1
where ldb_id between 11 and 20;

And so on until you have all of the data.

 

A couple of thoughts.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Supporter

Re: Copying giant table is very slow

Hi Klaus_Jeppesen,

 

I think Dave's solution could be helpful.

We have tried this and it helped us to improve the performance.

Let us know whether it works for you.

 

Thanks,

Rohan Sawant

 

Enthusiast

Re: Copying giant table is very slow

I did a lot of testing but just couldn't get the performance I wanted (or had hoped for).

 

I created the new empty table without the extra USI and then ran the INSERT, but the performance was still very slow.

 

I also did a bit of looping, inserting chunks of data, but the performance was still not there.

 

I ended up restoring the table from our backup with COPY. Doable, but not ideal.

 

Thanks for the suggestions. Still not sure what to do with the big tables. Maybe our 96 GB of memory per node leads to some swapping issues.

 

Thanks again

/Klaus

Senior Apprentice

Re: Copying giant table is very slow

Hi Klaus,

 

Thanks for the update, I'm glad that you were able to get this done even if not in the intended/expected way.

 

I wouldn't have thought you'd have to take the approach that you did, maybe you could tell me:

- How many AMPs on your system?

- How many Nodes on your system?

- What node type are you using? (I'm guessing an older type if you only have 96GB memory per node.)

- What is your expectation/hope in terms of performance for this processing?

- When you say "performance was still slow" - what were you measuring and how slow was it?

 

It may be that something can be done for future 'big tables'.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Teradata Employee

Re: Copying giant table is very slow

What is the distribution of values in LDB_ID? I'm wondering whether this table has a huge number of very small partitions or a huge number of empty partitions. (Without specifying any partitioning attributes, the entire range of SMALLINT is used at a granularity of 1, making 65,536 partitions.)
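One way to check this (a sketch, using the source table name from the DDL in the question):

```sql
-- Sketch: count rows per partition to spot many tiny or empty partitions.
-- PARTITION is Teradata's system column giving each row's partition number.
SELECT PARTITION, COUNT(*) AS row_cnt
FROM DATABASE1.ORIGINAL_TABLE
GROUP BY PARTITION
ORDER BY PARTITION;
```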

 

Just to make sure something non-obvious isn't happening, could we see the Explain for the INSERT SELECT statement?

 

What release of Teradata is running?

Enthusiast

Re: Copying giant table is very slow

Configuration info:

2690 Appliance
6 NODES
143 AMPS
96GB MEM/NODE
TERADATA 15.00

 

Will look into the partitions and the Explain.

Enthusiast

Re: Copying giant table is very slow

I created the target table SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV as the input table WITH NO DATA.

I dropped the USI on the target table.

INSERT/SELECT Explain (14-hour estimate):

 

Explain INSERT INTO SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV
SELECT * FROM EDWSoR_P.AR_INT_RATE_ITRV;

  1) First, we lock a distinct SDC_SPACERES."pseudo table" for write on
     a RowHash to prevent global deadlock for
     SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV.
  2) Next, we lock a distinct EDWSoR_P."pseudo table" for read on a
     RowHash to prevent global deadlock for EDWSoR_P.AR_INT_RATE_ITRV.
  3) We lock SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV for write, and we lock
     EDWSoR_P.AR_INT_RATE_ITRV for read.
  4) We do an all-AMPs MERGE into SDC_SPACERES.KLAUS_AR_INT_RATE_ITRV
     from EDWSoR_P.AR_INT_RATE_ITRV.  The size is estimated with no
     confidence to be 4,876,420,608 rows.  The estimated time for this
     step is 14 hours and 2 minutes.
  5) We spoil the parser's dictionary cache for the table.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.