Copying trillions of rows to a new table

Fan


Hello

I have recently started working on Teradata and I am stuck at a point: I want to move trillions of rows from one table to another. What is the best way to do this? Creating the structure and inserting the data, or something like SELECT * INTO, as in SQL Server, which is minimally logged?

Can someone please advise?

Junior Contributor

Re: Copying trillions of rows to a new table

Really trillions? There are not many customers with that much data in a single table.

The fastest way utilizes a FastPath INSERT/SELECT (sketched below):

The target table must be empty and have exactly the same definition as the source table.

If the DDLs are different but the PI is the same, you should prefer a MERGE over an INSERT/SELECT (it avoids spooling).

If the PIs are different, you will need spool the size of the target table.
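A minimal sketch of the first two variants; mydb.src, mydb.tgt, pi_col, and col2 are placeholder names:

-- FastPath INSERT/SELECT: empty target with an identical definition
CREATE TABLE mydb.tgt AS mydb.src WITH NO DATA;  -- copies the DDL, including the PI
INSERT INTO mydb.tgt SELECT * FROM mydb.src;

-- Same PI, different DDL: MERGE avoids the spool step
MERGE INTO mydb.tgt AS t
USING mydb.src AS s
  ON t.pi_col = s.pi_col  -- the ON clause must match on the shared PI
WHEN NOT MATCHED THEN
  INSERT (pi_col, col2) VALUES (s.pi_col, s.col2);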

Fan

Re: Copying trillions of rows to a new table

Thanks... yes, this is what my seniors are saying here too; the size can go up to that.

So I should get the DDL of the source table, create a new table, and then use INSERT/SELECT...

Thanks a lot

Junior Contributor

Re: Copying trillions of rows to a new table

Of course there's an obvious question: WHY do you need to copy trillions of rows?

Re: Copying trillions of rows to a new table

Hi Dieter, 

We have a mirror-database concept where data from table t1 is copied into tables t2 & t3.

Tables t1, t2 & t3 all have the same DDL definition, and every time it is a complete delete and re-load into t2 or t3 (based on a view swap).

One of the main source tables (t1) has around 93 million records.

The problem is that this re-load takes around 6 minutes to complete.

Is there any way we can improve the copy time?

We are following this same approach for 350 tables, which are deleted and re-loaded in sequence; the full mirror copy takes close to 30 minutes.

Is there any better approach for this?

Re: Copying trillions of rows to a new table

Adding to the above:

All the tables have 5+ secondary indexes (NUSIs).

Teradata Employee

Re: Copying trillions of rows to a new table

For huge volumes, you generally want the target table to be empty and of the same structure as the source, but with no SIs/JIs.

Drop SIs, Delete, Insert/Select, re-Create SIs.

The Delete should be done in Teradata mode with no BT/ET, or else the Delete and subsequent ET / Commit should be issued together as a multi-statement request.
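In BTEQ, for example, a leading semicolon continues the same request, so the Delete and its commit travel together as one multi-statement request (mydb.tgt is a placeholder):

DELETE FROM mydb.tgt ALL
;ET;

-- or, in an ANSI-mode session:
DELETE FROM mydb.tgt ALL
;COMMIT;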

You may also want to copy statistics values from the original table rather than re-collecting.
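If the release supports it (14.0 and later, if I recall correctly), the values can be transferred directly; mydb.tgt and mydb.src are placeholders:

COLLECT STATISTICS ON mydb.tgt FROM mydb.src;

This copies the stored statistics from the source table instead of re-scanning the data.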

Junior Contributor

Re: Copying trillions of rows to a new table

The best way to decrease the runtime would be removing some NUSIs.

Why do you have so many NUSIs? Did you check whether they're actually used?
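One way to check, assuming DBQL object logging is enabled; 'MyDb' is a placeholder for your database name:

SELECT ObjectTableName,
       ObjectNum AS IndexNumber,
       COUNT(*) AS UseCount
FROM dbc.DBQLObjTbl
WHERE ObjectDatabaseName = 'MyDb'
  AND ObjectType = 'Idx'  -- rows DBQL logged for index accesses
GROUP BY 1, 2
ORDER BY 1, 2;

ObjectNum can be joined to dbc.IndicesV.IndexNumber to map back to the NUSI columns; a NUSI that never shows up over a representative workload period is a candidate for dropping.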

Re: Copying trillions of rows to a new table

Yeah, most of them are used in different reports.

Some are multi-column NUSIs, and most are single-column NUSIs.

Our purpose is to do a mirror copy from OLTP to the MART.

Instead of loading all records, I'm thinking of loading only the changed records to the MART to reduce the copy time.

But even MINUS is of no use, as it takes a lot of time and spool.

Will temporal be of any use?
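If the source tables carry a reliable change timestamp, a delta load might look roughly like the sketch below; oltp.t1, mart.t2, key_col, col2, and last_upd_ts are all assumptions about the schema, and key_col is assumed to be the PI of t2 (Teradata's MERGE requires the ON clause to match the target's PI):

MERGE INTO mart.t2 AS tgt
USING (
  SELECT key_col, col2, last_upd_ts
  FROM oltp.t1
  WHERE last_upd_ts > TIMESTAMP '2016-01-01 00:00:00'  -- timestamp of the last successful run
) AS src
  ON tgt.key_col = src.key_col
WHEN MATCHED THEN
  UPDATE SET col2 = src.col2, last_upd_ts = src.last_upd_ts
WHEN NOT MATCHED THEN
  INSERT (key_col, col2, last_upd_ts)
  VALUES (src.key_col, src.col2, src.last_upd_ts);

Note that this does not capture rows deleted from t1; those would still need separate handling.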

Re: Copying trillions of rows to a new table

Hello,

Help me with a query to purge data from the table below. It has 670,235,825,680 rows, and I'm trying to delete around 164,154,526,390 of them using the cal_dt column.

del from pr_us_inv.str_inv_dly where cal_dt < '2010-01-01';

del from pr_us_inv.str_inv_dly where cal_dt between '2012-01-01' and '2013-12-31';

Both of the above queries take a very long time due to the huge table size (33 TB).

Below is the table structure:

show table pr_us_inv.str_inv_dly;

CREATE MULTISET TABLE pr_us_inv.str_inv_dly ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      STR_LOC_ID INTEGER NOT NULL,
      ITEM_ID INTEGER NOT NULL,
      CAL_DT DATE FORMAT 'YYYY-MM-DD' NOT NULL,
      LAST_UPD_TS TIMESTAMP(6) NOT NULL,
      CURNC_TYP_CD SMALLINT NOT NULL COMPRESS 7 ,
      CHG_OH_QTY DECIMAL(17,4) COMPRESS 0.0000 ,
      ORD_OH_USAGE_FLG CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC COMPRESS ('N','Y'),
      FRST_RCVD_DT DATE FORMAT 'YYYY-MM-DD' COMPRESS (DATE '1997-05-04',DATE '0002-11-30'),
      ITEM_VLCTY_CD SMALLINT COMPRESS (1 ,2 ,3 ,4 ,5 ,-1 ),
      ITEM_STAT_CD SMALLINT NOT NULL COMPRESS (300 ,600 ,100 ,400 ,200 ,500 ,-1 ),
      RAW_OH_QTY DECIMAL(9,2) COMPRESS (0.00 ,4.00 ),
      CORD_ALLOC_QTY DECIMAL(9,2) COMPRESS 0.00 ,
      CORD_RSVD_QTY DECIMAL(9,2) COMPRESS 0.00 )
PRIMARY INDEX ( STR_LOC_ID ,ITEM_ID ,CAL_DT )
PARTITION BY RANGE_N(CAL_DT BETWEEN DATE '2007-01-01' AND DATE '2020-12-31' EACH INTERVAL '1' DAY ,
 NO RANGE, UNKNOWN);

Thanks,

Raju
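Since the table is range-partitioned by day on CAL_DT, one option may be to get rid of whole partition ranges instead of deleting row by row. A sketch of two variants; verify on a copy first, because with a NO RANGE partition defined, rows from dropped ranges can end up in the NO RANGE partition rather than being deleted:

-- Variant 1: drop the old partition ranges outright
ALTER TABLE pr_us_inv.str_inv_dly
MODIFY PRIMARY INDEX
DROP RANGE WHERE cal_dt BETWEEN DATE '2007-01-01' AND DATE '2009-12-31'
WITH DELETE;

-- Variant 2: keep the partitioning, but delete in chunks that cover
-- complete partitions, so each request stays small
DELETE FROM pr_us_inv.str_inv_dly
WHERE cal_dt BETWEEN DATE '2012-01-01' AND DATE '2012-03-31';

An EXPLAIN on each statement should confirm whether whole partitions are being removed without per-row transient journaling.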