Vertical pivoting

Enthusiast

Vertical pivoting

Hello Gurus,

Good morning!!

I am facing an issue where I need to vertically pivot a column value that contains a delimited list. I am on DB version 13.0 and hence cannot use some of the functions available in 14.10. Although we can use recursive queries, the volume is huge (probably a couple hundred million rows) and I run out of spool. I know there is a way to do it using outer joins and was wondering if someone can help me out.

Here is a sample..

create volatile table TEST_VT
(
TEST_ID varchar(100),
CAT_ID varchar(200)
)
UNIQUE PRIMARY INDEX(TEST_ID)
on commit preserve rows;

INSERT INTO TEST_VT VALUES('ABCD','54|223055|223056|223057|223058|223059|223060|223062|223063|223064|223065|223066|223067|223068|223075|223077|223078|3|12246|22682|1|32790|32809|33196|142416|158621|158624|135943|135945');

INSERT INTO TEST_VT VALUES ('MNOP','137744|137745|137749|137750|137755|137756|137758|137761|137762|137763|137767|137769|137774|137775|137776|137777|137779|137780|137782|137784|137785|137786|137787|137788|137789|137791');

This is how I want it..

TEST_ID     CAT_ID
==================
ABCD        54
ABCD        223055
ABCD        223056
ABCD        223057
ABCD        223058
ABCD        223059
...
MNOP        137744
MNOP        137745
MNOP        137749
MNOP        137750
MNOP        137755
MNOP        137756
...

Your help is appreciated.

Thanks,

Kim

7 REPLIES
Junior Contributor

Re: Vertical pivoting

Hi Kim,

for a "couple hundred million" rows, recursion is not really recommended :-)

I posted a recursive query for that some time ago; the last query in that thread should avoid the spool problem (and for those large numbers the join should be on the PI, so in the worst case you might have to create a Volatile Table):

http://forums.teradata.com/forum/database/stored-procedures-need-help-with-massaging-data-within-an-...
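
For readers who cannot follow the link (it is truncated above), the general shape of such a low-spool recursive split is sketched below against the sample TEST_VT. This is only an illustration, not the exact query from that thread, and the CTE and column names here are made up:

-- Sketch only: split CAT_ID on '|' by tracking token positions, not strings.
WITH RECURSIVE cte (TEST_ID, start_pos, len) AS
 (
   SELECT TEST_ID,
          0,
          POSITION('|' IN CAT_ID || '|') - 1
   FROM TEST_VT
   UNION ALL
   SELECT t.TEST_ID,
          cte.start_pos + cte.len + 1,
          POSITION('|' IN SUBSTRING(t.CAT_ID || '|' FROM cte.start_pos + cte.len + 2)) - 1
   FROM cte
   JOIN TEST_VT AS t
     ON t.TEST_ID = cte.TEST_ID
    AND cte.start_pos + cte.len + 1 <= CHARACTER_LENGTH(t.CAT_ID)
 )
SELECT t.TEST_ID,
       SUBSTRING(t.CAT_ID || '|' FROM cte.start_pos + 1 FOR cte.len) AS CAT_ID
FROM cte
JOIN TEST_VT AS t
  ON t.TEST_ID = cte.TEST_ID;

The recursion only carries TEST_ID plus two small integers per token, and the substrings are cut out afterwards in a join on the PI, which is what keeps the per-iteration spool narrow.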

Enthusiast

Re: Vertical pivoting

Hi Dieter,

Thank you for your prompt response. Although my DBA has increased my spool significantly, I still run out of spool implementing your low-spool solution. Is there any other way to implement this? Please advise.

Thank you sir.

-Kim

Junior Contributor

Re: Vertical pivoting

Hi Kim,

assuming this is a one-time load, you might split the data into multiple parts ('a' to 'e', 'f' to 'j', etc.) and repeat the query for each part.
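
For example (the table name and ranges below are just illustrative), each slice could be isolated into a Volatile Table first and the split query run against that:

-- Illustrative slice: copy one alphabetical range of TEST_IDs, run the
-- split against TEST_SLICE, save the result, then repeat for the next range.
CREATE VOLATILE TABLE TEST_SLICE
(
TEST_ID varchar(100),
CAT_ID varchar(200)
)
UNIQUE PRIMARY INDEX(TEST_ID)
ON COMMIT PRESERVE ROWS;

INSERT INTO TEST_SLICE
SELECT TEST_ID, CAT_ID
FROM TEST_VT
WHERE SUBSTRING(TEST_ID FROM 1 FOR 1) BETWEEN 'A' AND 'E';
-- next runs: BETWEEN 'F' AND 'J', BETWEEN 'K' AND 'O', and so on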

What's the size of the table and your spool?

Enthusiast

Re: Vertical pivoting

Hi Dieter,

Thanks again.

This would be a daily load. The table is 26 GB with 27M records initially and could go up to 200M. I have about 1.6 TB of spool. I've tried both of the options above and they both run out of spool, even on the 27M-record table.

-Kim.

Junior Contributor

Re: Vertical pivoting

Hi Kim,

what's the exact definition of that table, and how many values are returned per row on average?

Can you show the current explain of both versions?

Enthusiast

Re: Vertical pivoting

Hi Dieter,

Here is the table structure and the explain plans. Each CAT_ID can contain up to 250 values separated by a delimiter.

CREATE SET TABLE common.test ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      ID INT,
      CAT_ID VARCHAR(8000) CHARACTER SET LATIN NOT CASESPECIFIC
     )
PRIMARY INDEX (ID);

2nd solution Explanation

=================

  1) First, we lock a distinct common."pseudo table" for read on
     a RowHash to prevent global deadlock for common.test.
  2) Next, we lock common.test for read.
  3) We do an all-AMPs RETRIEVE step from common.test by way of
     an all-rows scan with no residual conditions into Spool 3
     (all_amps), which is built locally on the AMPs.  The size of Spool
     3 is estimated with high confidence to be 27,735,081 rows (
     4,825,904,094 bytes).  The estimated time for this step is 27.51
     seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 by way of an all-rows
     scan with a condition of ("NOT (cte.ID IS NULL)") into Spool 2
     (all_amps), which is built locally on the AMPs.  The size of Spool
     2 is estimated with no confidence to be 27,735,081 rows (
     4,825,904,094 bytes).  The estimated time for this step is 1.57
     seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan with a condition of ("NOT
     (TD_RECURSIVE_REF.ID IS NULL)") into Spool 4 (all_amps), which
     is redistributed by the hash code of (ID) to all AMPs.  Then
     we do a SORT to order Spool 4 by row hash.  The size of Spool 4 is
     estimated with no confidence to be 27,735,081 rows (1,553,164,536
     bytes).  The estimated time for this step is 1.37 seconds.
  6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a
     RowHash match scan, which is joined to common.s by way of a
     RowHash match scan with no residual conditions.  Spool 4 and
     common.s are joined using a merge join, with a join
     condition of ("(ID = common.s.ID) AND (((START_POS
     + LEN )+ 1 )<= (CHARACTERS(common.s.CAT_ID )))").  The
     result goes into Spool 5 (all_amps), which is built locally on the
     AMPs.  The size of Spool 5 is estimated with no confidence to be
     39,962,388 rows (6,953,455,512 bytes).  The estimated time for
     this step is 13.60 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 3 (all_amps), which is built locally
     on the AMPs.  The size of Spool 3 is estimated with no confidence
     to be 2,025,854,481 rows (352,498,679,694 bytes).  The estimated
     time for this step is 2.28 seconds.  If one or more rows are
     inserted into spool 3, then go to step 4.
  8) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
     an all-rows scan with a condition of ("NOT (cte.ID IS NULL)")
     into Spool 7 (all_amps), which is redistributed by the hash code
     of (ID) to all AMPs.  Then we do a SORT to order Spool 7 by
     row hash.  The size of Spool 7 is estimated with no confidence to
     be 2,025,854,481 rows (113,447,850,936 bytes).  The estimated time
     for this step is 2 minutes and 21 seconds.
  9) We do an all-AMPs JOIN step from common.s by way of a
     RowHash match scan with no residual conditions, which is joined to
     Spool 7 (Last Use) by way of a RowHash match scan.
     common.s and Spool 7 are joined using a merge join, with a
     join condition of ("ID = common.s.ID").  The result
     goes into Spool 6 (all_amps), which is built locally on the AMPs.
     Then we do a SORT to order Spool 6 by the sort key in spool field1
     (ID, start_pos).  The result spool file will not be cached in
     memory.  The size of Spool 6 is estimated with no confidence to be
     1,466,453,894 rows (4,155,930,335,596 bytes).  The estimated time
     for this step is 1 hour and 27 minutes.
 10) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 6 are sent back to the user as the result of
     statement 1.  The total estimated time is 1 hour and 31 minutes.

1st solution Explanation

===============

   1) First, we lock a distinct common."pseudo table" for read on
      a RowHash to prevent global deadlock for common.test.
   2) Next, we lock common.test for read.
   3) We do an all-AMPs RETRIEVE step from common.test by way of
      an all-rows scan with no residual conditions into Spool 3
      (all_amps), which is built locally on the AMPs.  The size of Spool
      3 is estimated with high confidence to be 27,735,081 rows (
      596,637,062,472 bytes).  The estimated time for this step is 27.51
      seconds.
   4) We do an all-AMPs RETRIEVE step from Spool 3 by way of an all-rows
      scan into Spool 2 (all_amps), which is built locally on the AMPs.
      The size of Spool 2 is estimated with no confidence to be
      27,735,081 rows (596,637,062,472 bytes).  The estimated time for
      this step is 3 minutes and 13 seconds.
   5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
      an all-rows scan with a condition of ("TD_RECURSIVE_REF.REMAINING
      <> ''") into Spool 4 (all_amps), which is built locally on the
      AMPs.  The size of Spool 4 is estimated with no confidence to be
      27,735,081 rows (596,637,062,472 bytes).  The estimated time for
      this step is 24.12 seconds.
   6) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
      an all-rows scan into Spool 3 (all_amps), which is built locally
      on the AMPs.  The size of Spool 3 is estimated with no confidence
      to be 1,414,489,131 rows (30,428,490,186,072 bytes).  The
      estimated time for this step is 3 minutes and 13 seconds.  If one
      or more rows are inserted into spool 3, then go to step 4.
   7) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
      an all-rows scan into Spool 5 (all_amps), which is built locally
      on the AMPs.  The result spool file will not be cached in memory.
      The size of Spool 5 is estimated with no confidence to be
      1,414,489,131 rows (3,854,482,881,975 bytes).  The estimated time
      for this step is 20 minutes and 29 seconds.
   8) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 5 are sent back to the user as the result of
      statement 1.  The total estimated time is 27 minutes and 46
      seconds.

Junior Contributor

Re: Vertical pivoting

Hi Kim,

"up to 250": let's assume only 25 on average, so 25 * 27,000,000 = 675,000,000 rows; when it finally grows to 8 times as much, you get a few billion rows a day, ouch.

But this should never run into a "no more spool" when there's 1.6 TB.

The table is defined with a NUPI instead of a UPI; did you check if there are duplicate IDs? Of course that would blow up any spool.
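
A plain aggregate is enough to check that:

-- Any row returned means duplicate IDs, and every duplicate multiplies
-- the join result during the split.
SELECT ID, COUNT(*) AS id_cnt
FROM common.test
GROUP BY ID
HAVING COUNT(*) > 1;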

If you need to run this as a daily load, you should really try to implement a table UDF that splits it into rows.
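
If a table UDF is not an option on 13.0, a non-recursive, pure-SQL variant of the same splitting idea (offered here only as an untested sketch) is to join every row to the delimiter positions inside its padded string. The helper table nums is hypothetical: a one-column integer table holding 1 through at least 8002, i.e. the longest padded CAT_ID.

-- Sketch only: one result row per delimiter that starts a token.
SELECT s.ID,
       SUBSTRING(s.padded
                 FROM n.num + 1
                 FOR POSITION('|' IN SUBSTRING(s.padded FROM n.num + 1)) - 1) AS CAT_ID
FROM (
       SELECT ID, '|' || CAT_ID || '|' AS padded
       FROM common.test
     ) AS s
JOIN nums AS n                                    -- hypothetical helper table nums(num)
  ON  n.num < CHARACTER_LENGTH(s.padded)          -- skip the trailing delimiter
  AND SUBSTRING(s.padded FROM n.num FOR 1) = '|'; -- every other '|' starts a token

The join still spools one row per token, so the output volume is unchanged, but it avoids the iterative recursion spool; the inner join also drops rows whose CAT_ID is NULL, which is presumably where the outer join Kim mentioned would come in.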