Spool Space Issue on More Projected Columns Than Lesser Projected Columns

Database
Win
Teradata Employee

Spool Space Issue on More Projected Columns Than Lesser Projected Columns

What could be the cause of this?

I have the following two queries with similar source views and joins:

FIRST QUERY

The first one has more columns projected but is not returning a spool space issue: (EXPLAIN plan provided below - 5 steps)

SEL      D.PrimaryIndex

    , NonKeyColumn1, NonKeyColumn2, NonKeyColumn3, NonKeyColumn4, NonKeyColumn5

    , NonKeyColumn6, NonKeyColumn7, NonKeyColumn8, NonKeyColumn9, NonKeyColumn10

    , NonKeyColumn11, NonKeyColumn12, NonKeyColumn13, NonKeyColumn14, NonKeyColumn15

    , NonKeyColumn16, NonKeyColumn17, NonKeyColumn18, NonKeyColumn19, NonKeyColumn20

    , NonKeyColumn21, NonKeyColumn22, NonKeyColumn23, NonKeyColumn24, NonKeyColumn25

    , NonKeyColumn26, NonKeyColumn27, NonKeyColumn28, NonKeyColumn29, NonKeyColumn30

    , NonKeyColumn31, NonKeyColumn32, NonKeyColumn33, NonKeyColumn34, NonKeyColumn35

    , NonKeyColumn36, NonKeyColumn37, NonKeyColumn38, NonKeyColumn39, NonKeyColumn40

    , NonKeyColumn41, NonKeyColumn42, NonKeyColumn43, NonKeyColumn44, NonKeyColumn45

    , NonKeyColumn46, NonKeyColumn47, NonKeyColumn48, NonKeyColumn49, NonKeyColumn50

    , NonKeyColumn51

    , CASE WHEN D.NonKeyColumn52 NOT IN ('A', 'B')

                AND F.NonKeyColumn52 NOT LIKE '%A%'

                AND F.NonKeyColumn52 NOT LIKE '%B%'

            THEN 'A'

        ELSE D.NonKeyColumn52

      END NonKeyColumn52

    , NonKeyColumn53

    , CASE WHEN COALESCE(NonKeyColumn54,'A') = 'A'

            THEN 'A'

        ELSE 'B'

      END NonKeyColumn54

FROM VIEW1 D

INNER JOIN VIEW2 X

    ON D.PrimaryIndex = X.PrimaryIndex

LEFT JOIN VIEW3 F

    ON X.NotIndex = F.PrimaryIndex

;

SECOND QUERY

The second query has lesser columns projected but is returning a spool space issue: (EXPLAIN plan provided below - 4 steps)

SEL D.PrimaryIndex

    , NonKeyColumn1, NonKeyColumn2, NonKeyColumn3, NonKeyColumn4, NonKeyColumn5

    , NonKeyColumn6, NonKeyColumn7, NonKeyColumn8, NonKeyColumn9, NonKeyColumn10

    , CASE WHEN D.NonKeyColumn11 NOT IN ('A', 'B')

                AND F.NonKeyColumn11 NOT LIKE '%A%'

                AND F.NonKeyColumn11 NOT LIKE '%B%'

            THEN 'A'

        ELSE D.NonKeyColumn11

      END NonKeyColumn12

    , CASE WHEN COALESCE(NonKeyColumn54,'A') = 'A'

            THEN 'A'

        ELSE 'B'

      END NonKeyColumn54

FROM VIEW1 D

INNER JOIN VIEW2 X

    ON D.PrimaryIndex = X.PrimaryIndex

LEFT JOIN VIEW3 F

    ON X.NotIndex = F.PrimaryIndex

;

EXPLAIN PLAN FOR QUERY1

  1) First, we lock TABLE1 in view

     VIEW1 for access, we lock

     TABLE2 in view VIEW2 for

     access, and we lock TABLE3 in view

     VIEW3 for access.

  2) Next, we execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step from

          TABLE3 in view VIEW 3 by way of an

          all-rows scan with a condition of ("NOT

          (TABLE3 in view VIEW3.PrimaryIndex IS NULL)")

          into Spool 2 (all_amps) (compressed columns allowed) fanned

          out into 2 hash join partitions, which is built locally on

          the AMPs.  The size of Spool 2 is estimated with high

          confidence to be 2,001,784 rows (390,347,880 bytes).  The

          estimated time for this step is 4.14 seconds.

       2) We do an all-AMPs RETRIEVE step from TABLE2

          in view VIEW2 by way of an all-rows scan with a condition of

          ("NOT (TABLE2 in view VIEW2.PrimaryIndex IS NULL)") into

          Spool 3 (all_amps) (compressed columns allowed) fanned out

          into 2 hash join partitions, which is redistributed by the

          hash code of (TABLE2.ColumnWithCompress) to all AMPs.

          The size of Spool 3 is estimated with high confidence to be

          102,578,207 rows (4,000,550,073 bytes).  The estimated time

          for this step is 20.75 seconds.

  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an

     all-rows scan, which is joined to Spool 3 (Last Use) by way of an

     all-rows scan.  Spool 2 and Spool 3 are right outer joined using a

     hash join of 2 partitions, with condition(s) used for non-matching

     on right table ("NOT (NotIndex IS NULL)"), with a join condition of

     ("NotIndex = PrimaryIndex").  The result goes into Spool 4 (all_amps)

     (compressed columns allowed), which is redistributed by the hash

     code of (TABLE2.PrimaryIndex) to all AMPs.  Then we do

     a SORT to order Spool 4 by row hash.  The result spool file will

     not be cached in memory.  The size of Spool 4 is estimated with

     index join confidence to be 237,347,366 rows (48,656,210,030

     bytes).  The estimated time for this step is 3 minutes and 58

     seconds.

  4) We do an all-AMPs JOIN step from TABLE1 in view

     VIEW1 by way of a RowHash match scan, which

     is joined to Spool 4 (Last Use) by way of a RowHash match scan.

     TABLE1 and Spool 4 are joined using a merge join,

     with a join condition of ("TABLE1.PrimaryIndex = PrimaryIndex").

     The result goes into Spool 1 (group_amps), which is built locally

     on the AMPs.  The result spool file will not be cached in memory.

     The size of Spool 1 is estimated with index join confidence to be

     316,463,155 rows (358,552,754,615 bytes).  The estimated time for

     this step is 6 minutes and 10 seconds.

  5) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 10 minutes and 28

     seconds.

EXPLAIN PLAN FOR QUERY2

  1) First, we lock TABLE1 in view VIEW1 for access, we lock

     TABLE2 in view VIEW2 for access, and we lock TABLE3 in view

     VIEW3 for access.

  2) Next, we execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step from

          TABLE3 in view VIEW3 by way of an

          all-rows scan with a condition of ("NOT

          (TABLE3 in view VIEW3.PrimaryIndex IS NULL)")

          into Spool 2 (all_amps) (compressed columns allowed) fanned

          out into 2 hash join partitions, which is built locally on

          the AMPs.  The size of Spool 2 is estimated with high

          confidence to be 2,001,784 rows (390,347,880 bytes).  The

          estimated time for this step is 4.14 seconds.

       2) We do an all-AMPs JOIN step from TABLE2 in

          view VIEW2 by way of a RowHash match

          scan, which is joined to TABLE1 in view

          VIEW1 by way of a RowHash match scan.

          TABLE2 and TABLE1 are joined using a merge join,

          with a join condition of (

          "TABLE1.PrimaryIndex = TABLE2.PrimaryIndex").

          The result goes into Spool 3 (all_amps)

          (compressed columns allowed) fanned out

          into 2 hash join partitions, which is redistributed by the

          hash code of (TABLE2.ColumnWithCompress) to all AMPs.

          The size of Spool 3 is estimated with low confidence to be

          136,770,943 rows (17,780,222,590 bytes).  The estimated time

          for this step is 59.29 seconds.

  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an

     all-rows scan, which is joined to Spool 3 (Last Use) by way of an

     all-rows scan.  Spool 2 and Spool 3 are right outer joined using a

     hash join of 2 partitions, with condition(s) used for non-matching

     on right table ("NOT (NotIndex IS NULL)"), with a join condition of

     ("NotIndex = PrimaryIndex").  The result goes into Spool 1

     (group_amps), which is built locally on the AMPs.  The result

     spool file will not be cached in memory.  The size of Spool 1 is

     estimated with index join confidence to be 317,130,417 rows (

     47,569,562,550 bytes).  The estimated time for this step is 1

     minute and 26 seconds.

  4) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> The contents of Spool 1 are sent back to the user as the result of

     statement 1.  The total estimated time is 2 minutes and 26 seconds.

Thank you in advance for any reply.

5 REPLIES
Enthusiast

Re: Spool Space Issue on More Projected Columns Than Lesser Projected Columns

Are your statistics current and complete?

Rglass

Win
Teradata Employee

Re: Spool Space Issue on More Projected Columns Than Lesser Projected Columns

Hi Rglass,

I think that the statistics are not complete because of the "low confidence" shown in the EXPLAIN of the 2nd query; but I am not the one who is collecting the statistics nor do I have privilege to COLLECT STATS.

Thank you.

Best regards,

Win

Win
Teradata Employee

Re: Spool Space Issue on More Projected Columns Than Lesser Projected Columns

I just noticed the the title on my subject is wrong. I cannot edit it; but it should be: "Spool Space Issue on LESSER Projected Columns than on More Projected Columns".

Teradata Employee

Re: Spool Space Issue on More Projected Columns Than Lesser Projected Columns

A close look at the explain tells the story. In query 1, the content of Table1 participating in the query is much bigger since there are so many more columns included from it. The optimizer leaves Table1 in place and performs the other join first, then joins to the in place Table1. In query 2, since so much of Table1 is not participating, the optimizer can afford to do a direct merge join to Table1 and spool the result because so many columns will be projected out in the spool.

It seems like one of two things is happening however. Since the optimizer thinks that the spool for query2 is significantly smaller, either the statistics are not complete/current so the optimizer is not estimating correctly, or there is a lot of skew in query2 - probably on Table2.ColumnWithCompress on which spool 3 is redistributed. 

Statistics on Table2.columnwithcompress may help the optimizer avoid the skew. Current stats on the PI columns being joined may help with estimates of the join size. It would be good to start by investigating if the estimates for the number of rows resulting from the join are correct and if there is a lot of skew in the above column in the join result of Tab;e2 and Table1.

Win
Teradata Employee

Re: Spool Space Issue on More Projected Columns Than Lesser Projected Columns

Hi Todd,

Thank you very much for your input.

Best regards,

Win