Decrease Time to Run Large UNION Statement

Database
TDW
Enthusiast

Decrease Time to Run Large UNION Statement

The following code takes approx 13-14 hours to run.

Is there anyway to improve this on the coding side, before I approach our DBA's?

Here are the sizes (number of rows) for each of the tables be used.

This first section of code runs in 1-2 minutes.

TSTG.ALLQ = 20,836,928 rows

CREATE TABLE TSTG.ALLQ
(  PayerCode CHAR(1)
  ,MainID CHAR(12)
  ,ID1a CHAR(9)
  ,ID1b DEC(3,0)
  ,StDate DATE
  ,EnDate DATE
  ,PayerCode CHAR(6)
  ,RFC CHAR(1)
  ,BeginPeriod CHAR(5)
  ,EndPeriod CHAR(5)
  ,Fnd CHAR(1)
) PRIMARY INDEX (MainID,PayerCode)  
;

INSERT INTO TSTG.ALLQ
 ( PayerCode
  ,MainID
  ,ID1a
  ,ID1b
  ,StDate
  ,EnDate
  ,PayerCode
  ,RFC
  ,BeginPeriod
  ,EndPeriod
 )
 SELECT
   PayerCode
   ,ID1a || CAST(CAST(ID1b AS FORMAT'9(3)') AS CHAR(3)) AS MainID
   ,ID1a
   ,ID1b
   ,StDate
   ,EnDate
   ,PayerCode
   ,RFC
   ,BeginPeriod
   ,EndPeriod
  FROM TSTG.ELIG
   INNER JOIN TSTG.AQ ON 1=1
   WHERE RFC = 'Q'
;

This next section takes over 13 hours to complete.

TPRD.TMA = 12,170,323 rows

TPRD.GT = 12,787,182 rows

DROP TABLE TSTG.AGMAQ
;
CREATE TABLE TSTG.AGMAQ
( TTC CHAR(6)
 ,ID2a CHAR(9)
 ,ID2b CHAR(3)
 ,GBegin CHAR(5)
 ,GEnd CHAR(5)
)
PRIMARY INDEX (TTC, ID2a, ID2b, GBegin , GEnd)
;
INSERT INTO TSTG.AGMAQ
(TTC, ID2a, ID2b, GBegin , GEnd)
    SELECT DISTINCT
      E1.PayerCode,
      E1.ID1a,
      E1.ID1b,
      MA.GABegin,
      MA.GAEnd
     FROM TSTG.ALLQ AS E1
      LEFT JOIN TPRD.TMA AS MA
       ON E1.PayerCode = MA.PayerCode
       AND E1.ID1a = MA.ID1a
       AND E1.ID1b = MA.ID1b
      LEFT JOIN TSTG.TSTG.ATypeCodes AS CODES
       ON E1.PayerCode = CODES.TTC
       WHERE MA.ID1a IS NOT NULL
        AND (MA.GABegin <= E1.BeginPeriod AND MA.GAEnd >= E1.EndPeriod)
        AND MA.Rtrn_Typ_Code = CODES.RTC
    UNION
    SELECT DISTINCT
      E1.PayerCode,
      E1.ID1a,
      E1.ID1b,
      GTr.GABegin,
      GTr.GAEnd
     FROM TSTG.ALLQ AS E1
      LEFT JOIN TPRD.GT AS GTr /*gcj*/
       ON E1.PayerCode = GTr.PayerCode
        AND E1.ID1a = GTr.ID1a
        AND E1.ID1b = GTr.ID1b
      LEFT JOIN TSTG.ACodeTypes AS CODES
       ON E1.PayerCode = CODES.TTC
      WHERE GTr.ID1a IS NOT NULL
        AND (GTr.GABegin <= E1.BeginPeriod AND GTr.GAEnd >= E1.EndPeriod)
        AND GTr.TCC = CODES.TCC
;

I have tried running each side of the select separately, and then running UNION between them. It improved very slightly.

I also tried to run as INNER JOIN, but I ended up with duplicates (each section as it is written).

Thanks in advance for any assistance you can provide on this.

God Bless,

Genesius

5 REPLIES
TDW
Enthusiast

Re: Decrease Time to Run Large UNION Statement

Update.

My mistake.

TPRD.TMA = 5,103,949,241     rows

TPRD.GT   =    204,469,116     rows

Thanks and God Bless,

Genesius

Enthusiast

Re: Decrease Time to Run Large UNION Statement

Can you try the below one?

INSERT INTO TSTG.AGMAQ

(TTC, ID2a, ID2b, GBegin , GEnd)

select PayerCode,

      ID1a,

      ID1b,

      GABegin,

      GAEnd

from

(

    SELECT 

      E1.PayerCode,

      E1.ID1a,

      E1.ID1b,

      MA.GABegin,

      MA.GAEnd

     FROM TSTG.ALLQ AS E1

      LEFT JOIN TPRD.TMA AS MA 

       ON E1.PayerCode = MA.PayerCode

       AND E1.ID1a = MA.ID1a 

       AND E1.ID1b = MA.ID1b

      LEFT JOIN TSTG.TSTG.ATypeCodes AS CODES

       ON E1.PayerCode = CODES.TTC

       WHERE MA.ID1a IS NOT NULL

        AND (MA.GABegin <= E1.BeginPeriod AND MA.GAEnd >= E1.EndPeriod)

        AND MA.Rtrn_Typ_Code = CODES.RTC

    UNION ALL

    SELECT 

      E1.PayerCode,

      E1.ID1a,

      E1.ID1b,

      GTr.GABegin,

      GTr.GAEnd

     FROM TSTG.ALLQ AS E1

      LEFT JOIN TPRD.GT AS GTr /*gcj*/

       ON E1.PayerCode = GTr.PayerCode

        AND E1.ID1a = GTr.ID1a 

        AND E1.ID1b = GTr.ID1b

      LEFT JOIN TSTG.ACodeTypes AS CODES

       ON E1.PayerCode = CODES.TTC

      WHERE GTr.ID1a IS NOT NULL

        AND (GTr.GABegin <= E1.BeginPeriod AND GTr.GAEnd >= E1.EndPeriod)

        AND GTr.TCC = CODES.TCC

) group by PayerCode,

      ID1a,

      ID1b,

      GABegin,

      GAEnd

;

Enthusiast

Re: Decrease Time to Run Large UNION Statement

/* Here if you see Explain plain both LEFT joins will be actually INNER joins,Since there are not null kind of filter 
conditions in WHERE clause */

CREATE TABLE TSTG.ALLQ
(  PayerCode CHAR(1)
  ,MainID CHAR(12)
  ,ID1a CHAR(9)
  ,ID1b DEC(3,0)
  ,StDate DATE
  ,EnDate DATE
  ,PayerCode CHAR(6)
  ,RFC CHAR(1)
  ,BeginPeriod CHAR(5)
  ,EndPeriod CHAR(5)
  ,Fnd CHAR(1),
  ,RTC datatype  --CHANGE THIS
  ,TCC datatype  --CHANGE THIS
) PRIMARY INDEX (MainID,PayerCode) 
;

--Part 2

INSERT INTO TSTG.ALLQ
 ( PayerCode
  ,MainID
  ,ID1a
  ,ID1b
  ,StDate
  ,EnDate
  ,PayerCode
  ,RFC
  ,BeginPeriod
  ,EndPeriod
  ,RTC
  ,TCC
 )
 SELECT
   PayerCode
   ,ID1a || CAST(CAST(ID1b AS FORMAT'9(3)') AS CHAR(3)) AS MainID
   ,ID1a
   ,ID1b
   ,StDate
   ,EnDate
   ,PayerCode
   ,RFC
   ,BeginPeriod
   ,EndPeriod
   ,CODES.RTC
   ,CODES.TCC
  FROM TSTG.ELIG
   INNER JOIN TSTG.AQ ON 1=1
  
   INNER JOIN   TSTG.TSTG.ATypeCodes AS CODES
       ON PayerCode = CODES.TTC
   WHERE RFC = 'Q'
;

DROP TABLE TSTG.AGMAQ
;
CREATE TABLE TSTG.AGMAQ
( TTC CHAR(6)
 ,ID2a CHAR(9)
 ,ID2b CHAR(3)
 ,GBegin CHAR(5)
 ,GEnd CHAR(5)
)
PRIMARY INDEX (TTC, ID2a, ID2b, GBegin , GEnd)
;
INSERT INTO TSTG.AGMAQ
(TTC, ID2a, ID2b, GBegin , GEnd)
    SELECT
      E1.PayerCode,
      E1.ID1a,
      E1.ID1b,
      MA.GABegin,
      MA.GAEnd
     FROM TSTG.ALLQ AS E1
      INNER  JOIN TPRD.TMA AS MA
       ON E1.PayerCode = MA.PayerCode
       AND E1.ID1a = MA.ID1a
       AND E1.ID1b = MA.ID1b
    AND MA.Rtrn_Typ_Code = E1.RTC
    AND (MA.GABegin <= E1.BeginPeriod AND MA.GAEnd >= E1.EndPeriod)
       WHERE MA.ID1a IS NOT NULL  --MEANS INNER JOIN
    GROUP BY 1,2,3,4,5
       UNION
    SELECT
      E1.PayerCode,
      E1.ID1a,
      E1.ID1b,
      GTr.GABegin,
      GTr.GAEnd
     FROM TSTG.ALLQ AS E1
      INNER JOIN TPRD.GT AS GTr /*gcj*/
       ON E1.PayerCode = GTr.PayerCode
        AND E1.ID1a = GTr.ID1a
        AND E1.ID1b = GTr.ID1b
  AND GTr.TCC = E1.TCC
  AND (GTr.GABegin <= E1.BeginPeriod AND GTr.GAEnd >= E1.EndPeriod)
      WHERE GTr.ID1a IS NOT NULL  --MEANS INNER JOIN
        GROUP BY 1,2,3,4,5
       
;
TDW
Enthusiast

Re: Decrease Time to Run Large UNION Statement

Both,

The GROUP BY will not work as I am not using Aggregate Functions.

balachandra,

Won't the UNION ALL give me duplicates?

Now I need code to delete the dups.

sravan4,

I ran your first change, upto "WHERE RFC = 'Q'". After 2 hours, that code had not completed. I did not get to run the other changes.

BTW, running the first part (up to the "WHERE RFC = 'Q'") as I have it written takes only 4 minutes. It is the second portion of code that takes 13 hours.

Thanks and God Bless,

Genesius

Enthusiast

Re: Decrease Time to Run Large UNION Statement

Hi TDW,

May be that join with   TSTG.TSTG.ATypeCodes is a costly one.How many records this table has got?

I just tried to reduce it to one time instead of previosuly two times joining that table.

Please post explain plan of the INSERT INTO TSTG.ALLQ query i have posted.

Thanks