Query performance issue

Fan

The SQL below is causing performance problems. Is there any way to rewrite this query to improve its performance?

SELECT
       count(distinct(TBL_OUTPUT.C_CID)) as CNT
FROM
       (SELECT
               TBL_INPUT.GUID,
               TBL_INPUT.PSN,
               TBL_INPUT.MID,
               (CASE
                     WHEN TBL_INPUT.GUID is NULL then TBL_INPUT.CID
                     ELSE
                            (SELECT min(TBL.CID)
                             FROM DB1.TAB1 AS TBL
                             WHERE TBL.GUID = TBL_INPUT.GUID)
                END) as C_CID
        FROM DB1.TAB1 as TBL_INPUT
       ) TBL_OUTPUT;

12 REPLIES
Teradata Employee

Re: Query performance issue

Hi,

You can try using a join instead of a correlated subquery, i.e. something like this:


SELECT
       count(distinct(TBL_OUTPUT.C_CID)) as CNT
FROM
       (SELECT
               TBL_INPUT.GUID,
               TBL_INPUT.PSN,
               TBL_INPUT.MID,
               (CASE
                     WHEN TBL_INPUT.GUID is NULL then TBL_INPUT.CID
                     ELSE TBL.CID
                END) as C_CID
        FROM DB1.TAB1 as TBL_INPUT
        LEFT OUTER JOIN
               (SELECT TBL2.GUID, min(TBL2.CID) as CID
                FROM DB1.TAB1 AS TBL2
                GROUP BY TBL2.GUID
               ) as TBL
        ON TBL.GUID = TBL_INPUT.GUID
       ) TBL_OUTPUT;

Regards,

Vlad.

Junior Contributor

Re: Query performance issue

As you access the same table twice, you might also try a Windowed Aggregate Function instead of a join:

SELECT
       count(distinct(TBL_OUTPUT.C_CID)) as CNT
FROM
       (SELECT
               TBL_INPUT.GUID,
               TBL_INPUT.PSN,
               TBL_INPUT.MID,
               (CASE
                     WHEN TBL_INPUT.GUID is NULL then TBL_INPUT.CID
                     ELSE MIN(TBL_INPUT.CID) OVER (PARTITION BY GUID)
                END) as C_CID
        FROM DB1.TAB1 as TBL_INPUT
       ) TBL_OUTPUT;
Enthusiast

Re: Query performance issue

Hi Dieter,

I need a big favour from you.

I just implemented Soft RI in Teradata (as opposed to the hard RI in SQL Server), and all of a sudden every Soft-RI-based Insert and Update is running in a much longer window and taking a huge amount of time.

I really do not understand why it is taking so much longer. It looks as if all the RI tables are checking each other during joins, which makes the Updates and Inserts run longer. Do you have any tips on implementing the same check on the BTEQ application side?

Below are my queries and DDL.

SELECT
       I.DW_ACTIVE,
       CASE WHEN WK.DW_EFF_FROM=I.DW_EFF_FROM THEN 'X' ELSE 'N' END,
       I.DW_EFF_TO,
       CASE WHEN WK.DW_EFF_FROM=I.DW_EFF_FROM THEN WK.DW_EFF_FROM
            ELSE (WK.DW_EFF_FROM - INTERVAL '1' DAY) END,
       I.DW_DATE_UPDATED,
       CURRENT_TIMESTAMP(0)
FROM EIW_INTEGRATION.a I, EIW_STAGE_VIEWS.b WK
WHERE I.ORDER_LINE_ID=WK.ORDER_LINE_ID
  AND I.ORDER_ROLE_TYPE_CDE=WK.ORDER_ROLE_TYPE_CDE
  AND I.DW_EFF_FROM=WK.DW_EFF_FROM
  AND I.DW_EFF_SEQ=WK.DW_EFF_SEQ
  AND I.DW_ACTIVE='Y'
  AND I.PARTY_ID=I.PARTY_ID;  /* note: this predicate compares the column to itself; WK.PARTY_ID was probably intended */

INSERT INTO EIW_INTEGRATION.a
(
 ORDER_LINE_ID,
 ORDER_ROLE_TYPE_CDE,
 DW_EFF_FROM,
 DW_EFF_SEQ,
 DW_EFF_TO,
 DW_ACTIVE,
 DW_RECORD_ID,
 DW_BATCH_ID,
 DW_DATE_CREATED,
 DW_DATE_UPDATED,
 PARTY_ID,
 PARTY_ATTRIBUTE_ID,
 COMMENTS
)
SELECT
 WK.ORDER_LINE_ID,
 WK.ORDER_ROLE_TYPE_CDE,
 WK.DW_EFF_FROM,
 WK.DW_EFF_SEQ,
 WK.DW_EFF_TO,
 WK.DW_ACTIVE,
 WK.DW_RECORD_ID,
 WK.DW_BATCH_ID,
 WK.DW_DATE_CREATED,
 WK.DW_DATE_UPDATED,
 WK.PARTY_ID,
 WK.PARTY_ATTRIBUTE_ID,
 COMMENTS
FROM EIW_STAGE_VIEWS.b WK;

CREATE SET TABLE EIW_INTEGRATION.a, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      ORDER_LINE_ID VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT ' ',
      ORDER_ROLE_TYPE_CDE INTEGER NOT NULL,
      DW_EFF_FROM DATE FORMAT 'YYYY-MM-DD' NOT NULL,
      DW_EFF_SEQ INTEGER NOT NULL,
      DW_EFF_TO DATE FORMAT 'YYYY-MM-DD' NOT NULL,
      DW_ACTIVE CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT ' ',
      DW_RECORD_ID INTEGER NOT NULL,
      DW_BATCH_ID INTEGER NOT NULL,
      DW_DATE_CREATED TIMESTAMP(0) NOT NULL,
      DW_DATE_UPDATED TIMESTAMP(0) NOT NULL,
      PARTY_ID INTEGER NOT NULL,
      PARTY_ATTRIBUTE_ID VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT ' ',
      COMMENTS VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT ' ',
      CONSTRAINT FK1_PARTY_ORDER_LINE_ROLE FOREIGN KEY (ORDER_ROLE_TYPE_CDE) REFERENCES WITH NO CHECK OPTION EIW_INTEGRATION.ORDER_ROLE_TYPE (ORDER_ROLE_TYPE_CDE),
      CONSTRAINT FK2_PARTY_ORDER_LINE_ROLE FOREIGN KEY (ORDER_LINE_ID) REFERENCES WITH NO CHECK OPTION EIW_INTEGRATION.ORDER_LINE (ORDER_LINE_ID),
      CONSTRAINT FK3_PARTY_ORDER_LINE_ROLE FOREIGN KEY (PARTY_ID) REFERENCES WITH NO CHECK OPTION EIW_INTEGRATION.PARTY (PARTY_ID))
PRIMARY INDEX (PARTY_ID)
PARTITION BY CASE_N(
     DW_ACTIVE = 'X',
     DW_ACTIVE = 'Y',
     DW_ACTIVE = 'N',
     DW_ACTIVE = 'D')
INDEX (ORDER_ROLE_TYPE_CDE);

Work table: EIW_STAGE_VIEWS.b, PI COLUMN (PARTY_ID).

Junior Contributor

Re: Query performance issue

If it's Soft RI, there shouldn't be any kind of check.

Can you show the Explain?

Enthusiast

Re: Query performance issue

Please see the explain plans for the Insert and the Update below. I do not understand why a simple Insert takes hours. My observation is that Soft RI is validating the data across all of the tables; the same applies to the Update.

How do I implement Soft RI on the BTEQ application side? Can you help me with this?

--Explain plan for Update:

  1) First, we lock a distinct EIW_INTEGRATION."pseudo table" for write
     on a RowHash to prevent global deadlock for
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.
  2) Next, we lock EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE for write,
     and we lock EIW_STAGE.PARTY_WORK_EFFORT_ROLE_W1 in view
     EIW_STAGE_VIEWS.PARTY_WORK_EFFORT_ROLE_W1 for access.
  3) We do an all-AMPs RETRIEVE step from
     EIW_STAGE.PARTY_WORK_EFFORT_ROLE_W1 in view
     EIW_STAGE_VIEWS.PARTY_WORK_EFFORT_ROLE_W1 by way of an all-rows
     scan with no residual conditions into Spool 2 (all_amps), which is
     built locally on the AMPs.  Then we do a SORT to partition Spool 2
     by rowkey.  The size of Spool 2 is estimated with high confidence
     to be 3,212,349 rows (109,219,866 bytes).  The estimated time for
     this step is 0.50 seconds.
  4) We do an all-AMPs JOIN step from 6 partitions of
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE by way of a RowHash match
     scan with a condition of (
     "EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.DW_ACTIVE = 'Y'"), which
     is joined to Spool 2 (Last Use) by way of a RowHash match scan.
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE and Spool 2 are joined
     using a rowkey-based merge join, with a join condition of (
     "(EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.PARTY_ID = PARTY_ID) AND
     ((EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.WORK_EFFORT_ID =
     WORK_EFFORT_ID) AND
     ((EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.WORK_EFFORT_ROLE_TYPE_CDE
     = WORK_EFFORT_ROLE_TYPE_CDE) AND
     ((EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.DW_EFF_FROM = DW_EFF_FROM)
     AND (EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.DW_EFF_SEQ =
     DW_EFF_SEQ ))))").  The result goes into Spool 1 (all_amps), which
     is built locally on the AMPs.  Then we do a SORT to order Spool 1
     by the sort key in spool field1 (
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.ROWID).  The size of Spool
     1 is estimated with low confidence to be 3,212,349 rows (
     70,671,678 bytes).  The estimated time for this step is 9.80
     seconds.
  5) We do an all-AMPs MERGE DELETE to
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE from Spool 1 (Last Use) via
     the row id.  New updated rows are built and the result goes into
     Spool 3 (all_amps), which is built locally on the AMPs.  Then we
     do a SORT to partition Spool 3 by rowkey.  The size is estimated
     with low confidence to be 3,212,349 rows.  The estimated time for
     this step is 32 minutes and 54 seconds.
  6) We do an all-AMPs MERGE into
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE from Spool 3 (Last Use).
     The size is estimated with low confidence to be 3,212,349 rows.
     The estimated time for this step is 2.96 seconds.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.
     The total estimated time is 33 minutes and 7 seconds.

--Explain plan for Insert/Select:

  1) First, we lock a distinct EIW_INTEGRATION."pseudo table" for write
     on a RowHash to prevent global deadlock for
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE.
  2) Next, we lock EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE for write,
     and we lock EIW_STAGE.PARTY_WORK_EFFORT_ROLE_W1 in view
     EIW_STAGE_VIEWS.PARTY_WORK_EFFORT_ROLE_W1 for access.
  3) We do an all-AMPs RETRIEVE step from
     EIW_STAGE.PARTY_WORK_EFFORT_ROLE_W1 in view
     EIW_STAGE_VIEWS.PARTY_WORK_EFFORT_ROLE_W1 by way of an all-rows
     scan with no residual conditions into Spool 1 (all_amps), which is
     built locally on the AMPs.  Then we do a SORT to partition Spool 1
     by rowkey.  The size of Spool 1 is estimated with high confidence
     to be 3,212,349 rows (517,188,189 bytes).  The estimated time for
     this step is 1.90 seconds.
  4) We do an all-AMPs MERGE into
     EIW_INTEGRATION.PARTY_WORK_EFFORT_ROLE from Spool 1 (Last Use).
     The size is estimated with high confidence to be 3,212,349 rows.
     The estimated time for this step is 2.96 seconds.
  5) We spoil the parser's dictionary cache for the table.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

Junior Contributor

Re: Query performance issue

OK, there's no step indicating an RI check; that would only be visible when using WITH CHECK OPTION.

You implemented Soft RI (WITH NO CHECK OPTION) and since that time it's slower?

Did you change anything else? 

Is there QueryLog (preferably Step level or Explain) data for the previous runs to compare?
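
If DBQL is enabled, the history could be pulled for comparison with something like this (a minimal sketch against the standard DBC.QryLogV view; the user name and date filter are made up for illustration):

SELECT StartTime,
       AMPCPUTime,
       TotalIOCount,
       QueryText
FROM DBC.QryLogV
WHERE UserName = 'ETL_USER'                       /* hypothetical load user */
  AND StartTime > TIMESTAMP '2014-01-01 00:00:00' /* hypothetical cutoff */
ORDER BY StartTime;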

Enthusiast

Re: Query performance issue

Yes, you are right.

I did not change anything else, and I even tested the code without Soft RI; it runs very quickly.

Because of this Soft RI across multiple tables, it seems to internally validate the data during joins and inserts. How do I implement Soft RI in a BTEQ application job?

Junior Contributor

Re: Query performance issue

Soft RI (WITH NO CHECK OPTION) does not validate any referential integrity.

WITH CHECK OPTION validates on a statement level, doing a kind of [NOT] EXISTS correlated subquery, but you can spot it in Explain. This is how it's usually done on the application side: check for valid/invalid data before you actually Insert/Update/Delete.

Standard RI (neither CHECK nor NO CHECK) validates on a row level, which is very bad for large Insert/Selects or Updates; you can't see that in Explain.

In your case, if you defined NO CHECK OPTION (you should double-check it) and it's still slower than before, you should open an incident with Teradata's customer support.
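
For reference, here is how the three variants look in DDL (a sketch; the constraint names are made up, while the PARTY table and PARTY_ID column are taken from the DDL posted above):

/* Soft RI: metadata for the optimizer only, nothing is validated. */
ALTER TABLE EIW_INTEGRATION.a ADD CONSTRAINT fk_soft
      FOREIGN KEY (PARTY_ID) REFERENCES WITH NO CHECK OPTION EIW_INTEGRATION.PARTY (PARTY_ID);

/* Batch RI: validated per statement, visible in Explain. */
ALTER TABLE EIW_INTEGRATION.a ADD CONSTRAINT fk_check
      FOREIGN KEY (PARTY_ID) REFERENCES WITH CHECK OPTION EIW_INTEGRATION.PARTY (PARTY_ID);

/* Standard RI: validated row by row, not visible in Explain. */
ALTER TABLE EIW_INTEGRATION.a ADD CONSTRAINT fk_std
      FOREIGN KEY (PARTY_ID) REFERENCES EIW_INTEGRATION.PARTY (PARTY_ID);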

Enthusiast

Re: Query performance issue

A question on the above comment: by "Standard RI", do you mean Soft RI? You wrote:

"Standard RI (neither CHECK nor NO CHECK) validates on a row level, which is very bad for large Insert/Selects or Updates; you can't see that in Explain."

As it turns out, the issue was skew. I changed the PI columns based on join access frequency, and now all the Insert/Selects and Updates with joins are running well (a skew-check sketch follows below).
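
For reference, a common way to verify how evenly a PI column distributes is a hash-based row count per AMP (a sketch using Teradata's standard hash functions against the old PARTY_ID PI from the DDL above; a large spread between the counts indicates skew):

SELECT HASHAMP(HASHBUCKET(HASHROW(PARTY_ID))) AS amp_no,  /* AMP that owns this row's hash */
       COUNT(*) AS row_cnt                                /* rows landing on that AMP */
FROM EIW_INTEGRATION.a
GROUP BY 1
ORDER BY 2 DESC;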

A comment about Soft RI: I completely removed Soft RI (the RI check) on our Teradata database side and tested a few examples. Soft RI looks like a dummy; it does not throw any errors during joins (not at loading time).

Please correct me if I'm wrong.

How do I check this RI at the BTEQ level? Shall I go ahead with an EXISTS and ACTIVITYCOUNT check, like the sketch below?
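
A minimal BTEQ sketch of such a check, assuming the PARTY foreign key from the DDL above (the label name and return codes are made up; .IF, ACTIVITYCOUNT, .GOTO, .LABEL and .QUIT are standard BTEQ commands):

/* Find staging rows whose PARTY_ID has no parent row. */
SELECT WK.PARTY_ID
FROM EIW_STAGE_VIEWS.b WK
WHERE NOT EXISTS
      (SELECT 1
       FROM EIW_INTEGRATION.PARTY P
       WHERE P.PARTY_ID = WK.PARTY_ID);

.IF ACTIVITYCOUNT > 0 THEN .GOTO RI_ERROR

/* ... the Insert/Select from the thread would run here ... */

.QUIT 0

.LABEL RI_ERROR
.QUIT 8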