Tuning the query

Hi,

I have a query that is consuming a lot of CPU and is heavily skewed. Please suggest how to tune it.

Explain:

Explain SELECT
      WATR.BK_AS_PROJECT_TASK_NUM,
      WATR.BK_AS_PROJECT_CD,
      WATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD,
      WATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY,
      WATR.FISCAL_YEAR_WEEK_NUM_INT,
      WATR.AS_TASK_RES_TIMECARD_HOURS,
      WATR.AS_TASK_RES_LABOR_COST_USD_AMT,
      WATR.AS_TASK_RES_LABOR_COST_PRJ_AMT,
      WATR.AS_TASK_RES_LABOR_COST_LOC_AMT,
      WATR.AS_RSRC_LABOR_RT_USD_AMT,
      CURRENT_TIMESTAMP(0),
      USER,
      CURRENT_TIMESTAMP(0),
      USER
FROM WORKDB_dv4.WI_MT_AS_TIMECARD_RESOURCE WATR
WHERE DML_TYPE = 'I'
AND (WATR.BK_AS_PROJECT_TASK_NUM,
     WATR.BK_AS_PROJECT_CD,
     WATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD,
     WATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY,
     WATR.FISCAL_YEAR_WEEK_NUM_INT)
NOT IN
(SELECT MATR.BK_AS_PROJECT_TASK_NUM,
        MATR.BK_AS_PROJECT_CD,
        MATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD,
        MATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY,
        MATR.FISCAL_YEAR_WEEK_NUM_INT
 FROM ASBIVWDB_dv4.MT_AS_TIMECARD_RESOURCE MATR)

  1) First, we lock WORKDB_dv4.WATR for access, and we lock
     ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE for access.
  2) Next, we do an all-AMPs SUM step to aggregate from WORKDB_dv4.WATR
     by way of an all-rows scan with a condition of (
     "WORKDB_dv4.WATR.DML_TYPE = 'I'").  Aggregate Intermediate Results
     are computed globally, then placed in Spool 5.
  3) We do an all-AMPs SUM step to aggregate from
     ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE by way of an all-rows scan with
     no residual conditions.  Aggregate Intermediate Results are
     computed globally, then placed in Spool 7.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by
          way of an all-rows scan into Spool 3 (all_amps) (compressed
          columns allowed), which is duplicated on all AMPs.
       2) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by
          way of an all-rows scan into Spool 4 (all_amps) (compressed
          columns allowed), which is duplicated on all AMPs.
  5) We execute the following steps in parallel.
       1) We do an all-AMPs JOIN step from WORKDB_dv4.WATR by way of an
          all-rows scan with a condition of ("WORKDB_dv4.WATR.DML_TYPE
          = 'I'"), which is joined to
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE by way of an all-rows scan
          with no residual conditions.  WORKDB_dv4.WATR and
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE are joined using an
          exclusion merge join, with a join condition of (
          "(WORKDB_dv4.WATR.BK_AS_PROJECT_TASK_NUM =
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.BK_AS_PROJECT_TASK_NUM)
          AND ((WORKDB_dv4.WATR.BK_AS_PROJECT_CD =
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.BK_AS_PROJECT_CD) AND
          ((WORKDB_dv4.WATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD =
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD)
          AND ((WORKDB_dv4.WATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY =
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY)
          AND (WORKDB_dv4.WATR.FISCAL_YEAR_WEEK_NUM_INT =
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.FISCAL_YEAR_WEEK_NUM_INT
          ))))"), and null value information in Spool 4 and Spool 3.
          Skip this join step if null exists.  The result goes into
          Spool 2 (group_amps), which is built locally on the AMPs.
          The size of Spool 2 is estimated with no confidence to be
          184,396 rows (37,063,596 bytes).  The estimated time for this
          step is 0.99 seconds.
       2) We do an all-AMPs RETRIEVE step from WORKDB_dv4.WATR by way
          of an all-rows scan with a condition of (
          "WORKDB_dv4.WATR.DML_TYPE = 'I'") into Spool 9 (all_amps)
          (compressed columns allowed), which is built locally on the
          AMPs.  Then we do a SORT to order Spool 9 by the hash code of
          (WORKDB_dv4.WATR.BK_AS_PROJECT_TASK_NUM), and null value
          information in Spool 4 and Spool 3.  Skip this retrieve step
          if there is no null.  The size of Spool 9 is estimated with
          no confidence to be 184,396 rows (21,943,124 bytes).  The
          estimated time for this step is 0.33 seconds.
       3) We do an all-AMPs RETRIEVE step from
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE by way of an all-rows scan
          with no residual conditions into Spool 10 (all_amps)
          (compressed columns allowed), which is duplicated on all AMPs.
          Then we do a SORT to order Spool 10 by the hash code of (
          ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE.BK_AS_PROJECT_TASK_NUM).
          The result spool file will not be cached in memory, and null
          value information in Spool 4 and Spool 3.  Skip this retrieve
          step if there is no null.  The size of Spool 10 is estimated
          with low confidence to be 146,184,000 rows (6,870,648,000
          bytes).  The estimated time for this step is 39.67 seconds.
  6) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
     all-rows scan, which is joined to Spool 10 (Last Use) by way of an
     all-rows scan.  Spool 9 and Spool 10 are joined using an exclusion
     merge join, with a join condition of ("(BK_AS_PROJECT_TASK_NUM =
     BK_AS_PROJECT_TASK_NUM) AND ((BK_AS_PROJECT_CD = BK_AS_PROJECT_CD)
     AND ((BK_AS_TASK_RSRC_EXPNDTR_TYP_CD =
     BK_AS_TASK_RSRC_EXPNDTR_TYP_CD) AND
     ((BK_AS_RSRC_CISCO_WRKR_PRTY_KEY = BK_AS_RSRC_CISCO_WRKR_PRTY_KEY)
     AND (FISCAL_YEAR_WEEK_NUM_INT = FISCAL_YEAR_WEEK_NUM_INT ))))"),
     and null value information in Spool 4 (Last Use) and Spool 3 (Last
     Use).  Skip this join step if there is no null.  The result goes
     into Spool 2 (group_amps), which is built locally on the AMPs.
     The size of Spool 2 is estimated with no confidence to be 184,396
     rows (37,063,596 bytes).  The estimated time for this step is 0.99
     seconds.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.


Re: Tuning the query

The first thing you should try is to rewrite the NOT IN on the NULLable columns as a NOT EXISTS:

AND NOT EXISTS
(
SELECT * FROM ASBIVWDB_dv4.MT_AS_TIMECARD_RESOURCE MATR
WHERE WATR.BK_AS_PROJECT_TASK_NUM = MATR.BK_AS_PROJECT_TASK_NUM
AND WATR.BK_AS_PROJECT_CD = MATR.BK_AS_PROJECT_CD
AND WATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD = MATR.BK_AS_TASK_RSRC_EXPNDTR_TYP_CD
AND WATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY = MATR.BK_AS_RSRC_CISCO_WRKR_PRTY_KEY
AND WATR.FISCAL_YEAR_WEEK_NUM_INT = MATR.FISCAL_YEAR_WEEK_NUM_INT
)

This should change the Explain to a single step doing an Exclusion Join on the PIs.
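
As a side note, a single NULL coming back from the subquery makes NOT IN return no rows at all, which is why the optimizer has to build the extra null-handling spool steps you see in the Explain above. A toy illustration of the difference, using hypothetical volatile tables:

CREATE VOLATILE TABLE t1 (x INTEGER) ON COMMIT PRESERVE ROWS;
CREATE VOLATILE TABLE t2 (x INTEGER) ON COMMIT PRESERVE ROWS;
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (NULL);

-- Returns no rows: the NULL makes every "x NOT IN (...)" comparison UNKNOWN
SELECT * FROM t1 WHERE x NOT IN (SELECT x FROM t2);

-- Returns the row x = 1: NOT EXISTS is not affected by NULLs in t2
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.x = t1.x);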

Dieter


Re: Tuning the query

Of course it's still all-AMPs. Could you show the new Explain and some info about the number of rows and the indexes/statistics?

Dieter


Re: Tuning the query

This is already the most efficient plan you can get: a direct join using the PIs of both tables.

Btw, this Explain is the result of an EXISTS instead of NOT EXISTS.

Regarding skew, how skewed is that table?

Can you provide COUNT/MIN/AVG/MAX(CurrentPerm) for this table from dbc.TableSizeV?
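
Something like this (a minimal sketch; the database/table names are assumed from the Explain above):

SELECT COUNT(*)         AS NumAMPs
      ,MIN(CurrentPerm) AS MinPerm
      ,AVG(CurrentPerm) AS AvgPerm
      ,MAX(CurrentPerm) AS MaxPerm
FROM dbc.TableSizeV
WHERE DatabaseName = 'ASBIDB_DV4'
  AND TableName = 'MT_AS_TIMECARD_RESOURCE';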

HELP STATS ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE INDEX NUPI_MT_AS_TIMECARD_RESOURCE;

will tell you the row count, the number of distinct values, and the max rows per value. Usually a max rows per value up to a few hundred is considered acceptable; otherwise you might consider a different PI.

But changing the PI might affect other queries in a positive or negative way, so you need some analysis to find out.

Dieter


Re: Tuning the query

If this is the SkewFactor from Teradata Administrator, it's large (the max AMP holds approx. 3 times as many rows as the average AMP), but it's a small table. What's the size of the other table?

How many AMPs do you have?

I usually run a query like this to get info about skew:

SELECT
  DatabaseName
  ,TableName
  ,CAST(SUM(CurrentPerm) / 1024**3 AS DEC(9,2)) AS GB
  ,CAST((MAX(CurrentPerm) / AVG(CurrentPerm)) AS DEC(5,3)) AS SkewFactor
  ,CAST((HASHAMP() + 1) * MAX(CurrentPerm) / 1024**3 AS DEC(9,2)) AS GBSkewed
  ,HASHAMP() + 1 AS NumAMPs
FROM dbc.TableSizeV
WHERE DatabaseName = ...
AND TableName = ...
GROUP BY 1,2

Dieter

Re: Tuning the query

Hello Dieter,

Will you please advise? As per my knowledge, I would consider the following: if the table is MULTISET, it allows duplicates, so to avoid skewness try "SET" when creating the table, and use a primary index on multiple columns.


Re: Tuning the query

Hi Kalyan,

this table is not highly skewed, but the number of rows per PI value might be high:

select
  hashrow(BK_AS_PROJECT_TASK_NUM, BK_AS_PROJECT_CD, FISCAL_YEAR_WEEK_NUM_INT),
  count(*) as cnt
from MT_AS_TIMECARD_RESOURCE
group by 1
qualify row_number() over (order by cnt desc) <= 10

Regarding skew:

SET or MULTISET will not change skew (MULTISET just avoids the overhead of duplicate row checking); only a different PI can help.
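
For example, one could build a copy of the table with a trial PI and compare the distribution (a sketch only; "test_db" and the PI columns here are purely illustrative, and changing the PI needs analysis of the rest of the workload first):

CREATE TABLE test_db.MT_AS_TIMECARD_RESOURCE_trialPI AS
 (SELECT * FROM ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE) WITH DATA
PRIMARY INDEX (BK_AS_PROJECT_TASK_NUM, BK_AS_RSRC_CISCO_WRKR_PRTY_KEY);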

Dieter

Re: Tuning the query

Can anyone send me a query? I will tune it as per my knowledge.

Regards,

Mahesh

Re: Tuning the query

You need to consider a few things, like the number of records in the table, how many rows there are on average per index value, etc. If there are too many rows against a few of the index values, the skew factor will increase, causing performance degradation.

Maybe you need to rethink your index choices.

Regards,

Re: Tuning the query

This SQL should give you an idea:

SELECT HASHAMP(HASHBUCKET(HASHROW(<<NUPI columns>>))) AS "AMP #", COUNT(*)
FROM <<table>>
GROUP BY 1
ORDER BY 2 DESC;
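
For the table in this thread that might look like the following (assuming the NUPI columns used in the hashrow query earlier in the thread):

SELECT HASHAMP(HASHBUCKET(HASHROW(
         BK_AS_PROJECT_TASK_NUM, BK_AS_PROJECT_CD, FISCAL_YEAR_WEEK_NUM_INT))) AS "AMP #"
      ,COUNT(*) AS Cnt
FROM ASBIDB_DV4.MT_AS_TIMECARD_RESOURCE
GROUP BY 1
ORDER BY 2 DESC;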

Cheers!