Reduce CPU skew for an Update Query


The following UPDATE statement is showing high CPU skew:

UPDATE ABC
FROM ttmp.BI_CUST_PNL ABC,
     (SELECT ABC1.Master_Cust_Nbr,
             ABC1.oper_unit_country_code,
             mdm.HOSPITALITY_MASTER_PARENT_ID,
             mdm.HOSPITALITY_MASTER_PARENT_NAME,
             mdm.CREATION_DATE_TIME
      FROM ttmp.BI_CUST_PNL ABC1
      LEFT JOIN ced7.DIM_CUSTOMER_MDM mdm
        ON ABC1.oper_unit_country_code = mdm.BUS_OPER_UNIT_COUNTRY_CODE
       AND ABC1.Master_Cust_Nbr = mdm.customer_nbr
      GROUP BY 1,2,3,4,5
     ) ABC3
SET HOSPITALITY_MASTER_PARENT_ID   = CASE WHEN ABC3.HOSPITALITY_MASTER_PARENT_ID = '0' THEN NULL ELSE ABC3.HOSPITALITY_MASTER_PARENT_ID END,
    HOSPITALITY_MASTER_PARENT_NAME = CASE WHEN ABC3.HOSPITALITY_MASTER_PARENT_ID = '0' THEN NULL ELSE ABC3.HOSPITALITY_MASTER_PARENT_NAME END,
    Cust_MDM_CREATION_DATE_TIME    = ABC3.CREATION_DATE_TIME
WHERE ABC.oper_unit_country_code = ABC3.oper_unit_country_code
  AND ABC.Master_Cust_Nbr = ABC3.Master_Cust_Nbr;

The table ttmp.BI_CUST_PNL has the following NUPI:
PRIMARY INDEX ( Accounting_Yr_Month, oper_unit_country_code, Master_Cust_Nbr, Product_Id, Facct, FML_Org_Cd, Func_Group_Id );

The table DIM_CUSTOMER_MDM has the following primary and secondary indexes:
UNIQUE PRIMARY INDEX ( BUS_OPER_UNIT_COUNTRY_CODE ,CUSTOMER_NBR )
INDEX ( ENTERPRISE_NBR )
INDEX ( ENTERPRISE_NAME ,ENTERPRISE_NBR );
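
To see how evenly that NUPI spreads rows across AMPs, the hash functions can be applied to the PI columns directly. This is a minimal sketch using only the table and columns already shown; a large gap between the highest and lowest per-AMP counts points to PI skew:

-- Row distribution per AMP for ttmp.BI_CUST_PNL's NUPI.
SELECT HASHAMP(HASHBUCKET(HASHROW(
           Accounting_Yr_Month, oper_unit_country_code, Master_Cust_Nbr,
           Product_Id, Facct, FML_Org_Cd, Func_Group_Id))) AS amp_nbr,
       COUNT(*) AS row_cnt
FROM ttmp.BI_CUST_PNL
GROUP BY 1
ORDER BY row_cnt DESC;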

The EXPLAIN plan is as follows:

  1) First, we lock ced7.mdm for read on a reserved RowHash to prevent
     global deadlock.
  2) Next, we lock ttmp.BI_CUST_PNL for write on a reserved RowHash to
     prevent global deadlock.
  3) We lock tedw.HASH_CUSTOMER_PNL for read on a reserved RowHash to
     prevent global deadlock.
  4) We lock ced7.mdm for read, we lock ttmp.BI_CUST_PNL for write, and
     we lock tedw.HASH_CUSTOMER_PNL for read.
  5) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from ced7.mdm by way of an
          all-rows scan with no residual conditions into Spool 4
          (all_amps), which is built locally on the AMPs. The size of
          Spool 4 is estimated with high confidence to be 593,494 rows
          (98,520,004 bytes). The estimated time for this step is 0.39
          seconds.
       2) We do an all-AMPs RETRIEVE step from tedw.HASH_CUSTOMER_PNL
          by way of an all-rows scan with no residual conditions into
          Spool 5 (all_amps), which is redistributed by the hash code
          of (tedw.HASH_CUSTOMER_PNL.Master_Cust_Nbr,
          tedw.HASH_CUSTOMER_PNL.oper_unit_country_code) to all AMPs.
          The size of Spool 5 is estimated with high confidence to be
          2,510,043 rows (67,771,161 bytes). The estimated time for
          this step is 0.75 seconds.
  6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to Spool 5 (Last Use) by way of an
     all-rows scan. Spool 4 and Spool 5 are right outer joined using a
     single partition hash join, with condition(s) used for
     non-matching on right table ("(NOT (oper_unit_country_code IS
     NULL)) AND (NOT (Master_Cust_Nbr IS NULL))"), with a join
     condition of ("(Master_Cust_Nbr = CUSTOMER_NBR) AND
     (oper_unit_country_code = BUS_OPER_UNIT_COUNTRY_CODE)"). The
     result goes into Spool 3 (all_amps), which is built locally on the
     AMPs. The size of Spool 3 is estimated with low confidence to be
     2,510,043 rows (421,687,224 bytes). The estimated time for this
     step is 0.46 seconds.
  7) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan, grouping by field1
     (tedw.HASH_CUSTOMER_PNL.Master_Cust_Nbr,
     tedw.HASH_CUSTOMER_PNL.oper_unit_country_code,
     ced7.mdm.HOSPITALITY_MASTER_PARENT_ID,
     ced7.mdm.HOSPITALITY_MASTER_PARENT_NAME,
     ced7.mdm.CREATION_DATE_TIME). Aggregate Intermediate Results are
     computed globally, then placed in Spool 8. The size of Spool 8 is
     estimated with low confidence to be 595,932 rows (352,791,744
     bytes). The estimated time for this step is 7.33 seconds.
  8) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
     an all-rows scan into Spool 1 (used to materialize view, derived
     table, table function or table operator EDW3) (all_amps), which is
     built locally on the AMPs. The size of Spool 1 is estimated with
     low confidence to be 595,932 rows (103,692,168 bytes). The
     estimated time for this step is 0.14 seconds.
  9) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
          way of an all-rows scan into Spool 11 (all_amps) fanned out
          into 2 hash join partitions, which is redistributed by the
          hash code of (tedw.HASH_CUSTOMER_PNL.Master_Cust_Nbr,
          tedw.HASH_CUSTOMER_PNL.oper_unit_country_code) to all AMPs.
          The size of Spool 11 is estimated with low confidence to be
          595,932 rows (98,924,712 bytes). The estimated time for this
          step is 0.22 seconds.
       2) We do an all-AMPs JOIN step from ttmp.BI_CUST_PNL by way of a
          RowHash match scan with no residual conditions, which is
          joined to tedw.HASH_CUSTOMER_PNL by way of a RowHash match
          scan with no residual conditions. ttmp.BI_CUST_PNL and
          tedw.HASH_CUSTOMER_PNL are joined using a merge join, with a
          join condition of ("(tedw.HASH_CUSTOMER_PNL.Field_1032 =
          ttmp.BI_CUST_PNL.Func_Group_Id) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1031 =
          ttmp.BI_CUST_PNL.FML_Org_Cd) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1030 =
          ttmp.BI_CUST_PNL.Facct) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1029 =
          ttmp.BI_CUST_PNL.Product_Id) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1026 =
          ttmp.BI_CUST_PNL.Master_Cust_Nbr) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1025 =
          ttmp.BI_CUST_PNL.oper_unit_country_code) AND
          ((tedw.HASH_CUSTOMER_PNL.Field_1028 =
          ttmp.BI_CUST_PNL.Accounting_Yr_Month) AND
          (tedw.HASH_CUSTOMER_PNL."UNIQUE" =
          (SUBSTRING((ttmp.BI_CUST_PNL.ROWID) FROM (7) FOR
          (4))))))))))"). The result goes into Spool 12 (all_amps)
          fanned out into 2 hash join partitions, which is
          redistributed by the hash code of
          (tedw.HASH_CUSTOMER_PNL.Master_Cust_Nbr,
          tedw.HASH_CUSTOMER_PNL.oper_unit_country_code) to all AMPs.
          The size of Spool 12 is estimated with low confidence to be
          2,510,043 rows (80,321,376 bytes). The estimated time for
          this step is 1.66 seconds.
 10) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an
     all-rows scan, which is joined to Spool 12 (Last Use) by way of an
     all-rows scan. Spool 11 and Spool 12 are joined using a hash join
     of 2 partitions, with a join condition of ("(Master_Cust_Nbr =
     MASTER_CUST_NBR) AND (oper_unit_country_code =
     OPER_UNIT_COUNTRY_CODE)"). The result goes into Spool 10
     (all_amps), which is redistributed by the hash code of
     (ttmp.BI_CUST_PNL.ROWID) to all AMPs. Then we do a SORT to order
     Spool 10 by the sort key in spool field1. The size of Spool 10 is
     estimated with low confidence to be 2,520,354 rows (395,695,578
     bytes). The estimated time for this step is 0.87 seconds.
 11) We do a MERGE Update to ttmp.BI_CUST_PNL from Spool 10 (Last Use)
     via ROWID. The size is estimated with low confidence to be
     2,520,354 rows (4,065,331,002 bytes). The estimated time for this
     step is 57.98 seconds.
 12) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
 -> No rows are returned to the user as the result of statement 1.
    The total estimated time is 1 minute and 9 seconds.

The CPU skew for this query is above 10.5, but it should be at most 2.0. How can I bring it down?

4 REPLIES

Re: Reduce CPU skew for an Update Query

The CPU skew factor was found to be hovering between 10 and 12 when checked in the DBQL tables (qrylog and sqlqrylog).

Can anyone advise whether creating another type of index on this table would help, or whether we should break this query down?
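
For reference, the skew factor in those DBQL tables is essentially the busiest AMP's CPU time divided by the average per-AMP CPU time. A minimal sketch, assuming query logging is enabled and the standard DBC.QryLogV view is accessible:

-- CPU skew factor per logged query: max AMP CPU vs. average AMP CPU.
-- A value near 1 means even CPU use; 10+ means one AMP does most of the work.
SELECT QueryID,
       AMPCPUTime,
       MaxAMPCPUTime,
       MaxAMPCPUTime * (HASHAMP() + 1) / NULLIFZERO(AMPCPUTime) AS CPUSkewFactor
FROM DBC.QryLogV
WHERE QueryText LIKE '%BI_CUST_PNL%'
ORDER BY CPUSkewFactor DESC;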


Re: Reduce CPU skew for an Update Query

I would suggest running the different parts of the query separately. Start with the derived-table step on its own and watch Viewpoint to see whether the skew stays above 80-90 for a long time. If it does, the join condition is skewed. Then do a GROUP BY on oper_unit_country_code, Master_Cust_Nbr to see whether particular values or NULLs are causing the skew, then move on to the update part, and so forth. You need to analyze the data in the different joins to find the reason for the query's skew. ttmp.BI_CUST_PNL seems to be a big table.
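
A minimal sketch of that GROUP BY check, using the join columns from the query above:

-- Find heavily repeated or NULL join-key values that could cause skew.
SELECT TOP 20
       oper_unit_country_code,
       Master_Cust_Nbr,
       COUNT(*) AS row_cnt
FROM ttmp.BI_CUST_PNL
GROUP BY 1, 2
ORDER BY row_cnt DESC;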

--Thx! Samir

Re: Reduce CPU skew for an Update Query

Hi user2,

The update query itself is designed fine. The problem was that the data in the table was skewed, with a skew factor of around 40.42%. After choosing different columns as the primary index for the table, the CPU skew came down to 1.37 and the table skew factor to 7.8%.
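
For anyone repeating this check, the table skew factor can be derived from per-AMP space usage in DBC.TableSizeV; a minimal sketch (0% means perfectly even distribution):

-- Table skew factor (%) from per-AMP permanent space.
SELECT DatabaseName,
       TableName,
       (1 - AVG(CurrentPerm) / MAX(CurrentPerm)) * 100 AS SkewFactorPct
FROM DBC.TableSizeV
WHERE DatabaseName = 'ttmp'
  AND TableName = 'BI_CUST_PNL'
GROUP BY 1, 2;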

 

Apart from referring to the logical data models, how can I find out which columns of a table are used most frequently in joins with other tables, so that I can choose primary-index columns that are both join-friendly and have enough unique values?

Thanks

Chandra


Re: Reduce CPU skew for an Update Query

I don't think there is a direct way to get that. I would suggest pulling the queries that run most often against that view/table and analyzing them. You can extract these from the DBQL log table and SQL text table.
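
A minimal sketch of that DBQL lookup, assuming logging is enabled. Note that QueryText in DBC.QryLogV captures only the first part of each statement by default, so this grouping is an approximation:

-- Most frequently run statements that reference the table of interest.
SELECT TOP 20
       QueryText,
       COUNT(*) AS run_cnt
FROM DBC.QryLogV
WHERE QueryText LIKE '%BI_CUST_PNL%'
GROUP BY QueryText
ORDER BY run_cnt DESC;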