Reduce CPU skew for an Update Query

Enthusiast

Reduce CPU skew for an Update Query


UPDATE ABC
FROM pmpp.customer_profits_or_losses ABC,
     (SELECT ABC1.Parent_customer_number,
             ABC1.customer_from_country_region,
             dmm.parent_id_spitality,
             dmm.cust_parent_name,
             dmm.CREATION_DATE_TIME
      FROM pmpp.customer_profits_or_losses ABC1
      LEFT JOIN ced7.customer_dimensiondmm dmm
        ON ABC1.customer_from_country_region = dmm.customer_country_src_code
       AND ABC1.Parent_customer_number = dmm.customer_number
      GROUP BY 1,2,3,4,5
     ) ABC3
SET parent_id_spitality = CASE WHEN ABC3.parent_id_spitality = '0' THEN NULL ELSE ABC3.parent_id_spitality END,
    cust_parent_name = CASE WHEN ABC3.parent_id_spitality = '0' THEN NULL ELSE ABC3.cust_parent_name END,
    Cust_dmm_CREATION_DATE_TIME = ABC3.CREATION_DATE_TIME
WHERE ABC.customer_from_country_region = ABC3.customer_from_country_region
  AND ABC.Parent_customer_number = ABC3.Parent_customer_number;

The table pmpp.customer_profits_or_losses has the following NUPI: PRIMARY INDEX ( Acctg_Year_Month, customer_from_country_region, Parent_customer_number, Product_Id, Account_unique_code, Customer_organisation_code, Func_Group_Id );

The table customer_dimensiondmm has the following PI and SIs:
UNIQUE PRIMARY INDEX ( customer_country_src_code, customer_number )
INDEX ( ENTERPRISE_NBR )
INDEX ( ENTERPRISE_NAME, ENTERPRISE_NBR );
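A quick per-AMP row count against the NUPI columns shows how evenly the current primary index distributes the data. This is only an illustrative sketch using the standard HASHROW/HASHBUCKET/HASHAMP functions, with the table and column names taken from the DDL above:

/* Sketch: check how evenly the current NUPI spreads rows across AMPs.
   A large gap between the biggest and smallest per-AMP counts indicates data skew. */
SELECT HASHAMP(HASHBUCKET(HASHROW(
           Acctg_Year_Month,
           customer_from_country_region,
           Parent_customer_number,
           Product_Id,
           Account_unique_code,
           Customer_organisation_code,
           Func_Group_Id))) AS amp_no,
       COUNT(*)             AS row_cnt
FROM pmpp.customer_profits_or_losses
GROUP BY 1
ORDER BY row_cnt DESC;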

The Explain Plan is as follows:
1) First, we lock ced7.dmm for read on a reserved RowHash to prevent global deadlock.
2) Next, we lock pmpp.customer_profits_or_losses for write on a reserved RowHash to prevent global deadlock.
3) We lock wetd.HASH_CUSTOMER_PNL for read on a reserved RowHash to prevent global deadlock.
4) We lock ced7.dmm for read, we lock pmpp.customer_profits_or_losses for write, and we lock wetd.HASH_CUSTOMER_PNL for read.
5) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from ced7.dmm by way of an all-rows scan with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 593,494 rows (98,520,004 bytes). The estimated time for this step is 0.39 seconds.
   2) We do an all-AMPs RETRIEVE step from wetd.HASH_CUSTOMER_PNL by way of an all-rows scan with no residual conditions into Spool 5 (all_amps), which is redistributed by the hash code of (wetd.HASH_CUSTOMER_PNL.Parent_customer_number, wetd.HASH_CUSTOMER_PNL.customer_from_country_region) to all AMPs. The size of Spool 5 is estimated with high confidence to be 2,510,043 rows (67,771,161 bytes). The estimated time for this step is 0.75 seconds.
6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to Spool 5 (Last Use) by way of an all-rows scan. Spool 4 and Spool 5 are right outer joined using a single partition hash join, with condition(s) used for non-matching on right table ("(NOT (customer_from_country_region IS NULL )) AND (NOT (Parent_customer_number IS NULL ))"), with a join condition of ("(Parent_customer_number = customer_number) AND (customer_from_country_region = customer_country_src_code)"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 2,510,043 rows (421,687,224 bytes). The estimated time for this step is 0.46 seconds.
7) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (wetd.HASH_CUSTOMER_PNL.Parent_customer_number, wetd.HASH_CUSTOMER_PNL.customer_from_country_region, ced7.dmm.parent_id_spitality, ced7.dmm.cust_parent_name, ced7.dmm.CREATION_DATE_TIME). Aggregate Intermediate Results are computed globally, then placed in Spool 8. The size of Spool 8 is estimated with low confidence to be 595,932 rows (352,791,744 bytes). The estimated time for this step is 7.33 seconds.
8) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of an all-rows scan into Spool 1 (used to materialize view, derived table, table function or table operator EDW3) (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 595,932 rows (103,692,168 bytes). The estimated time for this step is 0.14 seconds.
9) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan into Spool 11 (all_amps) fanned out into 2 hash join partitions, which is redistributed by the hash code of (wetd.HASH_CUSTOMER_PNL.Parent_customer_number, wetd.HASH_CUSTOMER_PNL.customer_from_country_region) to all AMPs. The size of Spool 11 is estimated with low confidence to be 595,932 rows (98,924,712 bytes). The estimated time for this step is 0.22 seconds.
   2) We do an all-AMPs JOIN step from pmpp.customer_profits_or_losses by way of a RowHash match scan with no residual conditions, which is joined to wetd.HASH_CUSTOMER_PNL by way of a RowHash match scan with no residual conditions. pmpp.customer_profits_or_losses and wetd.HASH_CUSTOMER_PNL are joined using a merge join, with a join condition of ("(wetd.HASH_CUSTOMER_PNL.Field_1032 = pmpp.customer_profits_or_losses.Func_Group_Id) AND ((wetd.HASH_CUSTOMER_PNL.Field_1031 = pmpp.customer_profits_or_losses.Customer_organisation_code) AND ((wetd.HASH_CUSTOMER_PNL.Field_1030 = pmpp.customer_profits_or_losses.Account_unique_code) AND ((wetd.HASH_CUSTOMER_PNL.Field_1029 = pmpp.customer_profits_or_losses.Product_Id) AND ((wetd.HASH_CUSTOMER_PNL.Field_1026 = pmpp.customer_profits_or_losses.Parent_customer_number) AND ((wetd.HASH_CUSTOMER_PNL.Field_1025 = pmpp.customer_profits_or_losses.customer_from_country_region) AND ((wetd.HASH_CUSTOMER_PNL.Field_1028 = pmpp.customer_profits_or_losses.Acctg_Yr_Mth) AND (wetd.HASH_CUSTOMER_PNL."UNIQUE" = (SUBSTRING((pmpp.customer_profits_or_losses.ROWID) FROM (7) FOR (4))))))))))"). The result goes into Spool 12 (all_amps) fanned out into 2 hash join partitions, which is redistributed by the hash code of (wetd.HASH_CUSTOMER_PNL.Parent_customer_number, wetd.HASH_CUSTOMER_PNL.customer_from_country_region) to all AMPs. The size of Spool 12 is estimated with low confidence to be 2,510,043 rows (80,321,376 bytes). The estimated time for this step is 1.66 seconds.
10) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an all-rows scan, which is joined to Spool 12 (Last Use) by way of an all-rows scan. Spool 11 and Spool 12 are joined using a hash join of 2 partitions, with a join condition of ("(Parent_customer_number = Parent_customer_number) AND (customer_from_country_region = customer_from_country_region)"). The result goes into Spool 10 (all_amps), which is redistributed by the hash code of (pmpp.customer_profits_or_losses.ROWID) to all AMPs. Then we do a SORT to order Spool 10 by the sort key in spool field1. The size of Spool 10 is estimated with low confidence to be 2,520,354 rows (395,695,578 bytes). The estimated time for this step is 0.87 seconds.
11) We do a MERGE Update to pmpp.customer_profits_or_losses from Spool 10 (Last Use) via ROWID. The size is estimated with low confidence to be 2,520,354 rows (4,065,331,002 bytes). The estimated time for this step is 57.98 seconds.
12) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> No rows are returned to the user as the result of statement 1.
The total estimated time is 1 minute and 9 seconds.

The CPU skew is above 10.5; it should be at most 2.0. How can I bring it down?

4 REPLIES
Enthusiast

Re: Reduce CPU skew for an Update Query

The CPU skew factor was found to be hovering between 10 and 12 when checked against the sqlqrylog and qrylog tables (a rough sketch of such a check follows).
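A minimal sketch of pulling per-query CPU skew from DBQL, assuming the standard DBC.QryLogV columns (AMPCPUTime, MaxAMPCPUTime, NumOfActiveAMPs); sqlqrylog and qrylog mentioned above look like site-specific copies of these views, so adjust the names to your environment:

/* Sketch: CPU skew per query, computed as max AMP CPU divided by average AMP CPU. */
SELECT QueryID,
       UserName,
       AMPCPUTime,
       MaxAMPCPUTime,
       NumOfActiveAMPs,
       MaxAMPCPUTime / NULLIFZERO(AMPCPUTime / NumOfActiveAMPs) AS cpu_skew
FROM DBC.QryLogV
WHERE QueryText LIKE '%customer_profits_or_losses%'
ORDER BY cpu_skew DESC;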

Can anyone please advise whether creating another type of index on this table would help, or whether we should break the query down?

Junior Supporter

Re: Reduce CPU skew for an Update Query

I would suggest running the different parts of the query separately. Say, run the derived-table part first and watch in Viewpoint whether the skew stays above 80-90 for a long time. If it does, it means the join condition is skewed. Then do a GROUP BY on per_unit_country_code, Master_Cust_Nbr to see whether particular values or NULLs are causing the skew (see the sketch below). Then move on to the update part, and so forth. You need to analyze the data in the different joins to find the reason for the skew of the query. ttmp.BI_CUST_PNL seems to be a big table.
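As a rough illustration of that GROUP BY check, using the join columns from the posted UPDATE rather than the names above (which appear to refer to a different copy of the data), substitute whichever table and columns apply in your environment:

/* Sketch: look for NULLs or heavily repeated values in the join columns. */
SELECT customer_from_country_region,
       Parent_customer_number,
       COUNT(*) AS row_cnt
FROM pmpp.customer_profits_or_losses
GROUP BY 1, 2
ORDER BY row_cnt DESC;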

--Thx! Samir

Enthusiast

Re: Reduce CPU skew for an Update Query

Hi user2,

The update query itself is fine. The problem was that the data in the table was skewed, at around 40.42%. After choosing different columns as the Primary Index for the table, the CPU skew came down to 1.37 and the skew factor to 7.8%.
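For reference, skew-factor figures like these can be computed from per-AMP perm space in DBC.TableSizeV roughly as below; this is the common convention, not necessarily the exact query used to obtain the numbers above:

/* Sketch: table skew factor from per-AMP CurrentPerm; 0% means perfectly even distribution. */
SELECT DatabaseName,
       TableName,
       (1 - AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm))) * 100 AS skew_factor_pct
FROM DBC.TableSizeV
WHERE DatabaseName = 'pmpp'
  AND TableName = 'customer_profits_or_losses'
GROUP BY 1, 2;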

 

Please tell me how to determine which columns of a table are most frequently used in joins with other tables (other than by referring to the logical data models), so that those columns, along with columns that have many unique values, can be chosen for the Primary Index.

Thanks

Chandra

Junior Supporter

Re: Reduce CPU skew for an Update Query

I don't think there is a direct way to get that. I would suggest extracting the queries that are run most often against that view/table and analyzing them. I hope you know how to extract them using the DBQL log table and SQL text table; a rough example is sketched below.
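A minimal sketch, assuming DBQL logging is enabled and the standard DBC.QryLogV / DBC.QryLogSQLV views are available (object-level logging via DBC.QryLogObjectsV would be more precise if it is switched on):

/* Sketch: find the SQL statements that most frequently reference the table. */
SELECT sq.SqlTextInfo,
       COUNT(*) AS run_cnt
FROM DBC.QryLogV ql
JOIN DBC.QryLogSQLV sq
  ON ql.ProcID = sq.ProcID
 AND ql.QueryID = sq.QueryID
WHERE sq.SqlRowNo = 1
  AND sq.SqlTextInfo LIKE '%customer_profits_or_losses%'
GROUP BY 1
ORDER BY run_cnt DESC;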