Query performance tuning

Database

Query performance tuning

Hey guys....am trying to tune a query that takes about 13 min to finish....the table is about 70mil and the query does some consolidation....the explain plan is below...any thoughts?

 

Explain CREATE MULTISET VOLATILE TABLE M_CONSOLIDATED_ID AS (
SELECT MIN(ID) OVER(PARTITION BY EMAIL ORDER BY ID) AS CONS_ID,ID, USER, DEVICE,IP,EMAIL,FIRST_NAME,LAST_NAME,INS_TIMESTAMP FROM W_KEYS   WHERE EMAIL IS NOT NULL AND  EMAIL<>''
UNION ALL
SELECT MIN(ID) OVER(PARTITION BY USER ORDER BY ID) AS CONS_ID,ID, USER, DEVICE,IP,EMAIL,FIRST_NAME,LAST_NAME,INS_TIMESTAMP FROM W_KEYS   WHERE (EMAIL IS NULL OR EMAIL='' ) AND USER IS NOT NULL
UNION ALL
SELECT MIN(ID) OVER(PARTITION BY DEVICE ORDER BY ID) AS CONS_ID,ID, USER, DEVICE,IP,EMAIL,FIRST_NAME,LAST_NAME,INS_TIMESTAMP FROM W_KEYS   WHERE (EMAIL IS NULL OR EMAIL='' ) AND USER IS  NULL AND DEVICE IS NOT NULL AND DEVICE NOT IN ('00000000-0000' ,'87544')
UNION ALL
SELECT MIN(ID) OVER(PARTITION BY IP ORDER BY ID) AS CONS_ID,ID, USER, DEVICE,IP,EMAIL,FIRST_NAME,LAST_NAME,INS_TIMESTAMP FROM W_KEYS   WHERE (EMAIL IS NULL OR EMAIL='' ) AND USER IS  NULL AND (DEVICE IS  NULL OR DEVICE  IN ('00000000-0000' ,'87544') AND IP IS NOT NULL ))
WITH DATA ON COMMIT PRESERVE ROWS;

  1) First, we lock W_KEYS for read on a reserved
     RowHash to prevent global deadlock.
  2) Next, we lock W_KEYS for read.
  3) We create the table header.
  4) We do an all-AMPs RETRIEVE step from W_KEYS by way
     of an all-rows scan with a condition of (
     "W_KEYS.EMAIL <> ''") into Spool 2
     (all_amps), which is built locally on the AMPs.  The size of Spool
     2 is estimated with low confidence to be 3,881,109 rows (
     1,649,471,325 bytes).  The estimated time for this step is 18.39
     seconds.
  5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
     way of an all-rows scan into Spool 5 (Last Use), which is
     redistributed by hash code to all AMPs.  The result rows are put
     into Spool 1 (all_amps), which is built locally on the AMPs.  The
     size is estimated with low confidence to be 3,881,109 rows (
     6,291,277,689 bytes).
  6) We do an all-AMPs RETRIEVE step from W_KEYS by way
     of an all-rows scan with a condition of ("(NOT
     (W_KEYS.USER IS NULL )) AND
     ((W_KEYS.EMAIL IS NULL) OR
     (W_KEYS.EMAIL IN ('')))") into Spool 8
     (all_amps), which is built locally on the AMPs.  The size of Spool
     8 is estimated with low confidence to be 3,484,978 rows (
     1,481,115,650 bytes).  The estimated time for this step is 14.21
     seconds.
  7) We do an all-AMPs STAT FUNCTION step from Spool 8 (Last Use) by
     way of an all-rows scan into Spool 11 (Last Use), which is
     redistributed by hash code to all AMPs.  The result rows are put
     into Spool 1 (all_amps), which is built locally on the AMPs.  The
     size is estimated with low confidence to be 7,366,087 rows (
     11,940,427,027 bytes).
  8) We do an all-AMPs RETRIEVE step from W_KEYS by way
     of an all-rows scan with a condition of (
     "(W_KEYS.USER IS NULL) AND
     ((W_KEYS.DEVICE <>
     '00000000-0000') AND
     ((W_KEYS.DEVICE <> '87544') AND
     ((W_KEYS.EMAIL IS NULL) OR
     (W_KEYS.EMAIL IN ('')))))") into Spool 14
     (all_amps), which is built locally on the AMPs.  The size of Spool
     14 is estimated with low confidence to be 24,633,997 rows (
     10,469,448,725 bytes).  The estimated time for this step is 38.44
     seconds.
  9) We do an all-AMPs STAT FUNCTION step from Spool 14 (Last Use) by
     way of an all-rows scan into Spool 17 (Last Use), which is
     redistributed by hash code to all AMPs.  The result rows are put
     into Spool 1 (all_amps), which is built locally on the AMPs.  The
     size is estimated with low confidence to be 32,000,084 rows (
     51,872,136,164 bytes).
 10) We do an all-AMPs RETRIEVE step from W_KEYS by way
     of an all-rows scan with a condition of (
     "(W_KEYS.USER IS NULL) AND
     (((W_KEYS.DEVICE IS NULL) OR ((NOT
     (W_KEYS.IP IS NULL )) AND
     (W_KEYS.DEVICE IN
     ('87544',
'00000000-0000')))) AND
     ((W_KEYS.EMAIL IS NULL) OR
     (W_KEYS.EMAIL IN (''))))") into Spool 20
     (all_amps), which is built locally on the AMPs.  The size of Spool
     20 is estimated with low confidence to be 32,927,994 rows (
     13,994,397,450 bytes).  The estimated time for this step is 43.32
     seconds.
 11) We do an all-AMPs STAT FUNCTION step from Spool 20 (Last Use) by
     way of an all-rows scan into Spool 23 (Last Use), which is
     redistributed by hash code to all AMPs.  The result rows are put
     into Spool 1 (all_amps), which is built locally on the AMPs.  The
     size is estimated with low confidence to be 64,928,078 rows (
     105,248,414,438 bytes).
 12) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
     an all-rows scan into Spool 26 (all_amps), which is redistributed
     by the hash code of (CONS_ID) to all AMPs.  Then we do a SORT
     to order Spool 26 by row hash.  The size of Spool 26 is estimated
     with low confidence to be 64,928,078 rows (27,984,001,618 bytes).
     The estimated time for this step is 10 minutes and 47 seconds.
 13) We do an all-AMPs MERGE into usr1.M_CONSOLIDATED_ID from
     Spool 26 (Last Use).  The size is estimated with low confidence to
     be 64,928,078 rows.  The estimated time for this step is 9 minutes
     and 18 seconds.
 14) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

1 REPLY
N/A

Re: Query performance tuning

it would be helpfull if you would provide the ddl and some example data as well ;-)

 

won't you be able to avoid the many union alls by a case statement?

 

CREATE MULTISET VOLATILE TABLE M_CONSOLIDATED_ID AS (
SELECT MIN(ID) OVER
               (PARTITION BY 
                  case 
                        when EMAIL IS NOT NULL AND  EMAIL<>'' then email
                        when USER IS NOT NULL then user
                        when DEVICE IS NOT NULL AND DEVICE NOT IN ('00000000-0000' ,'87544') then device
                        when IP IS NOT NULL 
               ) AS CONS_ID,
       ID, 
       USER, 
       DEVICE,
       IP,
       EMAIL,
       FIRST_NAME,
       LAST_NAME,
       INS_TIMESTAMP 
       FROM W_KEYS   
       WHERE (EMAIL IS NOT NULL AND  EMAIL<>'')
             or 
             USER IS NOT NULL
             OR 
             DEVICE IS NOT NULL AND DEVICE NOT IN ('00000000-0000' ,'87544')
             OR 
             IP IS NOT NULL
        )
WITH DATA ON COMMIT PRESERVE ROWS;