how to optimize query to avoid 'no spool space error'

Database
Enthusiast

Re: how to optimize query to avoid 'no spool space error'

@ToddAWalter @GJColeman Hi, even though I mark this question as solved, I got a follow up question today and I thought it'd be easier to ask here rather than open a new topic.


I tried GJColeman's suggestion to apply row_number() function to a smaller set of columns (the columns that are needed in 'partition by' clause), the spool space problem still occurs, I guess it's bcz the data set is too large (about 100,000,000 rows). And the reason I need to use row_number() is to add an index which will be used in the next step:

 

CREATE VOLATILE MULTISET TABLE SRP AS (
SEL A.RNUM,
QT,
TRMT_VRSN_ID,
GUID,
SESSION_START_DT,
SESSION_SKEY,
EVENT_TIMESTAMP,
SEQ,
PAGE_TYPE,
KEYWORD,
A.ITEM_LIST,
N_ITEM,
CIID AS JOIN_ID,
B.IDX,
CAST(B.ITEM_ID AS DECIMAL(18, 0)) ITEM_ID
FROM SRP_VI A
INNER JOIN (
SEL *
FROM
TABLE (STRTOK_SPLIT_TO_TABLE(SRP_VI.RNUM, SRP_VI.ITEM_LIST, ',') 
RETURNS (RNUM INT,
IDX INT,
ITEM_ID VARCHAR(50)
)) AS DT
) B
ON A.RNUM = B.RNUM
WHERE A.PAGE_TYPE = 'SRP'
) WITH DATA PRIMARY INDEX (QT, GUID, SESSION_START_DT, SESSION_SKEY) ON COMMIT PRESERVE ROWS;

 

Since the data size cannot be reduced, is there any better way to achieve the same goal of the above piece of code? Or to contact adm to request a large spool space is the only solution?

 

Thanks a lot!

Senior Apprentice

Re: how to optimize query to avoid 'no spool space error'

Do you want to create the row number for a following join to STRTOK_SPLI_TO_TABLE because this function only allows a single column for inKey?

I found a workaround for REGEXP_SPLIT_TO_TABLE: concatenate multiple columns into 1 and split it afterwards:

WITH cte (inKey, ExplainText) AS
 (
   SELECT
       /* RegExp_Split_To_Table allows only a single column int or VarChar as inKey
         -> combine QueryID, ExpRowNo & LogDate into a single column */
      Cast(   (Trim(QueryID (Format '9(18)')))
           || (Trim(ExpRowNo (Format '9(5)')))
           || (Trim(LogDate (Format 'yyyymmdd')))
           AS VARCHAR(128)) AS inKey,
  ...
 ) 
       SELECT
          /* Split outKey into three seperate columns again */
          Cast(Substring(outKey From  1 FOR 18) AS DECIMAL(18,0)) AS QueryID,
          Cast(Substring(outKey From 19 FOR  5) AS SMALLINT) AS ExpRowNo,
          Cast(Substring(outKey From 24 FOR  8) AS DATE Format 'yyyymmdd') AS LogDate,

...
       FROM TABLE
         ( RegExp_Split_To_Table
             ( cte.inKey,
               cte.ExplainText,
               '(\n((?=  1\) First,)|(?=  2\) Next,)|(?= {0,2}[1-9][0-9]{0,4}\) (We|Finally))|(?= {5,7}[1-9][0-9]{0,4}\) We)|(?=  -> )))','mi'
             ) RETURNS
                ( outKey VARCHAR(62) CHARACTER SET Unicode,
                  TokenNum INT,
                  Token VARCHAR(31000) CHARACTER SET Unicode
                )

Additionally, can you show the Explain of your query?

 

 

Highlighted
Teradata Employee

Re: how to optimize query to avoid 'no spool space error'

@dnoethmakes some good points, as usual.  Also note that a volatile table occupies spool space for as long as the session is active.

Enthusiast

Re: how to optimize query to avoid 'no spool space error'

Hi @dnoeth

Thanks for the reply. The first part of my code is here, the reason I want to use row_number() here is to add an index for each row in the step-2 use of strtok_spli_to_table.

CREATE VOLATILE MULTISET TABLE SRP_VI AS (
SEL DISTINCT EP.QT,
EP.TRMT_VRSN_ID,
-- EP.CHNL_ID,
CAST(E.GUID AS VARCHAR(32)) GUID,
E.SESSION_START_DT,
E.SESSION_SKEY,
E.EVENT_TIMESTAMP,
E.SEQNUM,
E.PAGE_ID,
E.PAGE_NAME,
CAST(LOWER(TRIM(E.SQR)) AS VARCHAR(100)) AS KEYWORD,
CAST(CASE WHEN E.PAGE_ID IN (2045573, 3286, 2047936, 1468757, 2051457, 1677950, 2046791, 2053742, 2054032, 5387, 1637, 4999, 2059706, 2052261, 2351460) THEN 'SRP'
WHEN E.PAGE_ID IN (1881, 4340, 4079, 4979, 1673582, 2047465, 2047675, 2017935, 2054897, 2059707, 2060182) THEN 'VI' END
AS VARCHAR(10)) PAGE_TYPE,
CASE WHEN PAGE_TYPE = 'SRP' THEN LENGTH(KEYWORD)
WHEN PAGE_TYPE = 'VI' THEN -1 END QUERY_LENGTH,
CASE WHEN PAGE_TYPE = 'SRP' AND (SOJLIB.SOJ_EXTRACT_FLAG(E.FLAGS, 070) = 1 OR SOJLIB.SOJ_EXTRACT_FLAG(E.FLAGS, 236) = 1 OR SOJLIB.SOJ_EXTRACT_FLAG(E.FLAGS, 237) = 1) THEN 'BEST MATCH'
WHEN PAGE_TYPE = 'VI' THEN 'NOT_APP' END SORT_TYPE,
CASE WHEN PAGE_TYPE = 'SRP' AND (SOJLIB.SOJ_EXTRACT_FLAG(E.FLAGS,52) = 1 OR CAST(SOJLIB.SOJ_NVL(E.SOJ, 'cpnip') AS INTEGER) = 1) THEN 1
WHEN PAGE_TYPE = 'VI' THEN 0 END PAGINITION,
CASE WHEN PAGE_TYPE = 'SRP' AND SOJLIB.SOJ_NVL(E.SOJ, 'gf') IS NULL AND SOJLIB.SOJ_NVL(E.SOJ, 'aa') IS NULL THEN 'NO_FILTER'
WHEN PAGE_TYPE = 'VI' THEN 'NOT_APP' END CONST_APPLIED,
CASE WHEN PAGE_TYPE = 'SRP' THEN CAST(SOJLIB.SOJ_DECODE_BASE36_VEC(SOJLIB.SOJ_NVL(E.SOJ, 'itm')) AS VARCHAR(1000)) END ITEM_LIST,
CAST(CHAR_LENGTH(ITEM_LIST) - CHAR_LENGTH(REGEXP_REPLACE(ITEM_LIST, '[,]', '', 1, 0, 'i')) + 1 AS BYTEINT) N_ITEM,
CAST(E.ITEM_ID AS DECIMAL(18, 0)) ITEM_ID,
CAST(SOJLIB.SOJ_NVL(E.SOJ, 'siid') AS VARCHAR(32)) SIID,
CAST(SOJLIB.SOJ_NVL(E.SOJ, 'ciid') AS VARCHAR(32)) CIID,
CAST(SOJLIB.SOJ_NVL(E.SOJ, 'ec') AS BYTEINT) EP_CHNL,
ROW_NUMBER() OVER (ORDER BY QT, TRMT_VRSN_ID, E.GUID, E.SESSION_SKEY, E.SESSION_START_DT, E.EVENT_TIMESTAMP, E.SEQNUM) AS RNUM -- PUT IT IN NEXT STEP TO AVOID NO SPOOL SPACE PROBLEM
FROM UBI_V.UBI_EVENT E
INNER JOIN P_SOJ_CL_V.CLAV_SESSION S
ON E.GUID = S.GUID
AND E.SESSION_SKEY = S.SESSION_SKEY
AND E.SESSION_START_DT = S.SESSION_START_DT
AND E.SITE_ID = S.SITE_ID
INNER JOIN ACCESS_VIEWS.EXL_GUID_QT_SESS EP
ON EP.SESSION_START_DT = E.SESSION_START_DT
AND EP.GUID = E.GUID
AND EP.SESSION_SKEY = E.SESSION_SKEY
AND EP.SITE_ID = E.SITE_ID
WHERE E.SESSION_START_DT BETWEEN '2017-03-26' AND '2017-04-01'
AND (E.PAGE_ID IN (2045573, 3286, 2047936, 1468757, 2051457, 1677950, 2046791, 2053742, 2054032, 5387, 1637, 4999, 2059706, 2052261, 1881, 4340, 4079, 4979, 1673582, 2047465, 2047675, 2017935, 2054897, 2059707, 2060182)
OR (PAGE_ID = 2351460 AND (SOJLIB.SOJ_NVL(SOJ, 'eactn')) = 'expc')) -- ADD MWEB
AND E.RDT = 0 -- EXCLUDE REDIRECS
AND E.SITE_ID = 0
--AND SOJLIB.SOJ_NVL(E.SOJ,'**bleep**') > 50 -- RECALL SIZE > 50
AND (S.COBRAND IN (0) OR (COBRAND = 6 AND S.PRIMARY_APP_ID IN (1115, 3564)))
AND S.EXCLUDE = 0 --Excludes single-page sessions for all but Mobile app sessions or UID-mapped sessions, and excludes bots from all sessions
AND S.VALID_PAGE_COUNT > 1 -- EXCLUDE SINGLE PAGE SESSIONS
AND EP.CHNL_ID IN (1, 2)
AND EP.QT IN /*(42860, 42861, 42862, */(42863, 42864)
AND EP.TRMT_VRSN_ID = 2
AND (QUERY_LENGTH BETWEEN 2 AND 100 OR QUERY_LENGTH = -1)
AND SORT_TYPE IN ('BEST MATCH', 'NOT_APP')
AND PAGINITION IN (1, 0)
AND CONST_APPLIED IN ('NO_FILTER', 'NOT_APP')
--GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
) WITH DATA PRIMARY INDEX (QT, GUID, SESSION_SKEY, SESSION_START_DT) ON COMMIT PRESERVE ROWS;

 

I have two questions regarding your code:

1. what's the difference between 'STRTOK_SPLI_TO_TABLE' and 'REGEXP_SPLIT_TO_TABLE'? I assume both of them only allows single column for inKey? 

2. does putting 'distinct' after 'select' and putting 'group by' all the columns have the same effect in deleting duplicate rows when creating a table?

3. I didn't find any elaborate content on teradata numeric format, for example 'Format '9(18)', I am not sure what it means exactly, could you recommend me some manuals to look for? (I already seached the manual 'Data_Types_and_Literal' but didn't get any lucky.

 

Thanks!