Joining varchar to integer

Database
N/A

Joining varchar to integer

Hi all,

I am trying to tune a query that quickly spools out during a join on two tables between varchar and integer value.

This is the query I am running:

SELECT top 10 

A.*

FROM DB_1.Event_ATM_Card A

LEFT JOIN DB_2.BKEY_CARD  B

ON A.Card_Id  = B.EDW_KEY

and B.DOMAIN_ID =1

A.Card_Id is the varchar value and B.EDW_KEY is the integer value. B.Domain_ID is smallint.

Stats have been collected on all columns. I noticed that while performing join, the optimizer converts the varchar value to float and then joins on condition. Below is the Explain Plan step for the join:

Next, we execute the following steps in parallel. 

       1) We do an all-AMPs RETRIEVE step from DB_1.A by way of an

          all-rows scan with no residual conditions into Spool 3

          (all_amps) (compressed columns allowed), which is built

          locally on the AMPs.  Then we do a SORT to order Spool 3 by

          the hash code of (DB_1.A.Card_Id (FLOAT, FORMAT

          '-9.99999999999999E-999')(FLOAT)).  The result spool file

          will not be cached in memory.  The size of Spool 3 is

          estimated with high confidence to be 682,928,659 rows (

          535,416,068,656 bytes).  The estimated time for this step is

          24 minutes and 3 seconds. 

       2) We do an all-AMPs RETRIEVE step from DP_UTL.BKEY_CARD in view

          DB_2.BKEY_CARD by way of an all-rows scan with a

          condition of ("DP_UTL.BKEY_CARD in view

          DB_2.BKEY_CARD.Domain_Id = 1") into Spool 4 (all_amps)

          (compressed columns allowed), which is duplicated on all AMPs. 

          Then we do a SORT to order Spool 4 by the hash code of (

          DB_2.BKEY_CARD in view DB_2.BKEY_CARD.EDW_Key (FLOAT)). 

          The size of Spool 4 is estimated with high confidence to be

          6,987,527,424 rows (125,775,493,632 bytes).  The estimated

          time for this step is 5 minutes and 59 seconds. 

What confuses me, is that the Explain plan is showing high confidence while retrieval but it is here that the query spools out. My user has a spool space of 200 GB. SELECT * works fine individually but it is the join that causes the spool out error. The combined spool is 535 GB + 125 GB. Maybe I need to increase my spool space here?

To further my checking, I then ran the following query just to get an idea of the data being joined:

sel top 10 a.card_id, b.edw_key from DB_1.Event_ATM_Card A left join DB_2.BKEY_CARD b on a.card_id = b.edw_key

and I was shown the error [2621] Bad character in format or data of Event_Atm_Card.Card_Id...

The PI for DB_1.Event_Atm_Card is ( Event_ID ,Card_Id ) with additional Partitioned Indexes and Secondary Indexes defined

The PI for DB_2.BKEY_CARD is  ( Source_Key ,Domain_Id ).

Stats have been collected on all indexes as well as individually on the mentioned columns.

The above query is from a view which has a series of such joins and I am hoping that if this join is solved the others will probably have the same issue..!!

Thanks.

6 REPLIES
N/A

Re: Joining varchar to integer

Bump...

N/A

Re: Joining varchar to integer

That's the penalty for a bad datamodel :-)

What's the (P)PI of both tables?

If you got bad data (and you can't clean it) you can try two things:

  • ON CAST(A.Card_Id AS NUMBER) = B.EDW_KEY (which will return NULL instead of an error) 
  • ON A.Card_Id = TRIM(B.EDW_KEY) (which might fail if the format doesn't match)

Btw, the query you showed is useless (you don't access any column from the B-table and it's an outer join) and the TOP 10 will not be faster as it's processed after the join. You must move the TOP into a Derived Table:

SELECT 
A.*
FROM
( SELECT top 10 *
FROM DB_1.Event_ATM_Card
) AS A
LEFT JOIN DB_2.BKEY_CARD B
ON A.Card_Id = B.EDW_KEY
and B.DOMAIN_ID =1
N/A

Re: Joining varchar to integer

Hi Dieter,

Will definitely try the things suggested. Can't do much about the data model though... :(

The actual query does not have the "top 10". I just added it to check if the processing was faster. It wasn't. 

The second condition ON A.Card_Id = TRIM(B.EDW_KEY)  is the one which actually shows the results (the first gives an error). Now what happens is the result for 10 values is returned after a minute IF I use a derived table. If I remove this condition (like the actual query I have), the query spools out again.

I have PPI defined on the table Table A and not on Table B with up-to-date stats:

PRIMARY INDEX ( Event_ID ,Card_Id )

PARTITION BY RANGE_N(Start_Date  BETWEEN DATE '2009-06-01' AND DATE '2100-12-31' EACH INTERVAL '1' MONTH ,NO RANGE)

INDEX ( Acct_Type_Cd_1 )

INDEX ( Acct_Type_Cd_2 )

INDEX idx_ATC1_ATC2 ( Acct_Type_Cd_1 ,Acct_Type_Cd_2 )

INDEX idx_ANBR1_ANBR2 ( Acct_Nbr_1 ,Acct_Nbr_2 )

INDEX idx_ANBR_ATC ( Acct_Type_Cd_1 ,Acct_Nbr_1 ,Acct_Type_Cd_2 ,Acct_Nbr_2 )

Any hints, suggestions would be welcome...

N/A

Re: Joining varchar to integer

What about if you try to create a new table, could be a volatile one with the needed information of 

DB_2.BKEY_CARD 

and then join with the one you have in FROM.

Regards,

Teradata Employee

Re: Joining varchar to integer

You can try to cast the integer to CHAR. But you have to know the format of A.Card_Id in order to cast the integer in a way which will compare properly. It sounds as if A.Card_Id has some values that are non-numeric so doing a conversion to numeric to do the comparison is going to be harder. You would have to craft a condition that would exclude the non-numeric ones first, then cast the rest to allow the compare.

This is going to take a lot of spool in any case. As you noted in the explain, spooling A requires at least 536GB of spool. And since it is so large relative to B, the spool for B is being duplicated making it require at least 126GB. When those two are joined, there has to be room for a spool of the join result in addition to the two spooled tables. Since all rows are being selected from A, the spool is going to be at least as large as the 536GB of the A spool plus whatever columns you intend to add from B. Looking like 1.5TB at least of spool to execute the query.

As noted by Dieter, without any columns from B in the select list, this will be an inner join rather than an outer.

N/A

Re: Joining varchar to integer

Hi all, sorry for being away.

I changed the structure of the smaller table by modifying the PI as well as changing the joining PI columns to VARCHAR. Hence the condition became a simple PI - PI join with similar data types. 

Result: query running 55 minutes was reduced to bringing result in under a minute....