Apparent bug in query plan

Database
Enthusiast

Apparent bug in query plan

I have a query that appears to encounter a bug in the execution plan, causing incorrect results.

 

Here is the query:

 

SyntaxEditor Code Snippet

SELECT * FROM  
    (
            SELECT
            rev.AmountType           ,rev.CurrencyID           ,rev.PERIOD           ,rev.ServiceTierID           ,rev.CountryID           ,rev.BillingRate           ,rev.BillingPartnerID           ,rev.RetailTypeID           ,0 CompAccountTypeID           ,SUM(rev.RevenueAmt) RevenueAmt
        FROM 
           r2d2_dba.ERP_MONTHLY_REVENUE  rev
        GROUP BY
            rev.AmountType           ,rev.CurrencyID           ,rev.PERIOD           ,rev.ServiceTierID           ,rev.CountryID           ,rev.BillingRate           ,rev.BillingPartnerID           ,rev.RetailTypeID           ,rev.StateDt           ,rev.SignupDt    )  emr
   JOIN r2d2_dba.bucket_x_bucket_group bxbg                             ON bxbg.BucketGroupID = 3330
   WHERE emr.CountryID=230              AND emr.PERIOD = '2018-03'    
   AND (
                   ((CountryID=230) AND (ServiceTierID=5) AND (BillingPartnerID=1146950375) AND (RetailTypeID=2) AND (BillingRate=5)) 
                OR ((CountryID=230) AND (ServiceTierID=5) AND (BillingPartnerID=1146950375) AND (RetailTypeID=1) AND (CompAccountTypeID<>3))
                OR ((ServiceTierID=5) AND (BillingPartnerID=1146950375) AND (RetailTypeID=2) AND (BillingRate<>5) AND (CompAccountTypeID<>3))  
    )

Here is one step in the explain plan:

 

4) We do an all-AMPs SUM step to aggregate from
ROYALTY_TGT.ERP_MONTHLY_REVENUE by way of an all-rows scan with a
condition of ("(ROYALTY_TGT.ERP_MONTHLY_REVENUE.CountryID = 230)
AND ((ROYALTY_TGT.ERP_MONTHLY_REVENUE.ServiceTierID = 5) AND
((ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingPartnerID = 1146950375)
AND ((ROYALTY_TGT.ERP_MONTHLY_REVENUE.RetailTypeID = 2) AND
((ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingRate = 5) AND
(ROYALTY_TGT.ERP_MONTHLY_REVENUE.Period = '2018-03')))))")
, grouping by field1 ( ROYALTY_TGT.ERP_MONTHLY_REVENUE.AmountType
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.CurrencyID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.Period
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.ServiceTierID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.CountryID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingRate
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingPartnerID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.RetailTypeID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.StateDt
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.SignupDt). Aggregate
Intermediate Results are computed globally, then placed in Spool 3.
The size of Spool 3 is estimated with no confidence to be 8,407
rows (1,630,958 bytes). The estimated time for this step is 0.04
seconds. 

 

Note that in the query, one of the OR'd criteria includes "BillingRate<>5", but in the AGGREGATE step above, the condition only includes results where BillingRate = 5.

 

Interestingly, if I alter the query in a couple ways, it works OK:

1) remove the JOIN to bucket_x_bucket_group

2) remove the second "OR" part of the WHERE clause

 

I have ways to work around this, but it does look like a bug to me. Is it a known bug?

5 REPLIES
Enthusiast

Re: Apparent bug in query plan

Here is a somewhat simplified query that still shows the problem:

 

SyntaxEditor Code Snippet

     SELECT * FROM      (
            SELECT
           rev.PERIOD           ,rev.BillingRate           ,rev.BillingPartnerID           ,rev.RetailTypeID           ,0 CompAccountTypeID           ,SUM(rev.RevenueAmt) RevenueAmt
        FROM 
           r2d2_dba.ERP_MONTHLY_REVENUE  rev
        GROUP BY
           rev.PERIOD           ,rev.BillingRate           ,rev.BillingPartnerID           ,rev.RetailTypeID    )     emr
      JOIN r2d2_dba.bucket_x_bucket_group bxbg                             ON bxbg.BucketGroupID = 3330
   WHERE 
       emr.PERIOD = '2018-03'    
    AND BillingPartnerID=1146950375
   AND (
                   (RetailTypeID=2 AND BillingRate=5) 
                OR (1=1 AND  RetailTypeID=1 AND CompAccountTypeID<>3)
                OR ( RetailTypeID=2 AND BillingRate<>5 AND CompAccountTypeID<>3)
    )

 ... and the part of the query plan:

 

4) We do an all-AMPs SUM step to aggregate from
ROYALTY_TGT.ERP_MONTHLY_REVENUE by way of an all-rows scan with a
condition of ("(ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingPartnerID =
1146950375) AND ((ROYALTY_TGT.ERP_MONTHLY_REVENUE.RetailTypeID = 2)
AND ((ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingRate = 5) AND
(ROYALTY_TGT.ERP_MONTHLY_REVENUE.Period = '2018-03')))")
, grouping by field1 ( ROYALTY_TGT.ERP_MONTHLY_REVENUE.Period
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingRate
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.BillingPartnerID
,ROYALTY_TGT.ERP_MONTHLY_REVENUE.RetailTypeID). Aggregate
Intermediate Results are computed globally, then placed in Spool 3.
The size of Spool 3 is estimated with no confidence to be 1 row (
72 bytes). The estimated time for this step is 0.04 seconds.

 

I see that it's applying the first of the 3 OR'd conditions, but not the other 2.

Teradata Employee

Re: Apparent bug in query plan

The optimizer is pretty good at simplifying logic in predicates, especially if good stats have been collected.  It might help if you could explain what you are trying to do.  I see a JOIN but no join conditions, which implies a product join, but then you are not selecting anything from bucket_x_bucket_group, so what is the point?  Also, the conditions you are looking at could be simplified.  What I see is:

 

Select ...
CROSS JOIN r2d2_dba.bucket_x_bucket_group bxbg
WHERE bxbg.BucketGroupID = 3330
 AND  emr.CountryID=230
 AND  emr.PERIOD = '2018-03'
 AND  ServiceTierID=5
 AND  BillingPartnerID=1146950375
 AND (
    ( CountryID=230  AND  RetailTypeID=2 AND  BillingRate=5 )
 OR ( CountryID=230  AND  RetailTypeID=1 AND  CompAccountTypeID<>3 )
 OR ( RetailTypeID=2 AND  BillingRate<>5 AND  CompAccountTypeID<>3 )
)

Enthusiast

Re: Apparent bug in query plan

In the original query (which I didn't post) it does retrieve another column from the crossjoined table bxbg. (Basically we have things called "buckets", and "bucket groups" which contain one or more buckets, and for every result from the main table, we want a result row for each bucket in the requested bucketgroup.)

 

The query was made dynamically by an engine we have that assembles the where clause out of groups of rules.

 

You mentioned statistics. Is it possible that out-of-date or missing statistics could cause the optimizer to eliminate rows when it simplifies the logic? In this case, if the stats show 0 rows with BillingRate <> 5, could it remove that from logic? I will see what's up with the statistics on our table.

Teradata Employee

Re: Apparent bug in query plan

Well, it wouldn't (shouldn't) assume there are no rows with BillingRate<>5 just because the statistics say so. It would be more subtle, such as when a referential constraint is defined (something like a foreign key but without an index).  It sure looks like something is missing.  If you can rewrite the query in a way that gets the expected Explain plan and produces a different result set, then you should open an incident with the support center.  (That's usually the only way to know for sure if you are seeing a known problem.)

Enthusiast

Re: Apparent bug in query plan

Thanks GJ. I'll be opening an incident.