Implicit and explicit castings results in a bad plan

Database

Implicit and explicit castings results in a bad plan

Hi.

I've created a very small table to illustrate the issue, I'ld like to hear opinions about.

CREATE MULTISET TABLE Table_1

,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL,

CHECKSUM = DEFAULT,

DEFAULT MERGEBLOCKRATIO

(

Textfield CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC

)

PRIMARY INDEX ( Textfield );

;

I've inserted 10343 records in the table - in total 340 unique values and 381 instances of the value '00001'.

Valueexamples: '00001','02356','47027' etc.

Then I collect stats:  

COLLECT STATS ON Table_1 COLUMN (Textfield)

Then some different ways to pick data:  

 

Implicit casting should be performed in the first one, but the Plan calculates the size of the spool to be 6 rows with high confidence. Therefor the query performs catastrophic (not in this little query but in more complicated queries performed on larger tables):

SEL * FROM Table_1

WHERE Textfield = 1

---------

 

This one works fine because casting is not necessary. The size of the spool is correctly calculated to be 381 rows, and the query performs fine.

SEL * FROM Table_1

WHERE Textfield = '00001'

--------

 

This one is going wrong to - probably because of the explicit casting - and estimes again the size of the spool to be 6 rows with high confidence with a very poor performance as a result.

SEL * FROM Table_1

WHERE CAST(Textfield AS INTEGER) = 1

 

All the queries returns the correct result, but there is a very big difference in performance. I would like to know, if You think that this might be an issue with the Optimizer?  Personally I have the opinion that the Optimizer should be able to handle the casting better than this.

I am not sure, but it seems that this issue has evolved lately - perhaps after Our latest update of the system to 13.10 from 12.  

Is this a known issue?

We are running VERSION 13.10.06.01 - RELEASE 13.10.06.01

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

4 REPLIES
Teradata Employee

Re: Implicit and explicit castings results in a bad plan

Hi, 

the castings - both implicit and explicit - are consuming CPU that is not necessary.  Therefore, it is a good idea to design the physical database model to minimize the castings.    

In the example above - if the field really contains only integers, then the data type should be defined as integer.  If only some values are integer (only for certain rows), then the data modeller might want to add an additional integer field for that data.

Regards,

Vlad.

Re: Implicit and explicit castings results in a bad plan

Hi Vlad.

Thank You for Your time and Your answer. I'm well aware that minimizing the use of castings is a good thing :-).

What I'm trying to say, is that the casting itself results in a very bad execution plan bexause the optimizer doesen't calculate the right number of rows.

I should mean that the example:

SEL * FROM Table_1

WHERE CAST(Textfield AS INTEGER) = 1

is an example of a query that might run in many systems around the world. In the former versions of Teradata this query performed very well, but now it doesen't. I'lld like to know if anyone else lives the same experience with this or if this is a local issue in our system.

Regards Troels

N/A

Re: Implicit and explicit castings results in a bad plan

Hi Troels,

i doubt that it performed better before the upgrade, all TD release ever (and probably all other DBMSes, too) show a similar behaviour. You simply can't get a good plan for queries like that.

Implicit casting should be performed in the first one

Well, there's an implicit cast, you'll see it when you check explain:

  3) We do an all-AMPs RETRIEVE step from Table_1 by

     way of an all-rows scan with a condition of (

     "(TERADATA_EDUCATION.Table_1.Textfield (FLOAT, FORMAT

     '-9.99999999999999E-999'))= 1.00000000000000E 000") into Spool 1

     (group_amps), which is built locally on the AMP



When you compare numeric and char values the char must be converted to a numeric value (using the most flexible numeric format, FLOAT) and not vice versa, because the numeric value 1 might be represented as a string in lots of different ways,e.g '1', '1.0', ' 1', etc.

Would you expect that the optimizer casts 1 automagically to '00001'?

When a typecast like this is applied to a column the optimizer can't use the existing statistics anymore, because it's different datatypes. Additionally it also can't use an existing index on that column.

The "high confidence" is due to accessing the PI of the table, otherwise it would be "no confidence".

This one is going wrong to - probably because of the explicit casting - and estimes again the size of the spool to be 6 rows with high confidence with a very poor performance as a result.

SEL * FROM Table_1

WHERE CAST(Textfield AS INTEGER) = 1

  3) We do an all-AMPs RETRIEVE step from Table_1 by

     way of an all-rows scan with a condition of (

     "(TRANSLATE((TERADATA_EDUCATION.Table_1.Textfield )USING

     LATIN_TO_UNICODE)(INTEGER, FORMAT '-(10)9'))= 1") into Spool 1

     (group_amps), which is built locally on the AMPs.


Same problem as before.

Bottom line: Know your datatypes and use them accordingly :-)

Btw, i've seen similar problems when the datatype in a column changed in the datamodel and end users were not notified.

Dieter

Re: Implicit and explicit castings results in a bad plan

Hi Dieter.

Okay. Thank You for Your explanation. I get it :-)

Regards Troels