Best Performance with or condition or two queries and union all?

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Enthusiast

Best Performance with or condition or two queries and union all?

Hi everyone, 

I am trying to analyze the performance of a query and I am curious about how to improve the performance.

I have a PTI table like this:

CREATE MULTISET TABLE <my_database>.<My_PTI_Table> ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      TD_TIMEBUCKET BIGINT NOT NULL GENERATED SYSTEM TIMECOLUMN,
      TD_TIMECODE TIMESTAMP(6) NOT NULL GENERATED TIMECOLUMN,
      Meter_ID INTEGER NOT NULL,
      Meter_Reading_Type_CD VARCHAR(50) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
      Reading_Meas DECIMAL(18,4),
      Reading_Reason_CD VARCHAR(50) CHARACTER SET LATIN CASESPECIFIC,
      is_meas BYTEINT DEFAULT 0,
      created_on TIMESTAMP(6))
 PRIMARY TIME INDEX (TIMESTAMP(6), DATE '2019-01-01', MINUTES(15), COLUMNS(Meter_ID,Meter_Reading_Type_CD), NONSEQUENCED);

I wrote two different queries that retrieve the same result:

Query A - With or condition

select Reading_Meas, td_gettimebucket(td_timecode) as td_timebucket
from <my_database>.<My_PTI_Table>
where Meter_Reading_Type_CD = '1-0:1.8.0*255'
and Meter_Id = 100115
and (td_gettimebucket(td_timecode)=18660 or td_gettimebucket(td_timecode)=18667);

Query B - With Union All

select METER_READING_1m.Reading_Meas, td_gettimebucket(td_timecode) as td_timebucket
from <my_database>.<My_PTI_Table>
where Meter_Reading_Type_CD = '1-0:1.8.0*255'
and Meter_Id = 100115
and td_gettimebucket(td_timecode)=18660

UNION ALL

select METER_READING_1m.Reading_Meas, td_gettimebucket(td_timecode) as td_timebucket
from <my_database>.<My_PTI_Table>
where Meter_Reading_Type_CD = '1-0:1.8.0*255'
and Meter_Id = 100115
and td_gettimebucket(td_timecode)=18667;

I run the queries with explain and they do almost the same (I summarized the steps and ommit some details):

StepQuery AQuery B
1single-AMP RETRIEVE step from all partitions     
       by way of the primary index
Single AMP retrieve from all partitions by way of primary index with a residual condition of (                                    
      "Meter_Reading_Type_CD = '1-0:1.8.0*255'")
2single-AMP RETRIEVE step from all partitions     
       by way of the primary index
Single AMP retrieve from all partitions by way of primary index with a residual condition of (                                    
      "Meter_Reading_Type_CD = '1-0:1.8.0*255'")
3group-AMP SORTgroup-AMP SORT
4group-AMP RETRIEVEsend out an END TRANSACTION step to all AMPs
5send out an END TRANSACTION step to all AMPs 

 

From the results in TD Studio the query with the "union all" seems to be faster but it would be very helpful if someone could give me some hints how to determine the better performance base on the query explanation.

I could see that both queries are using the PTI and the Query B has a residual condition after each single AMP retrieve while the query A makes a group-AMP RETRIEVE as an additional step.

Thanks in advance and kind regards, 

Paul

Kind regards,
Paul Hernandez
Tags (3)
6 REPLIES 6
Teradata Employee

Re: Best Performance with or condition or two queries and union all?

Hi Paul,

 

What about this query ?

select Reading_Meas, td_gettimebucket(td_timecode) as td_timebucket
  from My_PTI_Table
 where Meter_Reading_Type_CD = '1-0:1.8.0*255'
   and Meter_Id              = 100115
   and td_timecode          in (timestamp '2019-07-14 10:45:00.000000', timestamp '2019-07-14 12:30:00.000000');

In both your queries explain plan I see two single-AMP (good) all partition (not good) steps.

The good thing about the PTI is you don't have to think about the bucket, just work with natural datatypes.

 

Also, the PTI the way you build have datas on same amp for same (Meter_Id, Meter_Reading_Type_CD, 15 minutes range).

If you build your table using CAL_DAYS(1) instead of MINUTES(15), the explain plan is even better (one step, one amp, one partition).

 

But of course there is the reality of your needs to be accounted for.

 

Highlighted
Enthusiast

Re: Best Performance with or condition or two queries and union all?

Hi

I am also trying to determine which of the below 2 queries would work better when I have millions of records in my base table.

Have created a sample table and am trying to determine the performance of queries based on the explain plan but seems like both queries provide same plan.

create table testperf
(
a integer
) primary index (a);

insert into testperf values (1);
insert into testperf values (2);

insert into testperf values (3);

insert into testperf values (4);

insert into testperf values (5);

collect stats on testperf column (a);
sel * from testperf where a in (1,2,3,4,5) -- Query 1


sel * from testperf where a in (1)
union all
sel * from testperf where a in (2)
union all
sel * from testperf where a in (3)
union all
sel * from testperf where a in (4)
union all
sel * from testperf where a in (5)  -- Query 2

 

Which query would give better performance when the volumes are huge? Will Teradata spawn parallel threads to work on different select statements in Union query and give better performance?

Any suggestions would be much appreciated.

 

Thanks

Teradata Employee

Re: Best Performance with or condition or two queries and union all?

Considering a is your PI, those queries are mathematicaly identical. They share the same explain plan.

So don't bother writing all those union all.

 

Sometime a OR condition can be efficienlty replaced by a UNION operator, but not when you're filtering on the PI, which is the fastest way to retrieve datas from a table.

Ambassador

Re: Best Performance with or condition or two queries and union all?

You should go with query #1, when you access via the PI the optimizer might split the IN-list into seperate steps using the Primary Index (similar to UNION) as long as it's considered more efficient than a Full Table Scan.

 

Btw, with just a few rows you might get a totally different plan, so better test on a large table with a similar PI :-)

Enthusiast

Re: Best Performance with or condition or two queries and union all?

Thanks all the for the responses.

 

Now, If instead of a PI column, I have an another column in the Where clause, which query would perform better? OR or UNION?

 

 

Teradata Employee

Re: Best Performance with or condition or two queries and union all?

Well, it depends if you have additionnal structures to speed parts of the query.

Let's say you have a table, primary index col1, partitionned by col2, secondary index col3, and regular columns col4 and col5.

 

If you have a query like :

 

 select * from mytable where col2 = <something> or col3 = <somewhat>;

Check the explain plan if Teradata doesn't convert to an UNION on its own, but a UNION here make sense.

 

 

The same query with the other columns :

select * from mytable where col4 = <something> or col5 = <somewhat>;

You have no benefits of doing two table scans + a distinct over one table scan. So here the OR should stay an OR.