Performance Tuning

Enthusiast

Hi,

The query below takes around 30-35 minutes to store its results in a temporary table.

SELECT c.call_start_dt,
       'xrx' AS source,
       CASE
           WHEN c.postpaid_seq IS NOT NULL THEN 'X'
           WHEN c.prepaid_seq IS NOT NULL THEN 'Y'
       END AS source_system,
       CASE
           WHEN c.prepaid_seq IS NOT NULL THEN c.postpaid_seq
           ELSE c.prepaid_seq
       END
FROM   vw_xerox.call_details c -- points to the call_details table (vw_xerox.call_details is a 1:1 view)
WHERE  c.call_start_dt BETWEEN DATE '2015-03-24' - 120
                           AND CAST('2015-07-24' AS DATE) + 31
/* call_details is partitioned per day on call start date and holds only a few
   months of data. The volume of this table is quite high (7,923,659,811 rows
   in total); the PI is on call start date and call start time. */
AND    c.st_ind <> 'Q'
AND    (src_system, src_batch_nr) IN
       (SELECT src_system,
               src_batch_nr
        FROM   xer.xerox_load_status_hist -- returns around 3999 rows
        WHERE  src_system IN ('X', 'Y')
        AND    ((run_dt <= DATE '2015-07-24' AND process_dt IS NOT NULL)
             OR (active_ind = 'Y'
                 AND run_dt = CAST('2015-07-24' AS DATE)
                 AND process_dt IS NULL)))
AND    (src_system, src_batch_nr, ins_id, upd_id) NOT IN
       /* 2. performance bottleneck also observed here */
       (SELECT src_system,
               src_batch_nr,
               ins_id,
               upd_id
        FROM   xer.xerox_loaded_usg
        WHERE  src_system IN ('X', 'Y') -- returns around 3400 rows
       );

/* In the script above, column and table names have been replaced with substitutes */

Is there a way to tune this SQL? Even the plain INSERT of four months of data from the call_details table into the temp table takes a significant amount of time!

and (src_system, src_batch_nr, ins_id, upd_id) NOT IN /* 2. performance bottleneck observed here */
    (SELECT src_system, src_batch_nr, ins_id, upd_id
     FROM   xer.xerox_loaded_usg
     WHERE  src_system IN ('X', 'Y') -- returns around 3400 rows
    );

For the part above, an alternative approach could be to delete from the temporary table those records that have a match in xerox_loaded_usg on the combination (src_system, src_batch_nr, ins_id, upd_id). This might be a costly operation, though, since we would be deleting rows from a big table for some combinations.

Would splitting the result set into small chunks help make use of the day-wise partitioning? Thanks!
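The chunking idea, one INSERT ... SELECT per day so that each statement touches a single date partition, can be sketched as a loop that issues the per-day statements. This is a toy Python/SQLite model with invented table and column names, only to show the shape of the approach, not the actual Teradata tables:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the partitioned source table and the temp target T1.
cur.execute("CREATE TABLE call_details (call_start_dt TEXT, payload TEXT)")
cur.execute("CREATE TABLE t1 (call_start_dt TEXT, payload TEXT)")
cur.executemany("INSERT INTO call_details VALUES (?, ?)",
                [("2015-03-24", "a"), ("2015-03-25", "b"), ("2015-03-26", "c")])

start, end = date(2015, 3, 24), date(2015, 3, 26)
d = start
while d <= end:
    # One INSERT ... SELECT per day, so each scan is confined to one
    # date "partition" worth of rows.
    cur.execute(
        "INSERT INTO t1 SELECT call_start_dt, payload "
        "FROM call_details WHERE call_start_dt = ?",
        (d.isoformat(),),
    )
    d += timedelta(days=1)

loaded = cur.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
print(loaded)  # 3
```

Whether this beats one large range scan depends on the plan each variant gets; on Teradata the single BETWEEN predicate should already allow partition elimination, so the loop mainly helps when spool is the constraint.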

8 REPLIES
Junior Contributor

Re: Performance Tuning

#1: Change the NOT IN to a NOT EXISTS:

AND NOT EXISTS
    (
     SELECT *
     FROM   xer.xerox_loaded_usg AS u
     WHERE  u.src_system IN ('X', 'Y')
     AND    c.src_system = u.src_system
     AND    c.src_batch_nr = u.src_batch_nr
     AND    c.ins_id = u.ins_id
     AND    c.upd_id = u.upd_id
    );

#2: Can you show the Explain and (P)PI of all tables including the target table for the insert/select?
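One reason optimizers handle NOT IN worse than NOT EXISTS is three-valued NULL logic: if the subquery returns even one NULL, NOT IN eliminates every outer row, so the plan must account for that case. A minimal sketch of the trap on a toy SQLite schema (all names here are invented for illustration, not the real tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the fact table and the exclusion list.
cur.execute("CREATE TABLE calls (batch_nr INTEGER)")
cur.execute("CREATE TABLE loaded (batch_nr INTEGER)")
cur.executemany("INSERT INTO calls VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO loaded VALUES (?)", [(1,), (None,)])  # note the NULL

# NOT IN: comparing against the NULL yields UNKNOWN, so no row ever
# qualifies and the result is empty.
not_in = cur.execute(
    "SELECT batch_nr FROM calls WHERE batch_nr NOT IN "
    "(SELECT batch_nr FROM loaded) ORDER BY batch_nr"
).fetchall()

# NOT EXISTS: a NULL in the subquery simply never matches, so the
# genuinely unmatched rows (2 and 3) survive.
not_exists = cur.execute(
    "SELECT batch_nr FROM calls c WHERE NOT EXISTS "
    "(SELECT 1 FROM loaded l WHERE l.batch_nr = c.batch_nr) "
    "ORDER BY batch_nr"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```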

Enthusiast

Re: Performance Tuning

Thanks for your response.

I tried the approach you mentioned, but did not gain any improvement. c.src_system and c.src_batch_nr are derived columns, so they should be referenced as plain src_system and src_batch_nr.

The plan for a given timeframe was given in previous link:

http://forums.teradata.com/forum/general/tuning-query

Here timeframe would be between CURRENT_DATE - 120 and CURRENT_DATE +31.

The xerox_loaded_usg table has its PI on src_system.

xer.xerox_loaded_usg has its PI on ins_id, src_batch_nr, upd_id, which could also be part of the problem.

Thanks!

Enthusiast

Re: Performance Tuning

(Call_Start_Dt, Call_Start_Tms, originating_num, Terminating_Num, imei_nbr, Seq_Nr, Prepaid_Seq, postpaid_Seq, I_Seq) is the PI on the target table, which is a MULTISET table. Thanks!

Junior Contributor

Re: Performance Tuning

You don't have any matching PIs, and the estimated time is close to the actual.

Check QryLogStepsV for the actual vs. estimated numbers and skew.

Enthusiast

Re: Performance Tuning

Yes, there are no matching PIs.

Unfortunately, I don't have access to the tables you mentioned.

I am not sure whether the steps I have chosen below are correct; I need your suggestions on this:

1. Populate the data for each day (to make use of the partitioning and get an optimal plan) over the given timeframe into a temporary table, say T1. An INSERT over such a big timeframe takes too much time.

I do understand that a bulk INSERT is much faster than INSERTs through a loop; I am trying to make use of the partitioning and get an optimal plan. I may be wrong here, but if spool space is low, that may also have an impact.

I can also append the IN (or EXISTS) part of the SQL at this stage.

The call_details table is a BIG table: it takes around 380 GB of space for 6 months of data. Alternatively, I would have preferred a CT (CREATE TABLE) and then dropping the partitions for dates greater than the upper bound (and less than the lower bound) of the date range, appending rows as required, or simply deleting rows, so that the result set holds only the data for the timeframe

    c.call_start_dt BETWEEN DATE '2015-03-24' - 120 AND CAST('2015-07-24' AS DATE) + 31

I was also thinking of doing a join instead of IN. But in that case we would be joining a partitioned table with a non-partitioned table, and to my understanding the joining key of the partitioned table would be sorted first, which is an overhead. Please correct me if I am wrong here.

2. Delete the data from T1 where st_ind = 'Q'. This way we can replace <> with =. But again, deleting data from a huge table with a filter predicate may carry a performance overhead.

3. Delete from T1 the rows that have a match in xerox_loaded_usg on the combination (src_system, src_batch_nr, ins_id, upd_id). This might be a costly operation, since we are deleting from a big table for some combinations, but this way we can avoid NOT IN and replace it with IN. Also, the tables have no matching PIs!
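Steps 2 and 3 can be sketched as two plain DELETEs against a toy staging table. This is SQLite driven from Python, with invented names, purely to illustrate the shape of the statements rather than Teradata syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the staging table T1 and the already-loaded list.
cur.execute("CREATE TABLE t1 (st_ind TEXT, src_system TEXT, src_batch_nr INTEGER)")
cur.execute("CREATE TABLE loaded_usg (src_system TEXT, src_batch_nr INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                [("Q", "X", 1), ("A", "X", 1), ("A", "Y", 2), ("A", "Y", 3)])
cur.execute("INSERT INTO loaded_usg VALUES ('Y', 2)")

# Step 2: drop the rows the original query excluded with st_ind <> 'Q'.
cur.execute("DELETE FROM t1 WHERE st_ind = 'Q'")

# Step 3: drop rows already present in the loaded table, replacing the
# NOT IN of the original query with an equality-joined EXISTS.
cur.execute(
    "DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM loaded_usg u "
    "WHERE u.src_system = t1.src_system AND u.src_batch_nr = t1.src_batch_nr)"
)

rows = cur.execute(
    "SELECT src_system, src_batch_nr FROM t1 ORDER BY src_batch_nr"
).fetchall()
print(rows)  # [('X', 1), ('Y', 3)]
```

Whether splitting the work into DELETE passes actually beats a single filtered SELECT depends entirely on the plans each pass gets, which is why the DBQL step data matters.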

This is one of the approaches I can think of at this point in time.

Is there any other approach I am missing? Your valuable suggestions will surely help. Thanks once again for giving your valuable time.

Junior Contributor

Re: Performance Tuning

IN will be rewritten to a JOIN anyway and the optimizer decides if/what/when to sort.

But before you try to change your approach you should find out why, and which step, is performing badly.

For that you need data from DBQL, you should contact your DBA and ask for it.

Btw, what's your system type, 1xxx, 2xxx or 6xxx?

Is it a 120-AMP system?

Enthusiast

Re: Performance Tuning

Thanks once again for the correction. Yes indeed, it would be replaced by a JOIN.

It's a 120 AMP system. I need to get the information on system type and will get back to you.

Well, I tried an incremental approach to understand the performance one step at a time. Even with only the call start date predicate applied over such a big timeframe, the query takes a significant amount of time. Note that the stats are up to date. So the only parts remaining are the IN, the NOT IN and the '<>'.

I have observed that even for a single day, once the NOT IN part is appended and the data volume is high, the query takes longer. Hence the conclusion that the NOT IN part of the SQL holds the performance bottleneck, and hence the approach above came to mind. I am not really sure it works, as I have no production-like setup on which to observe the performance.

Anyway, I will try to validate it by looking at the table name you mentioned, with the help of the DBA.

I have a question here: why did you say that the tables should have matching PIs? If I am correct, joining a partitioned table to an unpartitioned table with the same primary index has an impact on performance; it results in performance degradation.

Well, I am more of an Oracle guy and new to Teradata tuning. It seems the journey is going to be an interesting one :)

Thanks for all the valuable suggestion and guidance.

Junior Contributor

Re: Performance Tuning

NOT IN might result in an overly complicated plan, but NOT EXISTS never does.

Of course, joining a partitioned table to a non-partitioned table on their PI is slower than joining tables with the same PI and partitioning. But it is still faster than joining tables on non-matching PIs.