Performance of view containing a Union clause

Database
Enthusiast

Performance of view containing a Union clause

Hi,
I have 2 annual tables each with the same columns and the same unique primary index and stats collected.

eg. Replace View VIEWDB.X As
Locking $Star_Ship.Star_BL_Item for access
Locking $Star_Hist_04.Star_BL_Item_04 for access
Select col1 (named col1),
col2 (named col2)
from $Star_Ship.Star_BL_Item
Union
Select col1,
col2
from $Star_Hist_04.Star_BL_Item_04;

A select * from ViewDB.X, or select count(*) from ViewDB.X is extremely slow most likely due to the redistribution, my question is should there be redistribution or am I doing something incorrectly in building the view, see expalin output below?

I basically want all the rows from both tables eliminating duplicates.

Explanation
1) First, we lock $Star_Hist_04.Star_Bl_Item_04 for access, and we
lock $STAR_Ship.STAR_BL_Item for access.
2) Next, we do an all-AMPs RETRIEVE step from $STAR_Ship.STAR_BL_Item
by way of an all-rows scan with no residual conditions into Spool
1 (all_amps), which is redistributed by hash code to all AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 1 is estimated with high
confidence to be 3,474,353 rows. The estimated time for this step
is 56.51 seconds.
3) We do an all-AMPs RETRIEVE step from $STAR_Hist_04.STAR_BL_Item_04
by way of an all-rows scan with no residual conditions into Spool
1 (all_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 1 by the sort key in spool field1
eliminating duplicate rows. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with high confidence to be 1,745,720 rows. The
estimated time for this step is 17.09 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 2 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 2 is estimated with high confidence to be
4,778,903 rows. The estimated time for this step is 58.45 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1. The total estimated time is 2 minutes and 12 seconds.

8 REPLIES
Enthusiast

Re: Performance of view containing a Union clause

Well I add the ALL qualifier to the Union clause and performance is significantly improved, I suppose the redistribution was to eliminate the duplicates.
Enthusiast

Re: Performance of view containing a Union clause

Yes, that is the case. If you know that you don't have duplicates between the two queries, it's best to include the ALL.
Enthusiast

Re: Performance of view containing a Union clause

I am doing something very similar (CURRENT data in one table and HIST data in another). Is there anyway to tell the system that the data is split by a given column's value? The column it is split on is not the primary index. I am looking for something similar to the concept of partitioned tables in MSSQL.

R
Enthusiast

Re: Performance of view containing a Union clause

Ron,

Do you mean that you are keeping a view that unions the data between your CURRENT table and your HIST table or do you want to combine your CURRENT and HIST table into one table?

You can have a partitioned table (PPI) in Teradata whereby you partition on a date or some other field.

I can't tell by your question what you are trying to do.

Barry
Enthusiast

Re: Performance of view containing a Union clause

Your first scenario is the correct one. Can you tell me where in the documentation I could find out how to use partitioned tables. Or if possible, show me an example.

R
Enthusiast

Re: Performance of view containing a Union clause

This would be a quick example of a PPI if you had a CURR_HIST_IND on your combined table:

CREATE TABLE mytable
(CURR_HIST_IND CHAR(1)
...other columns)
PRIMARY INDEX(?????)
PARTITION BY CASE_N (
CURR_HIST_IND = 'C',
CURR_HIST_IND = 'H',
NO CASE OR UNKNOWN);

Then, if you query had a constraint that said "CURR_HIST_IND = 'C'", the optimizer would only look at the currrent partition. Likewise, if your query said "CURR_HIST_IND = 'H'", it would only look at the history partition.

I would recommend that you read the Orange Book by Jerry Klindt and Paul Sinclair on PPI.
Enthusiast

Re: Performance of view containing a Union clause

Thank you. I will read it right away.

R
Enthusiast

Re: Performance of view containing a Union clause

union will eliminate the duplicate rows and redistribution
union all won't,
that's the difference