UNION ALL in a View Enhancement

Blog
The best minds from Teradata, our partners, and customers blog about whatever takes their fancy.
Teradata Employee

An enhancement to the Teradata optimizer in Teradata Database 16.0 will improve the performance of some queries that use a view or a derived table that contains a UNION ALL between multiple branches. A branch can correspond to a single table or a query result (for example, an aggregation on a table or a join between tables). Assume for example that you have a large amount of sales data residing in two different tables: A Current_Sales table holds all the current sales rows, and a History_Sales table holds the older sales rows. If you create a UNION ALL between both tables and place that in a view, then queries that need data from both tables can simply reference the view.

 

Prior to this enhancement when you accessed sales data using such a view and you wanted to join the sales data to another table, first a spool would have to be built that includes rows from both branches of the UNION ALL. That common spool would then be redistributed (or otherwise prepared) in order to be joined to the 3rd table in the query.  Any reduction to the sales data temporary data set would not be applied until after the Current_Sales and History_Sales tables had been union-ed.

 

With this enhancement, the optimizer may choose to push the join of the 3rd table into each branch of the UNION ALL first, join the pushed table with the table in first branch and with the table in the second branch, and then combine the two join results in a combined spool.  This  approach could save considerable resources when large tables comprise a UNION ALL view (or derived table) and the join is effective in reducing down the resulting spool file.

 

In a similar way aggregations can be pushed into the branches of the UNION ALL as well based on this enhancement. Here is an explain showing the step that performs an aggregation.  Note that the aggregation is performed on each table that participates in the UNION ALL independently and then a global aggregation is done within the same step.  Prior to this UNION ALL enhancement the aggregation would be performed after the UNION ALL had been done, after the rows from all branches of the UNION had been spooled. In the example below the only spool file written is after each table in the UNION has been aggregated separately.

 

SELECT sold_date, SUM(sales_price)

FROM All_Sales_V

GROUP BY sold_date;

 

5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

     CAB.Store_Sales in view All_Sales_V by way of an all-rows scan

     with no residual conditions, grouping by field1 (

     CAB.Store_Sales.ss_sold_date). Aggregate Intermediate

     Results are computed globally, then placed in Spool 10 in TD_Map1.

     The size of Spool 10 is estimated with high confidence to be 1,824

     rows (52,896 bytes). The estimated time for this step is 25.92

     seconds.

6) We execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 10

         (Last Use) by way of an all-rows scan into Spool 1 (all_amps),

         which is built locally on the AMPs. The size of Spool 1 is

         estimated with high confidence to be 1,824 rows (60,192

         bytes). The estimated time for this step is 0.17 seconds.

       2) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

         CAB.Catalog_Sales in view All_Sales_V by way of an all-rows

         scan with no residual conditions, grouping by field1 (

         CAB.Catalog_Sales.cs_sold_date). Aggregate Intermediate

         Results are computed globally, then placed in Spool 13. The

         size of Spool 13 is estimated with high confidence to be

         1,831 rows (53,099 bytes). The estimated time for this step

         is 16.58 seconds.

7) We execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 13

         (Last Use) by way of an all-rows scan into Spool 1 (all_amps),

         which is built locally on the AMPs. The size of Spool 1 is

         estimated with high confidence to be 3,655 rows (120,615

         bytes). The estimated time for this step is 0.17 seconds.

       2) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

         CAB.Web_Sales in view All_Sales_V by way of an all-rows scan

         with no residual conditions, grouping by field1 (

         CAB.Web_Sales.ws_sold_date). Aggregate Intermediate

         Results are computed globally, then placed in Spool 16 in

         TD_Map1. The size of Spool 16 is estimated with high

         confidence to be 1,824 rows (52,896 bytes). The estimated

         time for this step is 8.49 seconds.

8) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 16 (Last

     Use) by way of an all-rows scan into Spool 1 (all_amps), which is

     built locally on the AMPs. The size of Spool 1 is estimated with

     high confidence to be 5,479 rows (180,807 bytes). The estimated

     time for this step is 0.17 seconds.

9) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 1

     (Last Use) by way of an all-rows scan, grouping by field1 (

     CAB.Store_Sales.ss_sold_date). Aggregate Intermediate

     Results are computed globally, then placed in Spool 19. The size

     of Spool 19 is estimated with low confidence to be 3,654 rows (

     105,966 bytes). The estimated time for this step is 0.30 seconds.

10) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 19 (Last

     Use) by way of an all-rows scan into Spool 2 (group_amps), which

     is built locally on the AMPs. The size of Spool 2 is estimated

     with low confidence to be 3,654 rows (120,582 bytes). The

     estimated time for this step is 0.17 seconds.

 

The enhancement also optimizes the select count(*) from a view that contains a UNION ALL. Prior to the enhancement all rows from all branches of the UNION would be spooled, then counted.  With the enhancement the cylinder index of each table is read and then summed, a very quick operation.  Below is the explain step that performs the select count(*) operation on a view:

 

SELECT COUNT(*) FROM All_Sales_V;

 

5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

     CAB.Store_Sales in view All_Sales_V by way of a cylinder index

     scan with no residual conditions. Aggregate Intermediate Results

     are computed globally, then placed in Spool 10. The size of Spool

     10 is estimated with high confidence to be 1 row (23 bytes). The

     estimated time for this step is 0.61 seconds.

6) We execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 10

         (Last Use) by way of an all-rows scan into Spool 1 (all_amps),

         which is built locally on the AMPs. The size of Spool 1 is

         estimated with high confidence to be 1 row (29 bytes). The

         estimated time for this step is 0.03 seconds.

       2) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

         CAB.Catalog_Sales in view All_Sales_V by way of a cylinder

         index scan with no residual conditions. Aggregate

         Intermediate Results are computed globally, then placed in

         Spool 13. The size of Spool 13 is estimated with high

         confidence to be 1 row (23 bytes). The estimated time for

         this step is 0.47 seconds.

7) We execute the following steps in parallel.

       1) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 13

         (Last Use) by way of an all-rows scan into Spool 1 (all_amps),

         which is built locally on the AMPs. The size of Spool 1 is

         estimated with high confidence to be 2 rows (58 bytes). The

         estimated time for this step is 0.03 seconds.

       2) We do an all-AMPs SUM step in TD_MAP1 to aggregate from

         CAB.Web_Sales in view All_Sales_V by way of a cylinder index

         scan with no residual conditions. Aggregate Intermediate

         Results are computed globally, then placed in Spool 16. The

         size of Spool 16 is estimated with high confidence to be 1

         row (23 bytes). The estimated time for this step is 0.32

         seconds.

8) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 16 (Last

     Use) by way of an all-rows scan into Spool 1 (all_amps), which is

     built locally on the AMPs. The size of Spool 1 is estimated with

     high confidence to be 3 rows (87 bytes). The estimated time for

     this step is 0.03 seconds.

9) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 1

     (Last Use) by way of an all-rows scan, grouping by field1 (30396).

     Aggregate Intermediate Results are computed globally, then placed

     in Spool 19. The size of Spool 19 is estimated with high

     confidence to be 1 row (19 bytes). The estimated time for this

     step is 0.23 seconds.

10) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 19 (Last

     Use) by way of an all-rows scan into Spool 2 (group_amps), which

     is built locally on the AMPs. The size of Spool 2 is estimated

     with high confidence to be 1 row (25 bytes). The estimated time

     for this step is 0.03 seconds.

11) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

 

The key benefit of the UNION ALL optimizations in Teradata Database 16.0 is the reduction of spool that results when aggregations and joins can be pushed to UNION branches and applied to each branch independently before the UNION ALL is applied. This can improve the performance of joins and speed up query execution time. 

 

The optimization is cost based and may not always be applied. The optimizer will cost the effort to perform the UNION ALL first, and compare it to the effort involved by pushing joins or aggregates into the tables that participate in the UNION ALL.  To get the most from this feature, make sure appropriate statistics have been collected.

 

2 Comments
Enthusiast
Parameter Query in Excel ODBC

Hi,

Can someone help me? 

I am trying to pass multiple SKUs through ODBC in exce with no luck, I can use cast (? As integer) to pass one UPC

I can put UPC’s in SQL statement but I would like to use parameter to grab all UPC’s from single cell

(2113035761,2113515774,21283100000,21233900000)

 

 

Please help

2Z

Teradata Employee

It would be better to post this question on its own as a separate topic in the Product Forums, rather than as a comment under an unrelated topic.

 

http://community.teradata.com/t5/Product-Forums/ct-p/Forums

 

Otherwise, those who might be able to answer it will be unlikely to see it.

 

Best regards, -Carrie