Teradata 12 and Subqueries

I have a query that runs just fine in Oracle and SAS, but it won't execute in Teradata. I understand there's a 'Scalar Subquery' ability in TD13, but in the near term we're still on TD12.

As I'm in the process of porting all of our existing analytic work to the new Teradata environment, getting a little help with this (and other similar queries) would be greatly appreciated.


Select component_location_id, component_type_code,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'AXLE' and component_location_id = cl.component_location_id ) as AXLE,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'SIDE' and component_location_id = cl.component_location_id ) as SIDE,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'TRUCK' and component_location_id = cl.component_location_id ) as TRUCK
from stg_dev.component_locations cl


The above code returns a '3706' error, stating it is expecting 'something' between the '(' and the 'Select'. I initially thought it had to do with TD needing each result to be named, as in the following:


Select component_location_id, component_type_code,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'AXLE' and component_location_id = cl.component_location_id as tmp1) as AXLE,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'SIDE' and component_location_id = cl.component_location_id as tmp2) as SIDE,
( select clv.LOCATION_VALUE from stg_dev.component_location_values clv where identifier_code = 'TRUCK' and component_location_id = cl.component_location_id as tmp3) as TRUCK
from stg_dev.component_locations cl


However, that results in the same error...

Any ideas, tricks that might help me move this forward?

Thanks,

-Bill
9 REPLIES
Enthusiast

Re: Teradata 12 and Subqueries

Bill - You should just be able to add those subqueries as derived tables in the FROM clause, something like:

Select component_location_id,
component_type_code,
ax.location_value as axle,
si.location_value as side,
tr.location_value as truck
from stg_dev.component_locations cl
left join ( select component_location_id, LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'AXLE') ax
on ax.component_location_id = cl.component_location_id
left join ( select component_location_id, LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'SIDE') si
on si.component_location_id = cl.component_location_id
left join ( select component_location_id, LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'TRUCK') tr
on tr.component_location_id = cl.component_location_id

Depending on the indexes you have, what other criteria are passed to the query, and the relative sizes of the tables, you could also try joining the tables and using CASE expressions with GROUP BY to reduce to a single row for each component location id and type code. Also, depending on your data, you might be able to replace the outer joins with inner joins.
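To sanity-check that the derived-table rewrite returns the same rows as the original scalar-subquery form, here is a small sketch using SQLite (which accepts both forms); the table contents are made up, and only the shape of the two queries matters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE component_locations (
        component_location_id INTEGER, component_type_code TEXT);
    CREATE TABLE component_location_values (
        component_location_id INTEGER, identifier_code TEXT, location_value TEXT);
    INSERT INTO component_locations VALUES (1, 'WHEEL'), (2, 'BEARING');
    INSERT INTO component_location_values VALUES
        (1, 'AXLE', 'A1'), (1, 'SIDE', 'L'), (1, 'TRUCK', 'T1'),
        (2, 'AXLE', 'A2');    -- location 2 has no SIDE/TRUCK rows
""")

# Original scalar-subquery form (runs in SQLite and TD13, not TD12).
scalar_rows = con.execute("""
    SELECT cl.component_location_id, cl.component_type_code,
        (SELECT location_value FROM component_location_values
          WHERE identifier_code = 'AXLE'
            AND component_location_id = cl.component_location_id) AS axle,
        (SELECT location_value FROM component_location_values
          WHERE identifier_code = 'SIDE'
            AND component_location_id = cl.component_location_id) AS side,
        (SELECT location_value FROM component_location_values
          WHERE identifier_code = 'TRUCK'
            AND component_location_id = cl.component_location_id) AS truck
    FROM component_locations cl
    ORDER BY 1
""").fetchall()

# TD12-compatible rewrite: one LEFT JOIN to a derived table per identifier_code.
join_rows = con.execute("""
    SELECT cl.component_location_id, cl.component_type_code,
           ax.location_value, si.location_value, tr.location_value
    FROM component_locations cl
    LEFT JOIN (SELECT component_location_id, location_value
               FROM component_location_values
               WHERE identifier_code = 'AXLE') ax
        ON ax.component_location_id = cl.component_location_id
    LEFT JOIN (SELECT component_location_id, location_value
               FROM component_location_values
               WHERE identifier_code = 'SIDE') si
        ON si.component_location_id = cl.component_location_id
    LEFT JOIN (SELECT component_location_id, location_value
               FROM component_location_values
               WHERE identifier_code = 'TRUCK') tr
        ON tr.component_location_id = cl.component_location_id
    ORDER BY 1
""").fetchall()

print(scalar_rows == join_rows)  # True: the rewrite is row-for-row identical
```

The LEFT JOINs matter: a location with no row for a given identifier_code still comes back, with NULL in that column, exactly as the scalar subquery would return.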

Re: Teradata 12 and Subqueries

That's the ticket, Dave... I was hoping to avoid having to do that. I guess we'll have to until we install TD13, though.

I appreciate you taking a shot at this for me.
Senior Apprentice

Re: Teradata 12 and Subqueries

In most cases a Scalar Subquery might be rewritten as a Left Outer Join.
But your query is even easier (untested):

SELECT component_location_id, component_type_code,
AXLE,
SIDE,
TRUCK
FROM stg_dev.component_locations cl LEFT JOIN
(
SELECT
component_location_id,
MIN(CASE WHEN identifier_code = 'AXLE' THEN clv.LOCATION_VALUE END) AS AXLE,
MIN(CASE WHEN identifier_code = 'SIDE' THEN clv.LOCATION_VALUE END) AS SIDE,
MIN(CASE WHEN identifier_code = 'TRUCK' THEN clv.LOCATION_VALUE END) AS TRUCK
FROM stg_dev.component_location_values clv
WHERE identifier_code IN ('AXLE','SIDE','TRUCK')
GROUP BY 1
) AS clv
ON cl.component_location_id = clv.component_location_id

Dieter
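The conditional-aggregation pivot above can be sketched in SQLite as well (made-up table contents; the MIN(CASE ...) / GROUP BY trick behaves the same way in Teradata):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE component_locations (
        component_location_id INTEGER, component_type_code TEXT);
    CREATE TABLE component_location_values (
        component_location_id INTEGER, identifier_code TEXT, location_value TEXT);
    INSERT INTO component_locations VALUES (1, 'WHEEL'), (2, 'BEARING');
    INSERT INTO component_location_values VALUES
        (1, 'AXLE', 'A1'), (1, 'SIDE', 'L'), (1, 'TRUCK', 'T1'),
        (2, 'AXLE', 'A2');    -- location 2 has no SIDE/TRUCK rows
""")

# One pass over the values table: each identifier_code becomes a column,
# and MIN() collapses the group to a single row per location id.
rows = con.execute("""
    SELECT cl.component_location_id, cl.component_type_code,
           clv.axle, clv.side, clv.truck
    FROM component_locations cl
    LEFT JOIN (
        SELECT component_location_id,
               MIN(CASE WHEN identifier_code = 'AXLE'  THEN location_value END) AS axle,
               MIN(CASE WHEN identifier_code = 'SIDE'  THEN location_value END) AS side,
               MIN(CASE WHEN identifier_code = 'TRUCK' THEN location_value END) AS truck
        FROM component_location_values
        WHERE identifier_code IN ('AXLE', 'SIDE', 'TRUCK')
        GROUP BY component_location_id
    ) clv ON cl.component_location_id = clv.component_location_id
    ORDER BY 1
""").fetchall()

print(rows)
# [(1, 'WHEEL', 'A1', 'L', 'T1'), (2, 'BEARING', 'A2', None, None)]
```

Compared with the three-derived-tables version, this scans the values table once instead of three times.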

Re: Teradata 12 and Subqueries

How can we divide the sal column by its own sum?

Re: Teradata 12 and Subqueries

Hey,

I am inserting data from multiple tables, using various joins and conditions, into another table with a GROUP BY.

Now I want to match the SUM of the Amount field between the source and target tables.

Please suggest.

Thanks

Senior Apprentice

Re: Teradata 12 and Subqueries

Please add some more details.

HAVING sum(source.amount) = sum(target.amount)?

Dieter
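One TD12-friendly way to put both totals side by side is two derived tables rather than scalar subqueries. A minimal sketch, assuming hypothetical tables src and tgt that each have an amount column:

```python
import sqlite3

# Hypothetical source/target tables, just to illustrate the comparison.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (amount REAL);
    CREATE TABLE tgt (amount REAL);
    INSERT INTO src VALUES (10.0), (20.5);
    INSERT INTO tgt VALUES (10.0), (20.5);
""")

# Each derived table reduces to one row, so the cross join yields a
# single row holding both totals -- no scalar subqueries needed.
src_total, tgt_total = con.execute("""
    SELECT s.total, t.total
    FROM (SELECT SUM(amount) AS total FROM src) s
    CROSS JOIN (SELECT SUM(amount) AS total FROM tgt) t
""").fetchone()

print(src_total == tgt_total)  # True when the load matched
```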

Enthusiast

Re: Teradata 12 and Subqueries

dnoeth,

We are on TD 13.10. Can you please go through Query 1 and Query 2 and answer my questions below? Thanks in advance!!

Query 1: (Scalar )

------------------------------------------

sel a1, b1,
    (select sum(c2) from t2 where a2 = a1) as totalamount
from t1;

Query 2: (re-written without the scalar subquery)

-------------------------------------------------------------------------

select a1, b1, sum(c2) from t1, t2 where a1 = a2 group by 1, 2;

The query plans for the above two queries are totally different. Though both queries should yield the same results in every scenario, I did not understand why Query 1 joins t1 and t2 twice. Can the optimizer not rewrite Query 1 as Query 2 and execute that?

Query 1 Plan:

Explain sel a1, b1,
    (select sum(c2) from t2 where a2 = a1) as totalamount
from t1;

  1) First, we lock a distinct PERF."pseudo table" for read on a
     RowHash to prevent global deadlock for PERF.t2.
  2) Next, we lock a distinct PERF."pseudo table" for read on a RowHash
     to prevent global deadlock for PERF.t1.
  3) We lock PERF.t2 for read, and we lock PERF.t1 for read.
  4) We do an all-AMPs JOIN step (Global sum) from PERF.t2 by way of a
     RowHash match scan, which is joined to PERF.t1 by way of a RowHash
     match scan.  PERF.t2 and PERF.t1 are joined using a merge join,
     with a join condition of ("PERF.t2.a2 = PERF.t1.a1").  The result
     goes into Spool 4 (all_amps), which is built locally on the AMPs
     with Field1 ("-5438").  The size of Spool 4 is estimated with low
     confidence to be 192 rows (4,608 bytes).  Spool Asgnlist: "-5438",
     "Spool_4.Field_2" = "{ Copy }{RightTable}.ROWID",
     "Spool_4.c2" = "{ Copy }{LeftTable}.c2".
     The estimated time for this step is 0.04 seconds.
  5) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
     way of an all-rows scan, and the grouping identifier in field 2.
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 5.  The size of Spool 5 is estimated with low confidence
     to be 144 rows (3,888 bytes).  The estimated time for this step is
     0.04 seconds.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by
          way of an all-rows scan into Spool 1 (all_amps), which is
          built locally on the AMPs with hash fields ("Spool_5.Field_2")
          and Field1 ("Spool_5.Field_2").  The size of Spool 1 is
          estimated with low confidence to be 144 rows (4,896 bytes).
          Spool Asgnlist:
          "Field_1" = "Spool_5.Field_2",
          "Field_2" = "Field_3",
          "Field_3" = "Field_2".
          The estimated time for this step is 0.03 seconds.
       2) We do an all-AMPs RETRIEVE step from PERF.t1 by way of an
          all-rows scan with no residual conditions into Spool 7
          (all_amps), which is redistributed by hash code to all AMPs
          with hash fields ("PERF.t1.ROWID") and Field1 (
          "PERF.t1.ROWID").  Then we do a SORT to order Spool 7 by row
          hash.  The size of Spool 7 is estimated with low confidence
          to be 144 rows (3,744 bytes).  Spool Asgnlist:
          "Field_1" = "PERF.t1.ROWID",
          "a1" = "a1",
          "b1" = "b1".
          The estimated time for this step is 0.01 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
     an all-rows scan into Spool 8 (all_amps), which is redistributed
     by hash code to all AMPs with hash fields ("Spool_1.Field_3") and
     Field1 ("Spool_1.Field_1").  Then we do a SORT to order Spool 8 by
     row hash.  The size of Spool 8 is estimated with low confidence to
     be 144 rows (4,896 bytes).  Spool Asgnlist:
     "Field_1" = "Spool_1.Field_1",
     "Field_2" = "Field_2",
     "Field_3" = "Spool_1.Field_3".
     The estimated time for this step is 0.01 seconds.
  8) We do an all-AMPs JOIN step (No Sum) from Spool 7 (Last Use) by
     way of a RowHash match scan, which is joined to Spool 8 (Last Use)
     by way of a RowHash match scan.  Spool 7 and Spool 8 are
     left outer joined using a merge join, with a join condition of (
     "Spool_7.Field_1 = Spool_8.Field_3").  The result goes into Spool
     2 (group_amps), which is built locally on the AMPs with Field1 (
     "UniqueId").  The size of Spool 2 is estimated with low confidence
     to be 144 rows (4,752 bytes).  Spool Asgnlist: "UniqueId",
     "{LeftTable}.a1 ,{LeftTable}.b1 ,{RightTable}.Field_2,".
     The estimated time for this step is 0.04 seconds.
  9) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.18 seconds.

====

The real issue we have on our system is that in Step 4, it assigns some constant value to Field1. In Step 5, when it does the SUM by grouping on Field1, the estimates drop from 5M rows to 5 rows. Because of this, in subsequent steps, the optimizer goes for product joins :(

I opened an incident with Teradata, but would like to know your thoughts on this. You show up on Google when I search for scalar queries in Teradata, so I am assuming that you may be one of the scalar query feature developers.

===

Senior Apprentice

Re: Teradata 12 and Subqueries

I'm not a Teradata developer, but all DBMSes have severe problems rewriting correlated scalar subqueries; that's why I always try to avoid them :-)

In your case I would use a third query; it's the easiest way to convert a scalar aggregate to a join:

Query 3:
sel a1,b1, totalamount
from t1 left join
(select a2, sum(c2) as totalamount
from t2
group by 1
) as dt
on a2=a1;

Btw, your Q2 should use an Outer Join, too, otherwise the result might be different.
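A small SQLite sketch of the three variants (t1/t2 filled with made-up rows) shows why the outer join matters:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (a1 INTEGER, b1 TEXT);
    CREATE TABLE t2 (a2 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y');   -- a1 = 2 has no t2 rows
    INSERT INTO t2 VALUES (1, 10), (1, 5);
""")

# Q1: correlated scalar aggregate (TD13 syntax; SQLite accepts it too).
q1 = con.execute("""
    SELECT a1, b1, (SELECT SUM(c2) FROM t2 WHERE a2 = a1) AS totalamount
    FROM t1 ORDER BY 1
""").fetchall()

# Q2: inner join + GROUP BY -- silently drops a1 = 2.
q2 = con.execute("""
    SELECT a1, b1, SUM(c2) FROM t1, t2 WHERE a1 = a2
    GROUP BY 1, 2 ORDER BY 1
""").fetchall()

# Q3: left join to a pre-aggregated derived table -- matches Q1.
q3 = con.execute("""
    SELECT a1, b1, totalamount
    FROM t1 LEFT JOIN (SELECT a2, SUM(c2) AS totalamount FROM t2 GROUP BY 1) dt
        ON a2 = a1
    ORDER BY 1
""").fetchall()

print(q1)        # [(1, 'x', 15), (2, 'y', None)]
print(q1 == q3)  # True
print(q2)        # [(1, 'x', 15)] -- the unmatched row is gone
```

So Q3 is the safe mechanical rewrite of the scalar aggregate; Q2 only matches when every a1 has at least one a2.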

Enthusiast

Re: Teradata 12 and Subqueries

Thank you very much Dnoeth for your quick response. 

In my case, Set {a1} = Set {a2}, hence I converted the LOJ to an Inner Join.

Going forward, we will urge our app teams to avoid correlated SSQs :)