Looking for help with regards to nested aggregates

General

Looking for help with regards to nested aggregates

Hi all,

I am new to Teradata and I am having trouble building a query, would appreciate all the help I can get thanks! I'm looking to identify the highest difference in value between people entering and exiting a rail station, Location A (over the course of a day) and finding out the average of this value (daily) over all weekdays in a certain month (e.g. October).

I have this example table below where it shows the timestamp, where their journey originates from (as they tap into the station) and where their destination is (as they tap out). They all have unique IDs based on their card used to tap in.

Date Time_IN ORIG_LOC DEST_LOC CRD_NUM
01/10/2015 8:40 Location A Location B 1234567
01/10/2015 8:40 Location A Location B 390480
01/10/2015 8:41 Location A Location C 9133244
01/10/2015 8:42 Location A Location D 1239293
01/10/2015 8:43 Location D Location A 31323552
01/10/2015 8:44 Location F Location A 12145353
01/10/2015 8:45 Location A Location B 35778655
01/10/2015 8:46 Location C Location A 3930305

So at 8:45, there have been 5 people who entered the station and 2 people who exited the station (difference in value of 3). At 8:46, 5 people entered and 3 people exited (difference in value of 2)., so on and so forth. What I want to find out is over the course of the entire day 01/10/2015, what is the highest difference in value between entries and exits, and obtaining this value for all weekdays in October and finding out the average. 

All help appreciated, thanks!! :)

11 REPLIES
Senior Apprentice

Re: Looking for help with regards to nested aggregates

London Oyster Card?  :-)

You need another nesting level for each nested aggregation: avg(max(count)))

select acg(maxdiff)
from
(
select date, max(diff) as maxdiff
from
(
select date, time_in,
sum(case when DEST_LOC = 'Location A' then 1 else 0 end) -- exit
- sum(case when ORIC_LOC = 'Location A' then 1 else 0 end) -- enter
AS diff
from tab
where (ORIC_LOC = 'Location A' or DEST_LOC = 'Location A')
and date between DATE '2015-10-01' and DATE '2015-10-30' -- October 2015
group by date, time_in
) as dt
) as dt

Re: Looking for help with regards to nested aggregates

Hi dnoeth, thanks for the help! I realised that my column header 'Date' is actually a reserved word, hence the coding might run into a few errors. Also, I realised that I forgot to include another column into my header which shows the exit timings for people exiting from Location A, which is needed instead of using the entry time (at their different locations).

Here's the updated table below:

B_DT Time_Out ORIG_LOC Time_Out DEST_LOC CRD_NUM
01/10/2015 8:40 Location A 8:59 Location B 1234567
01/10/2015 8:40 Location A 9:02 Location B 390480
01/10/2015 8:41 Location A 9:05 Location C 9133244
01/10/2015 8:42 Location A 9:20 Location D 1239293
01/10/2015 8:43 Location D 9:25 Location A 31323552
01/10/2015 8:44 Location F 9:11 Location A 12145353
01/10/2015 8:45 Location A 9:01 Location B 35778655
01/10/2015 8:46 Location C 9:02 Location A 3930305

Also, just out of curiosity, when 'case' is being used, it sort of acts like a scenario and can be used as a substitute for the usual sum(count())? This is because i realised that in your code, you didn't use sum(count(crd_num)) at all, which I thought would have been required to calculate the maxdiff. 

Appreciate your help, thanks!! :)

Senior Apprentice

Re: Looking for help with regards to nested aggregates

So this is the base data, no pre-calculation, yet?

You need another Derived Table to combine the times for enter and exit:

SELECT AVG(maxdiff)
FROM
(
SELECT B_DT, MAX(diff) AS maxdiff
FROM
(
SELECT B_DT, time_in, SUM(out_) - SUM(in_) AS diff
FROM
( -- combine both enter and exit
SELECT B_DT, time_in, 1 AS in_, 0 AS out_
FROM tab
WHERE ORIC_LOC = 'Location A'
AND B_DT BETWEEN DATE '2015-10-01' AND DATE '2015-10-30'
UNION ALL
SELECT B_DT, time_out, 0 AS in_, 1 AS out_
FROM tab
WHERE DEST_LOC = 'Location A'
AND B_DT BETWEEN DATE '2015-10-01' AND DATE '2015-10-30'
) AS dt
) AS dt
GROUP BY B_DT
) AS dt

An aggregate over a CASE is called conditional aggregation, it's like different WHERE-conditions within the same query.

And the SUM(CASE) emulates a COUNT, summing ones is like counting.

Re: Looking for help with regards to nested aggregates

Hmm just curious, why the need to insert the UNION ALL command? Having this command would stack the out_ and in_ data into a single table? Would it be easier if both datasets were separated into 2 different tables and then tabulated accordingly based on b_dt? 

I am also getting an error message of "selected non-aggregated values should be included in the associated group" when i try to run this script. Am I supposed to include time_in and time_out in the GROUP BY command?

Thanks!

Enthusiast

Re: Looking for help with regards to nested aggregates

Hello, I need help as well on a case for nested aggregation as well.In this sample data, I need to find avg of field D over C and then sum them up over A in one query. Can it be done without sub query or derived table? 

A B C D

C1 R1 I1 1

C1 R1 I2 10

C1 R1 I3 2

C1 R1 I1 2

C1 R2 I3 3

C2 R3 I4 2

C2 R3 I3 4

C2 R3 I4 4

C2 R4 I6 5

C2 R4 I2 15

C3 R5 I1 3

C3 R5 I5 4

C3 R6 I6 7

C3 R6 I1 1.5

C3 R6 I6 10

C3 R6 I1 2.5

 

Result

 

Group by A COUNT DISTINCT B COUNT DISTINCT C SUM(AVG(D) PARTITION BY(A,C))

C1 2 3 14

C2 2 4 27

C3 2 3 14.83333333

Enthusiast

Re: Looking for help with regards to nested aggregates

Alternatively, if we have to do a sub query, is that a way we can achieve a result set as below - 

Find Avg within (A,C) and populate only for the first occurance of an I<> in a C<> and then sum it up in the outer query?

A B C D AVG(D) PARTITION BY(A,C)

C1 R1 I1 1 1.5

C1 R1 I2 10 10

C1 R1 I3 2 2.5

C1 R1 I1 2

C1 R2 I3 3

C2 R3 I4 2 3

C2 R3 I3 4 4

C2 R3 I4 4

C2 R4 I6 5 5

C2 R4 I2 15 15

C3 R5 I1 3 2.333333333

C3 R5 I5 4 4

C3 R6 I6 7 8.5

C3 R6 I1 1.5

C3 R6 I6 10

C3 R6 I1 2.5

Group by A COUNT DISTINCT B COUNT DISTINCT C SUM

C1 2 3 14

C2 2 4 27

C3 2 3 14.83333333

Enthusiast

Re: Looking for help with regards to nested aggregates

Hi Dieter... Any help on the above would be helpful....

Enthusiast

Re: Looking for help with regards to nested aggregates

Is this what you are expecting?

SEL A,COUNT(DISTINCT B),COUNT(DISTINCT C),SUM (SM1) FROM
(SEL A,B,C,
CAST(AVG(D) OVER (PARTITION BY A,C ORDER BY A ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS DECIMAL(2,1)) AS SM1
FROM TABLE
) A
GROUP BY 1
Enthusiast

Re: Looking for help with regards to nested aggregates

Hello.. Cant have a sub query.. actually this needs to be executed on a reporting solution and it cant allow sub queries. Also,  when you we do AVG(D) OVER (PARTITION BY A,C ORDER BY A ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

it would cause the avg to be across all the records and when summed up it would give wrong results.