Inconsistent/unexpected results when putting OLAP functions in a view.

Database
Enthusiast

Inconsistent/unexpected results when putting OLAP functions in a view.

I am trying to create a view that returns aggregate results using OLAP functions. My intention is for the view to return the aggregates based on the data included in the where clause within the select statement that queries the view. However, what I am finding is that the aggregates are computed using all rows in the tables and not based on the Where clause in the Select statement. Here’s a quick test:
create table dev_eis_t.jlee80_agg_test
( id integer not null
, groupid char(1) not null
, selectionid char(1) not null
, qty integer not null
)
primary index (id)
;

insert into dev_eis_t.jlee80_agg_test
values (1,'A','A',100);
insert into dev_eis_t.jlee80_agg_test
values (2,'A','A',100);
insert into dev_eis_t.jlee80_agg_test
values (3,'A','A',100);
insert into dev_eis_t.jlee80_agg_test
values (4,'A','A',100);
insert into dev_eis_t.jlee80_agg_test
values (5,'A','B',10000);

insert into dev_eis_t.jlee80_agg_test
values (6,'B','A',100);
insert into dev_eis_t.jlee80_agg_test
values (7,'B','A',100);
insert into dev_eis_t.jlee80_agg_test
values (8,'B','A',100);
insert into dev_eis_t.jlee80_agg_test
values (9,'B','A',100);
insert into dev_eis_t.jlee80_agg_test
values (10,'B','B',10000);

replace view dev_eis.jlee80_agg_test
as
locking row for access
select id
, groupid
, selectionid
, average(qty) over (partition by groupid) as avg_qty
, count(*) over (partition by groupid) as rec_cnt
from dev_eis_t.jlee80_agg_test
;

-- Select directly from the tables using the same logic as the view
-- include all records
select id
, groupid
, selectionid
, average(qty) over (partition by groupid) as avg_qty
, count(*) over (partition by groupid) as rec_cnt
from dev_eis_t.jlee80_agg_test
;

id groupid selectionid avg_qty rec_cnt
1 2 A A 2080.00 5
2 4 A A 2080.00 5
3 5 A B 2080.00 5
4 3 A A 2080.00 5
5 1 A A 2080.00 5
6 6 B A 2080.00 5
7 9 B A 2080.00 5
8 10 B B 2080.00 5
9 7 B A 2080.00 5
10 8 B A 2080.00 5


-- Select directly from the tables using the same logic as the view
-- include a subset of records
-- This gives the answer I am looking for.
select id
, groupid
, selectionid
, average(qty) over (partition by groupid) as avg_qty
, count(*) over (partition by groupid) as rec_cnt
from dev_eis_t.jlee80_agg_test
where Selectionid = 'A'
;
id groupid selectionid avg_qty rec_cnt
1 3 A A 100.00 4
2 1 A A 100.00 4
3 4 A A 100.00 4
4 2 A A 100.00 4
5 8 B A 100.00 4
6 7 B A 100.00 4
7 9 B A 100.00 4
8 6 B A 100.00 4

-- Select from the view
-- include all records
-- This looks OK.

select *
from dev_eis.jlee80_agg_test
;
id groupid selectionid avg_qty rec_cnt
1 2 A A 2080.00 5
2 4 A A 2080.00 5
3 5 A B 2080.00 5
4 3 A A 2080.00 5
5 1 A A 2080.00 5
6 9 B A 2080.00 5
7 6 B A 2080.00 5
8 10 B B 2080.00 5
9 7 B A 2080.00 5
10 8 B A 2080.00 5

-- Select directly from the tables using the same logic as the view
-- include a subset of record
-- This give incorrect/unexpected results. I expect the count and
-- average to be computed only on the rows identified in the where
-- clause.
select *
from dev_eis.jlee80_agg_test
where Selectionid = 'A'
;
id groupid selectionid avg_qty rec_cnt
1 8 B A 2080.00 5
2 3 A A 2080.00 5
3 7 B A 2080.00 5
4 1 A A 2080.00 5
5 9 B A 2080.00 5
6 4 A A 2080.00 5
7 6 B A 2080.00 5
8 2 A A 2080.00 5

The Explain Plan shows that we are doing the aggregation before the filtering. Is this correct behavior or a bug?
Explain select *
from dev_eis.jl80_agg_test
where Selectionid = 'A'

1) First, we lock dev_eis_t.jlee80_agg_test in view
dev_eis.jl80_agg_test for access.
2) Next, we do an all-AMPs STAT FUNCTION step from
dev_eis_t.jlee80_agg_test in view dev_eis.jl80_agg_test by way of
an all-rows scan with no residual conditions into Spool 5 (Last
Use), which is redistributed by hash code to all AMPs. The result
rows are put into Spool 3 (all_amps), which is built locally on
the AMPs. The size is estimated with low confidence to be 72 rows
(2,664 bytes).
3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (used to materialize view, derived
table or table function jl80_agg_test) (all_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 72 rows (2,952 bytes). The estimated time for
this step is 0.01 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan with a condition of ("jl80_agg_test.SELECTIONID =
'A'") into Spool 8 (group_amps), which is built locally on the
AMPs. The size of Spool 8 is estimated with low confidence to be
72 rows (2,952 bytes). The estimated time for this step is 0.01
seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 8 are sent back to the user as the result of
statement 1.

finally, if anyone knows of a way to accomplish what I am trying to do, please share!

Thanks,

Jimmy
Tags (1)