Help making this query more efficient

Database
Enthusiast

Help making this query more efficient

I'm running into the "No more spool space" error when trying to run a query. The tables themselves are quite small and I've rewritten some portions to make it more efficient.


Here's the query:

    with items_cte as
    (
    select i1.*
    from EDWVIEWS.Items as i1
    left join (
    select ItemNumber
    from EDWVIEWS.Items
    where ItemDescription like '%CRV%'
    ) as i2 on i1.ItemNumber = i2.ItemNumber
    where i2.ItemNumber is null
    )
    select c.CategoryCode, i.SmartBuyFlag, c.Category, i.BusinessItemFluid, i.ItemDescription, i.ItemNumber
    , i.Brand, i.PrivateLabelDate
    from
    EDWViews.Categories as c
    , items_cte as i
    , EDWViews.Companies as co
    , EDWViews.SubCategories as sc
    , EDWViews.Components as cp
    , EDWViews.Regions as r
    , EDWViews.Districts as d
    , stores_temp as s
    where co.CompanyNumber = r.CompanyNumber
    and cp.CatSubComp = i.CatSubComp
    and r.RegionCode = d.RegionGroup
    and d.DistrictCode = s.DistrictCode
    and c.CompanyCat = sc.CompanyCat
    and sc.CompanyCatSub = cp.CompanyCatSub
    and co.CompanyNumber in ( '010' )

 

And this is what I get when I run Explain. I'm sure that the error issue stems from step 10 where spools 10 and 11 are joined because 27 million + rows seems like way too many given that the largest table is about 70.000 rows.

 

This request is eligible for incremental planning and execution (IPE)
but does not meet cost thresholds. The following is the static plan
for the request.
1) First, we lock EDWtables.Companies in view EDWViews.Companies for
access, we lock EDWTables.Districts in view EDWViews.Districts for
access, we lock EDWtables.Regions in view EDWViews.Regions for
access, we lock EDWTables.DistrictPersonnel in view
EDWViews.Districts for access, we lock EDWtables.Items_ALL in view
items_cte.i for access, we lock EDWtables.Components in view
EDWViews.Components for access, we lock EDWtables.SubCategories in
view EDWViews.SubCategories for access, and we lock
EDWtables.Categories in view EDWViews.Categories for access.
2) Next, we do an all-AMPs RETRIEVE step from EDWTables.d in view
EDWViews.Districts by way of an all-rows scan with a condition of
("(EDWTables.d in view EDWViews.Districts.DistrictCode >= 100.
) OR (EDWTables.d in view EDWViews.Districts.DistrictCode <= 98.
)") into Spool 2 (all_amps), which is redistributed by the hash
code of (EDWTables.d.RegionGroup) to all AMPs. Then we do a SORT
to order Spool 2 by row hash. The size of Spool 2 is estimated
with low confidence to be 27 rows (405 bytes). The estimated time
for this step is 0.06 seconds.
3) We do an all-AMPs JOIN step from EDWtables.Regions in view
EDWViews.Regions by way of a RowHash match scan with a condition
of ("EDWtables.Regions in view EDWViews.Regions.CompanyNumber =
'010'"), which is joined to Spool 2 (Last Use) by way of a RowHash
match scan. EDWtables.Regions and Spool 2 are joined using a
merge join, with a join condition of (
"EDWtables.Regions.RegionCode = RegionGroup"). The result goes
into Spool 3 (all_amps), which is redistributed by the hash code
of (EDWtables.Regions.CompanyNumber) to all AMPs. Then we do a
SORT to order Spool 3 by row hash. The size of Spool 3 is
estimated with low confidence to be 21 rows (357 bytes). The
estimated time for this step is 0.10 seconds.
4) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from EDWtables.Companies in view
EDWViews.Companies by way of the unique primary index
"EDWtables.Companies in view EDWViews.Companies.CompanyNumber
= '010'" with no residual conditions, which is joined to
Spool 3 (Last Use) by way of a RowHash match scan.
EDWtables.Companies and Spool 3 are joined using a merge join,
with a join condition of ("EDWtables.Companies.CompanyNumber
= CompanyNumber"). The result goes into Spool 4 (all_amps),
which is duplicated on all AMPs. The size of Spool 4 is
estimated with low confidence to be 756 rows (10,584 bytes).
The estimated time for this step is 0.03 seconds.
2) We do an all-AMPs RETRIEVE step from EDWtables.Components in
view EDWViews.Components by way of an all-rows scan with a
condition of ("(EDWtables.Components in view
EDWViews.Components.CompanyNumber (FLOAT, FORMAT
'-9.99999999999999E-999'))= 1.00000000000000E 001") into
Spool 5 (all_amps), which is redistributed by the hash code
of (EDWtables.Components.CompanyCatSub) to all AMPs. Then we
do a SORT to order Spool 5 by row hash. The size of Spool 5
is estimated with low confidence to be 3,716 rows (78,036
bytes). The estimated time for this step is 0.06 seconds.
5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to TDREAD.s by way of an all-rows
scan with a condition of ("(TDREAD.s.DistrictCode >= 100.
) OR (TDREAD.s.DistrictCode <= 98.)"). Spool 4 and TDREAD.s are
joined using a dynamic hash join, with a join condition of (
"DistrictCode = TDREAD.s.DistrictCode"). The result goes into
Spool 6 (all_amps), which is duplicated on all AMPs. The size of
Spool 6 is estimated with no confidence to be 12,024 rows. The
estimated time for this step is 0.11 seconds.
6) We do an all-AMPs JOIN step from EDWtables.SubCategories in view
EDWViews.SubCategories by way of a RowHash match scan with no
residual conditions, which is joined to Spool 5 (Last Use) by way
of a RowHash match scan. EDWtables.SubCategories and Spool 5 are
joined using a merge join, with a join condition of (
"EDWtables.SubCategories.CompanyCatSub = CompanyCatSub"). The
result goes into Spool 7 (all_amps), which is duplicated on all
AMPs. The size of Spool 7 is estimated with low confidence to be
69,372 rows (1,456,812 bytes). The estimated time for this step
is 0.13 seconds.
7) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
all-rows scan, which is joined to EDWtables.xi in view items_cte.i
by way of an all-rows scan with no residual conditions. Spool 7
and EDWtables.xi are joined using a dynamic hash join, with a join
condition of ("CatSubComp = EDWtables.xi.CatSubComp"). The result
goes into Spool 8 (all_amps), which is redistributed by the hash
code of (EDWtables.xi.ItemNumber) to all AMPs. Then we do a SORT
to order Spool 8 by row hash. The size of Spool 8 is estimated
with low confidence to be 81,554 rows (4,648,578 bytes). The
estimated time for this step is 0.16 seconds.
8) We do an all-AMPs RETRIEVE step from EDWtables.xi in view
items_cte.i by way of an all-rows scan with a condition of (
"EDWtables.xi in view items_cte.i.ItemDescription LIKE '%CRV%'")
into Spool 9 (all_amps), which is redistributed by the hash code
of (EDWtables.xi.ItemNumber) to all AMPs. Then we do a SORT to
order Spool 9 by row hash. The size of Spool 9 is estimated with
no confidence to be 71,075 rows (1,208,275 bytes). The estimated
time for this step is 0.12 seconds.
9) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of
a RowHash match scan, which is joined to Spool 9 (Last Use)
by way of a RowHash match scan. Spool 8 and Spool 9 are
left outer joined using a merge join, with a join condition
of ("ItemNumber = ItemNumber"). The result goes into Spool
10 (all_amps), which is redistributed by the hash code of (
EDWtables.SubCategories.CompanyCat) to all AMPs. Then we do
a SORT to order Spool 10 by row hash. The size of Spool 10
is estimated with no confidence to be 81,555 rows (4,974,855
bytes). The estimated time for this step is 0.19 seconds.
2) We do an all-AMPs JOIN step from EDWtables.Categories in view
EDWViews.Categories by way of an all-rows scan with no
residual conditions, which is joined to Spool 6 (Last Use) by
way of an all-rows scan. EDWtables.Categories and Spool 6
are joined using a product join, with a join condition of (
"(1=1)"). The result goes into Spool 11 (all_amps), which is
built locally on the AMPs. Then we do a SORT to order Spool
11 by the hash code of (EDWtables.Categories.CompanyCat).
The size of Spool 11 is estimated with no confidence to be
12,024 rows (384,768 bytes). The estimated time for this
step is 0.10 seconds.
10) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a
RowHash match scan, which is joined to Spool 11 (Last Use) by way
of a RowHash match scan. Spool 10 and Spool 11 are joined using a
merge join, with a join condition of ("(ItemNumber IS NULL) AND
(CompanyCat = CompanyCat)"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 27,239,370 rows (
2,070,192,120 bytes). The estimated time for this step is 11.01
seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 11.92 seconds.


Accepted Solutions
Teradata Employee

Re: Help making this query more efficient

You're not retrieving any columns from the Stores -> Districts -> Regions -> Companies hierarchy, so why are those even included in the FROM clause?

If you do need to include them, note that there is currently no condition that joins store with the Items -> Components -> SubCategories -> Categories hierarchy.

Number of stores times number of items could easily be 27 million.

1 ACCEPTED SOLUTION
3 REPLIES 3
Teradata Employee

Re: Help making this query more efficient

You're not retrieving any columns from the Stores -> Districts -> Regions -> Companies hierarchy, so why are those even included in the FROM clause?

If you do need to include them, note that there is currently no condition that joins store with the Items -> Components -> SubCategories -> Categories hierarchy.

Number of stores times number of items could easily be 27 million.

Ambassador

Re: Help making this query more efficient

Besides the missing join condition there's a datatype mismatch, Components.CompanyNumber is numeric, but Regions.CompanyNumber is a string.

 

And there might be some missing statistics, too.

Highlighted
Enthusiast

Re: Help making this query more efficient

You're right. I didn't have a join on the stores table! Thank you!