Creating Group ID from SUM Limit

Database
Enthusiast

Creating Group ID from SUM Limit

I have a list of products and a count corresponding to the quantity sold in a single table. The data is laid out as such:

Product Name     QTY_SOLD

Mouse                       23

Keyboard                  25

Monitor                     56

TV                             10

Laptop                      45

...

I want to create a group ID where groups are broken up if the sum of the quantity sold is greater than 50.  We can order by Product Name to get an output similar to the following.

Product Name     QTY_SOLD     GROUP_NBR

Keyboard                 25                1

Laptop                     45                1

Monitor                    56                2

Mouse                     23                 3

TV                           10                3

I created a case statement to create the output I need but if I want to change the group id cutoff from 50 to say 100 or if i get more products and quantities I have to keep changing the case statement. Is there an easy way to use either recursion or some other method to accomodate this?

UPDATE main
FROM prod_list AS main,
 (
 SEL PROD_NAME
 , QTY_SOLD
 , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) RUNNING
 FROM prod_list
 ) inr
SET GROUP_NBR =
 CASE
  WHEN RUNNING < 50 THEN 1
  WHEN RUNNING > 50 AND RUNNING < 100 THEN 2
  WHEN RUNNING > 100 AND RUNNING < 150 THEN 3
  WHEN RUNNING > 150 AND RUNNING < 200 THEN 4
  WHEN RUNNING > 200 AND RUNNING < 250 THEN 5
  ELSE 6
 END
WHERE main.PROD_NAME = inr.PROD_NAME
;
7 REPLIES
Enthusiast

Re: Creating Group ID from SUM Limit

I meant to say, I want to create a group ID where groups are created if the ROLLING sum of the quantity sold is greater than 50.

Enthusiast

Re: Creating Group ID from SUM Limit

I was thinking I may be able to get away with simply distributing the records evenly based on an average or other distribution key. If I had data such as below I would want to distribute them into buckets somewhat evenly.

PROD A - 50

PROD B - 55

PROD C - 1000

PROD D - 25

PROD E - 32

PROD F - 800

One possible result

GRP 1 - (PROD A & D)

GRP 2 - (PROD B & E)

GRP 3 - (PROD C)

GRP 4 - (PROD F)

Is that possible?

Teradata Employee

Re: Creating Group ID from SUM Limit

Change your running sum to order on QtySold. 

Join that result to SELECT SUM(QtySold) AS GrandTotal FROM Prod_List. 

Replace your CASE expression with (running/GrandTotal) DIV <#of buckets you want>. 

This distributes the products into a best approximation equal height histogram by their percentage contribution to the total quantity. If you have very large outliers then you might have missing buckets because the big one will cover multiple percentage buckets. 

Enthusiast

Re: Creating Group ID from SUM Limit

I tried to construct the query you recommended but when i ran it with over a hundred products it created almost as many buckets and didnt stick to the number of buckets i had wanted. I created another query using MOD and it gave me almost what i was looking for but didnt distribute as evenly as I had hoped for.

UPDATE main FROM prod_list AS main,  

 SEL
  PROD_NAME 
 ,  QTY_SOLD 
 ,  SUM(QTY_SOLD) OVER (ORDER BY NUMOF ROWS UNBOUNDED PRECEDING) RUNNING 
 FROM prod_list  
) inr
SET GROUP_NBR = RUNNING MOD <# of buckets>
WHERE main.PROD_NAME = inr.PROD_NAME ;
Teradata Employee

Re: Creating Group ID from SUM Limit

This is missing the join to and division by the grand total. That is necessary in order to make the running sum a percentage contribution.  And I left off the multiplication of that result by 100 before the division by the number of buckets.

In your sample you order by NUMOF? rather than QTY_SOLD.

Enthusiast

Re: Creating Group ID from SUM Limit

Is this the query you mean? I did the whole thing over so that I wouldnt have to do an insert and then an update. This still gives me some weird results what am I missing?

WITH PROD_CTE (PROD_NAME,QTY_SOLD) AS 
(
 SELECT
  PROD_NAME
 , COUNT(DISTINCT PROD_ID) AS QTY_SOLD
 FROM all_prods
 GROUP BY 1
)
SELECT
 PROD_NAME
, QTY_SOLD
, CUMLTV_SUM
, TTL
, (((RUNNING_SUM / TTL ) * 100  ) / 10) AS BUCKET
FROM (
SELECT
  PROD_NAME
, QTY_SOLD
, SUM(QTY_SOLD) OVER (ORDER BY QTY_SOLD ROWS UNBOUNDED PRECEDING) AS RUNNING_SUM
, TTL
FROM PROD_CTE
JOIN (
 SELECT SUM(QTY_SOLD) TTL
 FROM PROD_CTE 
 )sm ON 1=1
)v
;
Enthusiast

Re: Creating Group ID from SUM Limit

I ended up using a suggestion from the guys over at stack overflow.

First create a table that has the group id and the upper and lower limits of each bucket.

-- create the first entry for the recursive query
INSERT  TMP_WORK_DB.GRP_NBRS VALUES (0,1,0,2000000);

INSERT TMP_WORK_DB.GRP_NBRS (GRP_NBR,LOWER_LIMIT, UPPER_LIMIT)
WITH RECURSIVE GRP_RECRSV (GRP_NBR, LOWER_LIMIT, UPPER_LIMIT)
AS (
SELECT
    1 AS GRP_NBR
,   LOWER_LIMIT
,   UPPER_LIMIT
FROM TMP_WORK_DB.GRP_NBRS
UNION ALL
SELECT
    GRP_NBR + 1
,   LOWER_LIMIT + 2000000 -- set the interval to 2 million
,   UPPER_LIMIT + 2000000 -- can be adjusted as needed
FROM GRP_RECRSV
WHERE GRP_NBR < 120 -- needed a limit so that it would not be endless
)
SELECT * FROM GRP_RECRSV
;

Then use a simply WHERE BETWEEN on the data to determine which GROUP ID to assign.

-- delete the first entry because it was duplicated
DELETE FROM TMP_WORK_DB.GRP_NBRS WHERE GRP_NBR = 0;

-- set grp nbr using the limits table
INSERT TMP_WORK_DB.PROD_LIST_GRP
WITH NUMOFPRODS (PROD_NAME,QTY,RUNNING) AS
(
    SELECT
        PROD_NAME
    ,   COUNT(DISTINCT PROD_ID) AS QTY
    ,   SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) RUNNING
    FROM TMP_WORK_DB.PROD_LIST
    GROUP BY 1
)
SELECT
    PROD_NAME
,   QTY
,   RUNNING
,   GRP_NBR
FROM NUMOFPRODS a
JOIN TMP_WORK_DB.GRP_NBRS b ON RUNNING BETWEEN LOWER_LIMIT AND UPPER_LIMIT
;