I am attempting to extract Control Groups in our Automated Batch. The table in the batch I hope to extract from contains a Communication_ID and a Customer_id. Each night the no of COMMUNICATION_IDs and CUSTOMER_IDs will vary. How can I write some dynamic SQL that will select a random sample (10%) of customers per COMMUNICATION_ID and can I set a MIN or MAX on this also?
I am using SQL Assistant V13..
Any help would be greatly appreciated.
Is the number of COMMUNICATION_IDs known and small:
WHEN COMMUNICATION_ID = x1 THEN 0.1
WHEN COMMUNICATION_ID = x2 THEN 0.1
WHEN COMMUNICATION_ID = x3 THEN 0.1
WHEN COMMUNICATION_ID = x4 THEN 0.1
This could be created dynamically based on the actual data, but i don't know if there's a limit for the number of WHENs.
Or use "WHERE RANDOM(1,10) = 1", but this is not exactly 10% and might miss an id with a small number of rows (this will also happen with SAMLPE when the number of rows is less than 5).
What do you mean by "set MIN and MAX"?
Hi Dieter....thanks for the reply. The number of CommunicationIds will not be known. It too will vary each evening.
What I mean by Max and Min is: if there are only say 10 customers for one communication then I may not want to extract 10% (as the number of customers selected is so small). I would like to select a % between a Min and a Maximum ie select 10% but no less than 10 and no more than 100....unsure how to go about doing this...
How large is this table?
OLAP functions are more expensive than RANDOM or SAMPLE but you could do it like this:
SELECT * FROM tab
ROW_NUMBER() OVER (PARTITION BY communication_id ORDER BY HASHROW(xxx)) <= 10 -- at least 10 rows
OR (ROW_NUMBER() OVER (PARTITION BY communication_id ORDER BY HASHROW(xxx)) <=
COUNT(*) OVER (PARTITION BY communication_id) / 10 -- 10 percent of the rows
AND ROW_NUMBER() OVER (PARTITION BY communication_id ORDER BY HASHROW(xxx)) <= 100) -- maximum of 100 rows
As RANDOM can't be used within an OLAP function xxx must be a column which is (close to) unique.
Instead of HASHROW(xxx) you could simply use "1", but this will probably result in skewed spool.