Random Segregation of data records (dynamically assigning specific values to randomly selected rows)

Database

Random Segregation of data records (dynamically assigning specific values to randomly selected rows)

Dear TD Users,

I am trying to understand if there's an easy way (using queries) to randomly segregate data records in a given table into specific groups.

Basically, I am trying to assign each record a particular (pre-defined) value, at the same time I'd also want to ensure that all the records being selected are (almost) equally distributed in random order between the set of chosen grouping values. 

For example: "Table A" has the following data with "Col_1" having unique values...

Col_1
001
002
003
004
005
006
007
008
009
010

I would like to assign each row from "Table A" to a specific group value. In this case, let's say that we choose to segregate them into 3 groups: group # 1, 2 & 3. Here's what I am trying to do...

  1. Randomly assign each record to a group - at any given instance, the number of groups will be known (in this case, its 3 groups).  
  2. Ensure that the rows are almost equally distributed among the said # of groups - for ex: if there are 10 rows and 3 groups, then 4 (random) rows should be assigned to group #1, 3 to group #2 and 3 to group #3.
  3. Records should be unique across all groups i.e.) each group should be mutually exclusive and no row can be assigned more than one group # at the same time.

Expected Result:

Col_1 Group_Num
002 1
005 1
009 1
006 1
001 2
003 2
010 2
007 3
004 3
008 3

I am trying to come up with a query that is flexible enough to accept "n" value and then segregate the underlying records into 1..n groups equally in a random fashion. However, I am encountering challenges with the following: 

  • Generating a range of numbers when upper cap is known - ex: if the upper cap is 3, the group #s should be 1,2,3 and if the cap is 7, then the rows should automatically be distributed to group #s 1,2..7. 
  • Ensuring Randomness in assigning the values to records - ex: record "003" is assigned to group #1 when executing the query for the first time, but can be assigned to group #2 when running the query the second time.
  • Ensuring mutual exclusivity between the assigned values - i.e. no row can be assigned to more than 1 value at the same time. 

Any inputs/suggestions in this regard will be greatly appreciated.

Many Thanks!

Tags (1)
2 REPLIES
Senior Apprentice

Re: Random Segregation of data records (dynamically assigning specific values to randomly selected rows)

If the number of groups is less than 16 you might apply the existing SAMPLE clause using a two step approach:

  1. calculate the exact sample sizes using count(*) / n = sample size for each group (needs to be increased by one for count(*) mod n groups)

    e.g. 1000 rows, 3 groups = 334,333,333
  2. SELECT SAMPLEID, t.* FROM TABLE SAMPLE 334,333,333

Using percentage without calculating numbers (like SAMPLE 0.33,0.33,0.34) ​might result in a (few) missing rows...

Otherwise you can utilize a ROW_NUMBER over RANDOM:

SELECT ....
(ROW_NUMBER() OVER (ORDER BY r) MOD 5) + 1
FROM
(
SELECT .....
RANDOM(-2147483648,2147483647) AS r
FROM table AS t
) AS dt

Re: Random Segregation of data records (dynamically assigning specific values to randomly selected rows)

Thank you very much, Dieter!! Both your suggestions were spot on...

I actually didn't realize the SAMPLE clause can be leveraged in this case (I know...long way to go...:)) and was actually trying to work out a solution with RANDOM function and a manually generated range of values (based on # of groups).

Your solution makes total sense and thank you once again for making it so simple and straightforward. 

Appreciate your time and inputs!

-PK