Database



03-23-2016
07:43 PM


Dear TD Users,

I am trying to understand if there's an easy way (using queries) to **randomly** segregate the data records in a given table into specific groups.

Basically, I want to assign each record a particular (pre-defined) group value, while also ensuring that the selected records are distributed (almost) equally, and in random order, across the set of chosen grouping values.

For example: "Table A" has the following data with "Col_1" having unique values...

```
Col_1
-----
001
002
003
004
005
006
007
008
009
010
```

I would like to assign each row from "Table A" a specific group value. In this case, let's say we choose to segregate them into 3 groups: group # 1, 2 & 3. Here's what I am trying to do...

- Randomly assign each record to a group - at any given instance, the number of groups will be known (in this case, it's 3 groups).
- Ensure that the rows are distributed almost equally among the said # of groups - for example, if there are 10 rows and 3 groups, then 4 (random) rows should be assigned to group #1, 3 to group #2 and 3 to group #3.
- Records should be unique across all groups, i.e. the groups should be mutually exclusive and no row can be assigned to more than one group # at the same time.

Expected Result:

```
Col_1  Group_Num
-----  ---------
002    1
005    1
009    1
006    1
001    2
003    2
010    2
007    3
004    3
008    3
```

I am trying to come up with a query that is flexible enough to accept an "n" value and then segregate the underlying records into 1..n groups equally, in a random fashion. However, I am encountering challenges with the following:

- Generating a range of numbers when the upper cap is known - ex: if the cap is 3, the group #s should be 1, 2, 3, and if the cap is 7, the rows should automatically be distributed across group #s 1, 2, ..., 7.
- Ensuring randomness in assigning the values to records - ex: record "003" is assigned to group #1 when executing the query the first time, but can be assigned to group #2 when running it a second time.
- Ensuring mutual exclusivity between the assigned values - i.e. no row can be assigned more than 1 group value at the same time.

Any inputs/suggestions in this regard will be greatly appreciated.

Many Thanks!

2 REPLIES


03-24-2016
03:39 AM


If the number of groups is less than 16 you might apply the existing SAMPLE clause using a two-step approach:

- calculate the exact sample sizes: **count(*) / n** rows per group, increased by one for **count(*) mod n** of the groups

e.g. 1000 rows, 3 groups = 334, 333, 333:

```
SELECT SAMPLEID, t.*
FROM tab AS t
SAMPLE 334, 333, 333
```

Using percentages without calculating the row counts (like **SAMPLE 0.33, 0.33, 0.34**) might result in a (few) missing rows...
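Applied to the 10-row example in the question, the two-step approach might look like this (a sketch only; the names Table_A and Col_1 are assumed from the original post, and the sizes 4, 3, 3 come from 10 / 3 = 3 per group with the remainder row added to the first group):

```sql
-- Step 1 (computed beforehand, e.g. in a stored procedure):
--   10 rows / 3 groups = 3 rows per group, remainder 1
--   -> sample sizes 4, 3, 3

-- Step 2: SAMPLEID (1..3) serves directly as the group number
SELECT SAMPLEID AS Group_Num,
       t.Col_1
FROM Table_A AS t
SAMPLE 4, 3, 3;
```

Because the sizes are exact row counts rather than percentages, every row lands in exactly one sample, so the groups are mutually exclusive and collectively exhaustive.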

Otherwise you can utilize a ROW_NUMBER over RANDOM:

```
SELECT ...,
       -- MOD 5 for 5 groups; use MOD n for n groups
       (ROW_NUMBER() OVER (ORDER BY r) MOD 5) + 1 AS Group_Num
FROM
 (
   SELECT ...,
          RANDOM(-2147483648, 2147483647) AS r
   FROM tab AS t
 ) AS dt
```
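For the 3-group example in the question, a filled-in version of this query might look as follows (again assuming the names Table_A and Col_1). The ORDER BY on the random value shuffles the rows, and the MOD spreads the consecutive row numbers round-robin over groups 1..3, so group sizes can differ by at most one:

```sql
SELECT Col_1,
       -- row numbers 1,2,3,4,... map to groups 2,3,1,2,...
       (ROW_NUMBER() OVER (ORDER BY r) MOD 3) + 1 AS Group_Num
FROM
 (
   SELECT Col_1,
          RANDOM(-2147483648, 2147483647) AS r  -- full INTEGER range
   FROM Table_A
 ) AS dt;
```

Each row gets exactly one row number and therefore exactly one group, which also satisfies the mutual-exclusivity requirement.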


03-24-2016
05:11 PM


Thank you very much, Dieter!! Both your suggestions were spot on...

I actually didn't realize the SAMPLE clause could be leveraged in this case (I know... long way to go... :)) and was actually trying to work out a solution with the RANDOM function and a manually generated range of values (based on the # of groups).

Your solution makes total sense and thank you once again for making it so simple and straightforward.

Appreciate your time and inputs!

-PK