Database

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-23-2016
07:43 PM

03-23-2016
07:43 PM

Dear TD Users,

I am trying to understand if there's an easy way (using queries) to **randomly **segregate data records in a given table into specific groups.

Basically, I am trying to assign each record a particular (pre-defined) value, at the same time I'd also want to ensure that all the records being selected are (almost) equally distributed in random order between the set of chosen grouping values.

For example: "Table A" has the following data with "Col_1" having unique values...

Col_1

001

002

003

004

005

006

007

008

009

010

I would like to assign each row from "Table A" to a specific group value. In this case, let's say that we choose to segregate them into 3 groups: group # 1, 2 & 3. Here's what I am trying to do...

- Randomly assign each record to a group - at any given instance, the number of groups will be known (in this case, its 3 groups).
- Ensure that the rows are almost equally distributed among the said # of groups - for ex: if there are 10 rows and 3 groups, then 4 (random) rows should be assigned to group #1, 3 to group #2 and 3 to group #3.
- Records should be unique across all groups i.e.) each group should be mutually exclusive and no row can be assigned more than one group # at the same time.

Expected Result:

Col_1 Group_Num

002 1

005 1

009 1

006 1

001 2

003 2

010 2

007 3

004 3

008 3

I am trying to come up with a query that is flexible enough to accept "n" value and then segregate the underlying records into 1..n groups equally in a random fashion. However, I am encountering challenges with the following:

- Generating a range of numbers when upper cap is known - ex: if the upper cap is 3, the group #s should be 1,2,3 and if the cap is 7, then the rows should automatically be distributed to group #s 1,2..7.
- Ensuring Randomness in assigning the values to records - ex: record "003" is assigned to group #1 when executing the query for the first time, but can be assigned to group #2 when running the query the second time.
- Ensuring mutual exclusivity between the assigned values - i.e. no row can be assigned to more than 1 value at the same time.

Any inputs/suggestions in this regard will be greatly appreciated.

Many Thanks!

2 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-24-2016
03:39 AM

03-24-2016
03:39 AM

If the number of groups is less than 16 you might apply the existing SAMPLE clause using a two step approach:

- calculate the exact sample sizes using
**count(*) / n**= sample size for each group (needs to be increased by one for**count(*) mod n**groups)

e.g. 1000 rows, 3 groups = 334,333,333 - SELECT SAMPLEID, t.* FROM TABLE SAMPLE 334,333,333

Using percentage without calculating numbers (like **SAMPLE 0.33,0.33,0.34**) might result in a (few) missing rows...

Otherwise you can utilize a ROW_NUMBER over RANDOM:

SELECT ....

(ROW_NUMBER() OVER (ORDER BY r) MOD 5) + 1

FROM

(

SELECT .....

RANDOM(-2147483648,2147483647) AS r

FROM table AS t

) AS dt

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-24-2016
05:11 PM

03-24-2016
05:11 PM

Thank you very much, Dieter!! Both your suggestions were spot on...

I actually didn't realize the SAMPLE clause can be leveraged in this case (I know...long way to go...:)) and was actually trying to work out a solution with RANDOM function and a manually generated range of values (based on # of groups).

Your solution makes total sense and thank you once again for making it so simple and straightforward.

Appreciate your time and inputs!

-PK

Copyright © 2004-2015 Teradata Corporation. Your use of this Teradata website is governed by the Privacy Policy and the Terms of Use, including your rights to materials on this website, the rights you grant to your submissions to this website, and your responsibilities regarding your conduct on this website.

The Privacy Policy and Terms of Use for this Teradata website changed effective September 8, 2016.