Database

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-24-2015
04:35 PM

03-24-2015
04:35 PM

Hello,

I have a question concerning Randimized allocation:

I would like to split a customers' selection in two parts (50% and 50%) with the condition that the two parts has to contain the same number of customers which belongs to the same Abglkey 480, 481 and 482.

Example 600 customers --->100 have 480 as Abglkey, 200 have 481 as Abglkey, 300 have 482 as Abglkey. The results must be:

Table 1 Table 2

480 50 50

481 100 100

482 150 150

I wrote the following SQL

**ins** **into** BP_ATV.einsatz_115_S3

**sel** REFNR, **case** **when** REFNR **is** **not** **null** **then** ' ' **end** Gutscheinnummer , **case** **when** REFNR **is** **not** **null** **then** 'AKTIV3' **end** Kennzeichen

**from** BP_ATV.AKQ_CLEANS_TEST_115

**where** ABGLKEY **in** (480, 481, 482)

**randomized allocation.5 **

**order by Abglkey**

The problem is that order by is not allowed in subqueries. I also found the statement randomized allocation when... but it's not perfect and when I have 100 Abglkey it's not easy to do.

Do you have an idea how can I do that?

Thank you,

Valeroa

5 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

03-25-2015
04:07 AM

03-25-2015
04:07 AM

Hi Valeria,

RANDOMIZED ALLOCATION is an option of the SAMPLE clause, but sampling only works on all rows. Plus it's random, not repeatable, so you can't do two Inserts and still get distinct sets.

if the number of ABGLKESs is small and known (like in your example) you can do

select ..., SAMPLEID as sid

from BP_ATV.AKQ_CLEANS_TEST_115

where ABGLKEY in (480, 481, 482)

SAMPLE RANDOMIZED ALLOCATION

WHEN ABGLKEY = 480 THEN 0.5,0.5

WHEN ABGLKEY = 481 THEN 0.5,0.5

WHEN ABGLKEY = 482 THEN 0.5,0.5

END

This will result in two 50% samples for each ABGLKEY, numbered from 1 to 6.

You must materialize this result in a table and when you need to select a single sample filter using

WHERE sid MOD 1 = 0 (or 1)

For a larger number of ABGLKEYs (or unknown values) this will work:

SELECT ... ,

CASE WHEN PERCENT_RANK()

OVER (PARTITION BY ABGLKEY

ORDER BY rnd) < 0.5

THEN 0

ELSE 1

END AS sid

FROM

(

SELECT ... ,

RANDOM(-2000000000,2000000000) AS

from BP_ATV.AKQ_CLEANS_TEST_115

where ABGLKEY in (480, 481, 482)

) AS dt

This also creates two random groups each 50%.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-02-2015
10:18 AM

04-02-2015
10:18 AM

Hello Dieter,

thank you very much it works very well. The only thing that I noticed is that Teradata that in the end I have some customers more in the group 1 than in the group 0 because Teradata. This is due to Abglkey that contains odd numbers of customers. for eg:

Abglkey sid 1 sid 0

1 10 9

2 7 6

3 5 4

Is there a way that allow this 1 more in group 1 is assigned sometimes to 0 instead of to 1?

I hope it is clear. This is only for being more precise :) But it works very well you helped me a lot!

Thank you,

Valeria

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-02-2015
01:02 PM

04-02-2015
01:02 PM

Hi Valeria,

you could switch 0 and 1 like this (not random, but based on odd/even ABGLKEYs):

SELECT ... ,

(CASE WHEN PERCENT_RANK()

OVER (PARTITION BY ABGLKEY

ORDER BY rnd) < 0.5

THEN 0

ELSE 1

END + (ABGLKEY MOD 2)) MOD 2 AS sid

FROM

(

SELECT ... ,

RANDOM(-2000000000,2000000000) AS

from BP_ATV.AKQ_CLEANS_TEST_115

where ABGLKEY in (480, 481, 482)

) AS dt

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

09-12-2018
07:41 AM

09-12-2018
07:41 AM

Hello,

I have a question concerning this skript. I elaborated this skirpt and adaped it to an excel file that I uploaded in teradata:

sel U_KDNR, U_KEY, CANTONE, Ort, PLZ, Strasse, (CASE WHEN PERCENT_RANK()

OVER (PARTITION BY U_KEY

ORDER BY U_KEY, Ort ,CANTONE, PLZ, Strasse) < 0.5

THEN 0

ELSE 1

END + (U_KEY MOD 2)) MOD 2 AS sid

from (

sel a.U_KEY,

CANTONE,

a.U_KDNR,

dec_postadr(mandant_id,U_ORT) Ort,

dec_postadr(mandant_id,U_PLZ ) PLZ,

dec_postadr(mandant_id,U_STRASSE) Strasse,

mandant_id,

TEST_KONTRAST_CHF_M07_M09,

RANDOM(-2000000000,2000000000) as sid2

from sch.BASE_TEST_KONTRAST_CHF_SK10_CRIP a

join sch.PLZ_CANTONI b on dec_postadr(mandant_id, a.U_PLZ )= b.PLZ

where TEST_KONTRAST_CHF_M07_M09 is null

) AS dt;

This skript gives me as results a split between sid 1 and 0 but if I let it run twice or theree times it gives me the same results. It woulb be great if you could help me to modify this skript so that, if we let it run many times the split between 1 and 0 changes BUT the changes should be slight because the split should respect the split according to U_KEY, Ort ,CANTONE, PLZ, Strasse. Do you think it is possible?

Thank you and best regards,

Valeria

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

09-12-2018
04:33 PM

09-12-2018
04:33 PM

The only thing I see that can change the split is the addition or deletion of rows in sch.BASE_TEST_KONTRAST_CHF_SK10_CRIP. You use a Random function in the derived table but the number it generates ("sid2") is not used anywhere else in the query.

Also, by the way, it is wasteful and distracting to include U_KEY in the ORDER BY clause of the Percent_Rank() function because you are already PARTITIONing by it - all the rows in any partition will have the same U_KEY!