Sample without duplicates

Database
Enthusiast

Sample without duplicates

Hi,

I have a table which has records as follows

EmpNo DepNo
1 10
2 10
3 20
4 30
5 20
6 40

I want to select a random sample of 3 records from the table , however i should not select a department number more than once.
Example
EmpNo DepNo
1 10
2 10
3 20
In the above case i should have selected DeptNo 10 only once and the results should have been
EmpNo DepNo
1 10
3 20
4 30

Sam

12 REPLIES
Senior Apprentice

Re: Sample without duplicates

select *
from tab
qualify rank() over (partition by deptno order by 0) = 1 -- one random row per dept
sample 3

Dieter
Enthusiast

Re: Sample without duplicates

Hi Dieter,

This solution you gave does not work ok.

I tried it and got these results which is not ok

EMPNO DEPNO
4 30
3 20
5 20

Deptno 20 is getting repeated.

Sam
Enthusiast

Re: Sample without duplicates

Use this instead

select *
from test
qualify rank() over (partition by deptno order by deptno) = 1 -- one random row per dept
sample 3

Thanks,
Vinay
Enthusiast

Re: Sample without duplicates

I think the trouble is because RANK() returns the same value for same deptno (in this case it's always 1).

So I think piggy backing on Dieter's original solution and using ROW_NUMBER instead, things might work.

SELECT *
FROM MYTABLE
QUALIFY ROW_NUMBER() OVER (partition by deptno order by 0) = 1
SAMPLE 3;

Senior Apprentice

Re: Sample without duplicates

Hi Joe,
of course you're right, i tested it with a row_number, but i couldn't sent an email from that PC, so i keyed it in again with a RANK ;-)

Dieter
Enthusiast

Re: Sample without duplicates

Thanks a lot guys!That did work perfectly as required.

Sorry for the late reply

Sam
Enthusiast

Re: Sample without duplicates

Suppose when I run
select *
from test
qualify rank() over (partition by deptno order by deptno) = 1

where my table looks like

deptno x y
10 1 2
10 2 3
10 3 4
20 1 2
20 2 3
30 1 2

I am getting all the records from my table. Is this scenario correct? If yes what should be done to get unique deptno records by using partition and not unique command.
Enthusiast

Re: Sample without duplicates

if your partitioning and ordering expressions are the same, then your rank output will always be 1.

you can use row_number() instead of rank as mentioned in the above posts.
Enthusiast

Re: Sample without duplicates

WOW
It worked Joed. Thanks a lot.

Can you brief me what made it work? ie diff btw ROW_NUMBER and RANK in this case. And in what criteria it selects the records when deptno being the same (as we are partitioning by deptno).