Selecting all rows that have a unique column

Database
N/A

Selecting all rows that have a unique column

I have raw data coming into a table for an ETL job, so I'd like to keep all the records.

I'm trying to select all the rows that basically have a unique ID. This can obviously be resolved by a primary key in a set table, but I tried doing so in a volatile table to no avail.

So I've been trying to do something along the lines of a

SELECT Distinct id, name, email
FROM source_table

but due to Teradata SQL, the statement will return all rows unique based on the combination of id, name and email. I instead am trying to find all rows unique based on just ID and select all the columns.

How would I accomplish this? I've tried a GROUP BY, but that seems more difficult to do what I'm trying to achieve here.

14 REPLIES

Re: Selecting all rows that have a unique column

Can you show sample records, what you have and what you want? That will be easier. From my standpoint, it looks like group by is what you resort to using it, since you need all records. Why VT does not work? Do you use on commit preserve row?

Teradata Employee

Re: Selecting all rows that have a unique column

Hi Brim,

if you want to find rows with unique ID, the following SQL applies.  The window analytic functions are great for such purposes.

select id, name, email

from source_table

qualify count(*) over (partition by id) = 1

;

Alternatively, you can use the unfiltered query to evaluate the data:

select id, name, email, count(*) over (partition by id)  as number_of_rows_with_this_ID

from source_table

;

Regards,

Vlad.

N/A

Re: Selecting all rows that have a unique column

Vlad,

The partition function is almost what I need. Because of the `= 1`, the statement is filtering out the partitions or IDs that have a count of greater than 1. I still want those rows, but just 1 of them.

So in an example table like the following:

ID |  NAME

1  Aaron

1  Aaron

2  Jason

2  Jason

2  Jason

3 Brian

I want the following returned:

1  Aaron

2  Jason

3  Brian

Vlad's partition statement is giving me just:

3 Brian

N/A

Re: Selecting all rows that have a unique column

You need to use a kind of ROW_NUMBER instead of a group COUNT:

select id, name, email
from source_table
qualify row_number() over (partition by id order by rows unbounded preceding) = 1

But for your example you might simply utilize DISTINCT

select DISTINCT id, name, email
from source_table
N/A

Re: Selecting all rows that have a unique column

Hi Dieter,

From what I've learned and verified, DISTINCT will filter based on all the columns in the SELECT clause. So in the little code snippet you wrote out, the query would return all rows that have a distinct combination of id, name and email. Teradata won't return based on just distinct id.

For the first part where you used row_number, is there something I'm supposed to add in? I can't find the right syntax for a QUALIFY clause

N/A

Re: Selecting all rows that have a unique column

Ups, sorry, I wrote nonsense, mixing ROW_NUMBER and COUNT:

select id, name, email
from source_table
qualify count(*) over (partition by id rows unbounded preceding) = 1
N/A

Re: Selecting all rows that have a unique column

Awesome, that's exactly what I need. Thanks!

Re: Selecting all rows that have a unique column

Hi Experts,

 

can we convert the below query into a normal subquery  ? if yes means please help me with that query.

 

select a.*,b.*

--, row_number() over (partition by a.deptid,a.name order by sal) as r

from emp a, dept b

where a.deptid=b.deptid

qualify row_number() over (partition by a.deptid,a.name order by sal desc) =1

 

input table : emp table

 

empid

name

deptid

sal

1

a

10

30,000

1

b

10

30,000

1

a

10

30,000

1

a

10

30,000

1

a

10

40,000

2

a

10

30,000

2

a

10

40,000

3

c

10

80,000

3

c

20

90,000

3

c

10

80,000

3

c

10

80,000

4

b

20

30,000


depttable:

deptid

deptname

10

hrc

20

td

 

select a.*, b.*, row_number() over (partition by  a.deptid,a.name order by a.sal desc) as r from emp_capgem a, dept_capgem  b

where a.deptid=b.deptid 

qualify r =1

 

or 

select a.*,b.*

--, row_number() over (partition by a.deptid,a.name order by sal) as r 

 from  emp_capgem a, dept_capgem b

where a.deptid=b.deptid

qualify row_number() over (partition by a.deptid,a.name order by sal desc) =1

 

result:

empid

name

deptid

sal

deptid

deptname

r

1

a

10

40,000

10

hrc

1

1

b

10

30,000

10

hrc

1

3

c

10

80,000

10

hrc

1

4

b

20

30,000

20

td

1

3

c

20

90,000

20

td

1

 

N/A

Re: Selecting all rows that have a unique column

Please define normal subquery.

Replacing the proprietary QUALIFY?

select *
from
   select a.*, b.*,
      row_number() over (partition by  a.deptid,a.name order by a.sal desc) as r
   from emp_capgem a join dept_capgem  b
   on a.deptid=b.deptid 
) as dt
where r =1

Additionally I switched to JOIN syntax, you really shouldn't use old joins anymore.