Need help with sorting some data

General
Enthusiast

Need help with sorting some data

Hi. I'm new to Teradata so I don't if there's an obvious solution to my problem.

I'm trying to sort a table in a specific way. This table has many columns, but I think the address column will be most helpful. Everytime I select the table using a where clause and a specific address, I get several rows. For example, one of the addresses on my table is 26 and I want to obtain just the 1st row that appears from this query:

Sel * from table_name
where address='26'

I want to do this for every distinct address on my original table and put them all together in a new table. Obviously, I can't use the above query for every address since there are too many of them, so I'm trying to see if there's a way to obtain all these 1st rows faster. Unfortunately, there are no unique identifiers on the table. I tried adding one using csum(1,1),
row_number() over (order by null), or
row_number() over (partition by s_addr order by null),
but they all end up scrambling up the order of the rows and I end up selecting the wrong ones.

If it helps, I'm trying to rewrite a query from SAS into SQL using this. With SAS, the query doesn't sort the rows in any particular order unless you tell it to, but with SQL, using any row_number function (even if you use order by null) will automatically sort the table with some of its rows. I think the key to this is adding a unique identifier to the table without sorting any rows, but I'm not sure how to do that.

Thank you in advance!
11 REPLIES 11
Teradata Employee

Re: Need help with sorting some data

Hi kenli90,

 

I think you almost figured it out.

Try this one :

 select *
   from table_name
qualify row_number() over (partition by address order by null) = 1;

qualify is a supplementary level of filter applied after window functions (like row_number).

the row_number is reseting for every new value of the partition by columns and expressions.

the order by null thus makes your query non-deterministic, which mean running it multiple times will yield in different outputs, which usually is something we don't want.

Enthusiast

Re: Need help with sorting some data

Thanks, Waldur. Unfortunately, your partition still ends up scrambling some of the rows. I'll give you an example. If I run the following:

Sel * from table_name
where address='26'

The 1st row I get has a tenure of 20 (tenure is another column in the table, it's not ordered in any way, so that's irrelevant). But when I use your query, I get a different tenure, so I know it gave me another row other than the 1st.

If it helps, I'm trying to convert the following SAS query to SQL:

Proc sort data = table_name nodupkey;
By col1 col2 col3 col4 col5 col6

The columns listed here are all columns of the table, bit none of them are the address column that I mentioned.
Teradata Employee

Re: Need help with sorting some data

Yes, this is due to the non-deterministic related to the order by null in the row_number.

 

I'm not familiar with SAS, so I'll go for a guess. Try this one :

 select *
   from table_name
qualify row_number() over (partition by address order by col1, col2, col3, col4, col5, col6) = 1;
Enthusiast

Re: Need help with sorting some data

I'm still having the same problem. The partition function is still rearranging some of the rows even though it shouldn't. From what I understand, the SAS query I posted orders the table by the 6 columns and then takes the 1st row from this ordered table that has a unique set of the 6 columns. I thought the equivalent should be

Qualify row_number() over (partition by col1, col2, col3, col4, col5, col6 order by
col1, col2, col3, col4, col5, col6)=1

But this doesn't work. I think the main problem is that when ordering the rows with SAS, it only orders the rows by the columns listed. But in Teradata, while it does order the rows by the listed columns as a priority, it will use the sort the rows by the other columns in some way, even if they are not listed in the query.
Teradata Employee

Re: Need help with sorting some data

Can you publish some of the data in your base table and after ordered in SAS?

You can anonimize what needed as long as you keep the logic.

 

Enthusiast

Re: Need help with sorting some data

Sure, here are the results of the select query from above:

col1 col2 col3 col4 col5 col6 address tenure
7.3 11 11 A AAB 1245 20 9999
7.3 11 11 A AAB 1245 20 32

In the SAS query, I would get the 1st row, the one with tenure 9999. But when using any partitions in Teradata, I always get the 2nd row instead.
Enthusiast

Re: Need help with sorting some data

I forgot to add that the tenure shouldn't be ordered in any way. If you replace 9999 with 0 in the 1st row, the SAS query will still select the 1st row.
Teradata Employee

Re: Need help with sorting some data

There is an important concept in relational databases you need to understand, data are not sorted.

See tables like a bag of marbles. You can put things in any order you want, once in the bag you can't know which one you put first.

 

If I do insert into my table values ('A') then insert into my table values ('B'), you can never say A is the first value and B the second.

What you can do is add a column that will allow you to sort your table.

 

To sum it up, there is no row order unless you specify something.

 

SAS has to make a choice somewhere to choose one row over the others.

Enthusiast

Re: Need help with sorting some data

I did try adding an extra column using csum or row_number, but I don't get the exact same result as the SAS query.

I think in SAS, the data is actually stored like an excel spreadsheet (there's only about 60000 rows here, so its would be easy to store this), so it would be able to take the 1st row of for each address more easily.

I have found that when I use the following:

Sel top 1 a.* from table_name a
where address='20'

I do end up getting the row that I want. Is there a way to do some sort of loop that would execute this for every address and then union all of them together?

By the way, thanks again for the help.