Solving a problem without using an OLAP function

Database
Enthusiast

Solving a problem without using an OLAP function

Hi,

I have a SQL problem that I have solved using an OLAP function but I am wondering if there is a better way to do it - more elegant with potentially better performance. Here is the essence of the problem. Each person must take a test once per day over a specified period of time. They may take the test more than once per day, but must take the test at least once. During the course of the day they can also move around to different rooms. The data would look something like this for one person:

Person_Name Location Test_Date Test_Time
Smith Rm. A 10/21/2010 10/21/2010 13:00
Smith Rm. A 10/22/2010 10/22/2010 12:00
Smith Rm. B 10/22/2010 NULL
Smith Rm. B 10/23/2010 10/23/2010 10:00
Smith Rm. C 10/23/2010 10/23/2010 12:00
Smith Rm. D 10/23/2010 NULL

I want to return all the tests that were taken even there are more than one in a day. However, I don't want to return a record for a room if there was no test given in that location but there was a test given on that day (10/23/2010 in the example above). However, if there was no test given on a day, I still want to return a record for that day. Based on the sample above what I want to return is:

Person_Name Location Test_Date Test_Time
Smith Rm. A 10/21/2010 10/21/2010 13:00
Smith Rm. A 10/22/2010 10/22/2010 12:00
Smith Rm. B 10/22/2010 NULL
Smith Rm. B 10/23/2010 10/23/2010 10:00
Smith Rm. C 10/23/2010 10/23/2010 12:00

I solved this by ranking over the person_name and test_date and ordering descending on test_time. The code looked sometihng like this:

SELECT person_name, location, test_date, test_time,
RANK () OVER (PARTITION BY person_name, test_date ORDER BY test_time DESC) AS rec_rank
from table_1
QUALIFY NOT (test_time IS NULL AND rec_rank > 1);

This works but it just seems like there should be a simpler way. OLAP functions can be performance intensive and I am dealing with millions of records in the real world. Any ideas?

Thanks,
Kevin
Tags (2)
2 REPLIES
Enthusiast

Re: Solving a problem without using an OLAP function

Oops! I should have proof-read this more clearly. The second record in both recordsets above should be excluded. I meant to have only one record from 10/22 and for that record to have a NULL test_time.
Senior Apprentice

Re: Solving a problem without using an OLAP function

Hi Kevin,
it's still confusing.

You want all rows where test_time is not null plus the row where there's only a NULL for a (person_name,test_date) combination, is that correct?

You could write a traditional query with NOT EXISTS like:

SELECT person_name, location, test_date, test_time
FROM table_1 AS t1
WHERE test_time IS NOT NULL

UNION ALL

SELECT person_name, location, test_date, test_time
FROM table_1 AS t1
WHERE test_time IS NULL
AND
NOT EXISTS
(SELECT * FROM table_1 AS t2
WHERE t1.person_name = t2.person_name AND t1.test_date = t2.test_date
AND t2.test_time IS NOT NULL
)

or replace the second query with:

SELECT person_name, MIN(location), test_date, MIN(test_time)
FROM table_1 AS t1
GROUP BY person_name, test_date
HAVING MIN(test_time) IS NULL

If this is more efficient than the RANK depends on different factors:
- number of rows per (person_name, test_date)
- number of NULL rows
- Primary Index definition of table_1

Using RANK the performance will be almost independent of those factors.

So check the explain of the different versions and then run them and check the QueryLog for the actual resource usage.

Dieter