Tricky grouping question

Database
Enthusiast

Tricky grouping question

Hi,

I have a dataset similar to the following:

Patient_ID Unit Start_Time End_time
1234 3C 5/2/09 12:00 5/5/09 13:00
1234 3FE 5/5/09 13:00 5/5/09 14:00
1234 3FE 5/5/09 14:00 5/7/09 18:00
1234 ICU 5/7/09 18:00 5/16/09 5:00
1234 3C 5/16/09 5:00 5/23/09 7:00
1234 3C 5/23/09 7:00 5/25/09 3:00
1234 3C 5/25/09 3:00 5/28/09 8:00

The data shows the units in which a patient stayed while in the hospital. Start_Time is when they were moved into that unit and End_Time shows when they were moved out of the unit. However, there are also update events and so a patient can have multiple sequential records for the same unit. They can also be moved back into a unit in which they stayed previously. I would like to group this by Patient_ID and Unit, showing the min Start_Time and max End_Time for the time the patient is in each unit. This is necessary to combine the last three records to show just a single line summary for these two lines. What makes this challenging is that I still need to keep a previous time the patient was in unit 3C as separate line. The result set should look like the following:

1234 3C 5/2/09 12:00 5/5/09 13:00
1234 3FE 5/5/09 13:00 5/7/09 18:00
1234 ICU 5/7/09 18:00 5/16/09 5:00
1234 3C 5/16/09 5:00 5/28/09 8:00

Any suggestions would be appreciated.

-Kevin
6 REPLIES
Teradata Employee

Re: Tricky grouping question

One solution could be considering table name as test

select pat_id, unit , start_time , end_time from test y
,(select a.pat_id a1, a.unit a2, min(a.start_time) a3, max(a.end_time)a4 from test a , test b where
a.start_time=b.end_time group by a.pat_id , a.unit) x
where
y.pat_id = x.a1
and y.unit = x.a2
and y.start_time not between x.a3 and x.a4

UNION

select a.pat_id a1, a.unit a2, min(a.start_time) a3, max(a.end_time)a4 from test a , test b where
a.start_time=b.end_time group by a.pat_id , a.unit

Let me know whether the result of this query is as per your requirement

Thanks
Anand
Enthusiast

Re: Tricky grouping question

Hi Anand,

Thanks for your help. This worked for the dataset given above, but when I added the following records it grouped together all the 3C's after the first one.

1234 TCU 2009-05-28 08:00 2009-05-29 09:00
1234 3C 2009-05-29 09:00 2009-05-29 19:00

After inserting these rows and re-running the query you provided the results are:

1234 3C 2009-05-02 12:00 2009-05-05 13:00
1234 3FE 2009-05-05 13:00 2009-05-07 18:00
1234 ICU 2009-05-07 18:00 2009-05-16 05:00
1234 3C 2009-05-16 05:00 2009-05-29 19:00
1234 TCU 2009-05-28 08:00 2009-05-29 09:00

Instead what I need to see is:

1234 3C 2009-05-02 12:00 2009-05-05 13:00
1234 3FE 2009-05-05 13:00 2009-05-07 18:00
1234 ICU 2009-05-07 18:00 2009-05-16 05:00
1234 3C 2009-05-16 05:00 2009-05-28 08:00
1234 TCU 2009-05-28 08:00 2009-05-29 09:00
1234 3C 2009-05-29 09:00 2009-05-29 19:00

Thanks for your help.

-Kevin
Teradata Employee

Re: Tricky grouping question

Hi Kevin,

We have not consider the join between pat_id and unit in above query due to which records were overlapping.

Had rewritten the query with both joins please let us know the outcome for below query

select pat_id, unit , start_time , end_time from test y
,(select a.pat_id a1, a.unit a2, min(a.start_time) a3, max(a.end_time)a4 from test a , test b where
a.start_time = b.end_time
and a.pat_id = b.pat_id
and a.unit= b.unit
group by a.pat_id , a.unit) x
where
y.pat_id = x.a1
and y.unit = x.a2
and y.start_time not between x.a3 and x.a4

UNION

select a.pat_id a1, a.unit a2, min(a.start_time) a3, max(a.end_time)a4 from test a , test b where
a.start_time = b.end_time
and a.pat_id = b.pat_id
and a.unit = b.unit
group by a.pat_id , a.unit

Thanks
Anand
Junior Contributor

Re: Tricky grouping question

Hi Kevin,
this is a common problem:
creating distinct groups for the same value based on a sequenced data set.

And there's a common solution for it:

-- untested, because there were no DDL & INSERTs :-)
SELECT
Patient_ID,
Unit,
MIN(Start_Time),
MAX(End_time)
FROM
(
SELECT dt.*
,SUM(CASE WHEN Unit = prevUnit THEN 0 ELSE 1 END)
OVER (PARTITION BY Patient_ID
ORDER BY Start_Time ASC
ROWS UNBOUNDED PRECEDING) AS grp
FROM
(
SELECT
tab.*
,MIN(Unit)
OVER (PARTITION BY Patient_ID
ORDER BY Start_Time ASC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prevUnit
FROM tab
) AS dt
) AS dt
GROUP BY grp, Patient_ID, Unit

Caution, this only works if there are no time gaps in the treatment of a person or if you don't care about gaps.
To solve that, the query will be more complicated (additional OLAP function or Common Table Expression with ROW_NUMBER).

Btw, TD13 introduces RESET WHEN syntax, which is simplifying above query.

Dieter
Enthusiast

Re: Tricky grouping question

Hi Dieter,

Beautiful! Works just like I need it to. I'm going to test it out some more and break ir down to make sure I understand what's going on in the query. Thanks for the help.

-Kevin
N/A

Re: Tricky grouping question

Helpful..