Conditional Distinct Count

Database
Enthusiast

Conditional Distinct Count

I need to do a conditional count on a number of assessments. It needs to be a distinct count over person and date. My raw data looks something like this:

Name Date Assessment Type Assessment Time
Chuck Smith 6/7/2010 1 6/7/10 1:00
Chuck Smith 6/7/2010 2 6/7/10 1:00
Chuck Smith 6/7/2010 3 6/7/10 1:00
Chuck Smith 6/8/2010 1 6/8/10 1:00
Chuck Smith 6/8/2010 1 6/8/10 2:00
Chuck Smith 6/8/2010 1 6/8/10 3:00
Chuck Smith 6/8/2010 2 6/8/10 1:00
Chuck Smith 6/8/2010 2 6/8/10 2:00
Chuck Smith 6/9/2010 2 6/9/10 3:00
Chuck Smith 6/9/2010 3 6/9/10 14:00
Larry Jones 6/7/2010 1 6/7/10 1:00
Larry Jones 6/7/2010 2 6/7/10 1:00
Larry Jones 6/8/2010 1 6/8/10 1:00
Larry Jones 6/8/2010 1 6/8/10 2:00
Larry Jones 6/8/2010 2 6/8/10 1:00
Larry Jones 6/9/2010 1 6/9/10 1:00
Larry Jones 6/9/2010 1 6/9/10 2:00
Larry Jones 6/9/2010 2 6/9/10 1:00
Larry Jones 6/9/2010 2 6/9/10 2:00
Larry Jones 6/10/20101 6/10/10 1:00
Larry Jones 6/11/20102 6/11/10 1:00

The result set should look something like the following:
Name Assessment Type 1 Assessment Type 2 Assessment Type 3
Chuck Smith 2 3 2
Larry Jones 4 4 0

If it didn't need to be a distinct count I could just use sum with a case statement, but this won't be distinct. I would like to use OLAP aggregates because I still need to bring back the detail data.

Thanks,
Kevin
2 REPLIES
Enthusiast

Re: Conditional Distinct Count

I had to do this in a couple of steps. First I create the unit by which I need to count. In this case it is a concatenation of Name, Assessment Date, and Assessment Type. Then I rank these values based on the assessment time. Then I can do a sum for each of the assessment types, but only of the records whose rank is 1. This is what makes the count distinct. The code looks like that below. It may not be exactly right because I was doing it on a different set of data with some different wrinkles, but the logic works. Note that I changed the column headings in the code to make it more clding friendly. This code returns all the detail data in addition to the aggregates.

-Kevin

[code]
SELECT
SUM(CASE WHEN Assessment_Type = 1 THEN 1 ELSE 0 END) OVER () AS Assessment_1_Count,
SUM(CASE WHEN Assessment_Type = 2 THEN 1 ELSE 0 END) OVER () AS Assessment_2_Count,
SUM(CASE WHEN Assessment_Type = 3 THEN 1 ELSE 0 END) OVER () AS Assessment_3_Count,
q1.name, q1.assessment_date, q1.assessment_type, q1.assessment_time
FROM (
SELECT
Name || CAST(assessment_date AS CHAR(10)) || CAST(assessment_type AS CHAR(2)) AS counting_unit,
RANK() OVER (PARTITION BY counting_unit ORDER BY assessment_time) AS assessment_rank,
table1.name, table1.assessment_date, table1.assessment_type, table1.assessment_time

FROM Table1
) q1
[/code]
Junior Contributor

Re: Conditional Distinct Count

This is probably the easiest solution, just some remarks:

Don't concat that counting_unit, this is unnecessary overhead.
And use ROW_NUMBER instead of RANK (unless you can guarantee there's no duplicate assessment_time.

ROW_NUMBER() OVER (PARTITION BY Name, assessment_date, assessment_type ORDER BY assessment_time)

And some parts of the outer SUMs are missing:
SUM(CASE WHEN Assessment_Type = 1 and assessment_rank = 1 THEN 1 ELSE 0 END)
OVER (PARTITION BY Name) AS Assessment_1_Count,

Dieter