COUNT () OVER (PARTITION BY) and DISTINCT

Database
Enthusiast

COUNT () OVER (PARTITION BY) and DISTINCT

Hello !

I create this topic, because I have a question concerning the COUNT(My field) OVER (PARTITION BY My Other Field) and the DISTINCT statement.

Here is the situation : I have in one table the following lines :

Name;Hair Colour;Timestamp

Ted;brown;t1

Ted;red;t2

Ted;brown;t3

Ted;blue;t4

Ben;blond;t5

Ben;green;t6

Ben;purple;t7

Ben;green;t8

I would like to obtain the following result :

Name;colour;Total Nb of distinct hair colours per name

Ted;brown;3

Ted;red;3

Ted;blue;3

Ben;blond;3

Ben;green;3

Ben;purple;3

I tried this :

SELECT

    name

    , hair_colour

    , COUNT(hair_colour) OVER (PARTITION BY name)

FROM

    MyTable;

But it gives me the following result :

Name;colour;Total Nb of distinct hair colours per name

Ted;brown;4

Ted;red;4

Ted;blue;4

Ben;blond;4

Ben;green;4

Ben;purple;4

I think maybe it could be a good idea to use of DISTINCT in the count over, but unfortunately, this is not permitted.

I'm new in Teradata, so I just hope this question does not look completely stupid :D.

Best regards.

Gwen

14 REPLIES
Junior Contributor

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hi Gwen,

SELECT DISTINCT is processed after the OPAP function and COUNT(DISTINCT)  can't be used in OLAP, but in this case you don't need it, just GROUP BY first:

SELECT
name
, hair_colour
, COUNT(hair_colour) OVER (PARTITION BY name)
FROM
MyTable
GROUP BY
name
, hair_colour;

Dieter

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hi Dieter, 

Just curious to know - if this is possible:

I am in the same situation where I need to use DISTINCT in a OLAP function - but being not able to.

COL_1 COL_2 COL_3 COL_4 COL_5 COL_6 COL_7 COL_8 COL_9 COL_10 COL_11 COL_12
A B1 C1 D1 E1 F1 G1 H1 I1 J1 2 DATE1
A B1 C1 D2 E2 F2 G2 H2 I1 J1 2 DATE1
A B1 C1 D3 E3 F3 G3 H3 I1 J1 2 DATE1
A B1 C1 D4 E4 F4 G4 H4 I1 J1 2 DATE1
A B1 C1 D5 E5 F5 G5 H5 I1 J1 2 DATE1
A B1 C1 D6 E6 F6 G6 H6 I1 J1 2 DATE1
A B1 C1 D7 E7 F7 G7 H7 I1 J1 2 DATE1
A B1 C1 D8 E8 F8 G8 H8 I1 J1 2 DATE1
A B2 C2 D9 E9 F9 G9 H9 I2 J1 2 DATE1
A B2 C2 D10 E10 F10 G10 H10 I2 J1 2 DATE1
A B2 C2 D11 E11 F11 G11 H11 I2 J1 2 DATE1
A B2 C2 D12 E12 F12 G12 H12 I2 J1 2 DATE1
A B2 C2 D13 E13 F13 G13 H13 I2 J1 2 DATE1
A B2 C2 D14 E14 F14 G14 H14 I2 J1 2 DATE1
A B2 C2 D15 E15 F15 G15 H15 I2 J1 2 DATE1
A B2 C2 D16 E16 F16 G16 H16 I2 J1 2 DATE1

I need the COL_11 count as distinct count of COL_2. I am supposed to use the partition by for COL_1 and COL_12.

SELECT COL_1, COL_2, COL_3, COL_4, COL_5, COL_6, COL_7, COL_8, COL_9, COL_1,
COUNT( DISTINCT COL_2) over(partition by COL_1, COL_12 ) as NO_COL_2_VAL,
COL_12
FROM TBL

This  query fails - and I have to find an alternative way to acieve this.

Can anybody please help?

Thanks

Lakshminarasu Chenduri

Junior Contributor

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hi Lakshminarasu,

as DISTINCT is not allowed you need two nested OLAP functions:

SELECT ...
SUM(flag)
OVER (PARTITION BY COL_1, COL_12)
FROM
(
SELECT ....
-- tag only one of the duplicate values, the first in a group
-- could be ROW_NUMBER, too, but this needs ORDER BY
CASE WHEN MIN(COL_2)
OVER (PARTITION BY COL_1, COL_12, COL_2 -- includes the DISTINCT col
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL
THEN 1
ELSE 0
END AS flag
FROM TBL
) AS dt

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Thank you very much Dieter !!!

Lakshminarasu

Enthusiast

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hello Folks,  

I have a requirement and should bring d data in below format

(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    Total

A.        Xx.     Vvv.          532                                                                             532

B.      Cc          Cv.                                           453                                            453

c.       Ft.         As.                                                                     543.                543           

From d above example the date1 and date 2 are date Columns and it's being compared and difference is found as number of days . i.e. if date1>date 2 then date1-date2 else date2-date1 which would giv d number of days difference.

 

Now based on this number of days difference calculation the SUM is calculated group by on column 1 and will come under the respective days bucket, either (Sum for days1-10.)or (Sum days11-20) or  (sumdays21-30).

 

And the last column is the SUM of that specific row.

 

Hope I made my requirement clear, now how do I handle this at the Teradata query Level, Please help.

Thanks in Advance.

Best Regards,

Shavyani :)

Enthusiast

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Same reply as the other one...

Enthusiast

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hi All ,

In the same example as above , I will have to calculate the column wise "SUM" as mentioned in 'Bold" in the example below.

(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    Total

SUM                                533                          458                    543                1543

A.        Xx.     Vvv.          532                                                                             532

B.      Cc          Cv.                                           453                                            453

c.       Ft.         As.                                                                     543                  543   

D       hh          xx              1                                                                              1        

E        ff           mm                                             5                                             5         

I want this SUM to be as the first row in the entire data set , 

I think of something like below that's in bold.... will this work? or is there any other function or workaround that I could use to calculate this , Please help 

SELECT

CAST(NULL AS VARCHAR(20)) AS CUST,

CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT1,

CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT2,

SUM(SUM_1TO10) , SUM(SUM_11TO20),SUM(SUM_21TO30),SUM(SUM_TOT ) 

UNION

SEL 

CUST, DT1, DT2,

CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,

CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,

CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,

CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,

SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM 

(SEL   CUST, DT1, DT2,

SUM(

CASE 

WHEN DT1 > DT2 THEN DT1-DT2

WHEN DT2 > DT1 THEN DT2-DT1

ELSE 0

END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF

FROM 

(

SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A

UNION ALL

SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B

UNION ALL

SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C

UNION ALL

SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A

) A

) A

Regards,

Shavyani

Teradata Employee

Re: COUNT () OVER (PARTITION BY) and DISTINCT

Hi shavyani

I'm not sure if I understand you. Is something like this what you need?

select cust, dt1, dt2, abs(DT1-DT2) as diff,
case when diff <= 10 then diff else 0 end AS SUM_1TO10,
case when diff > 10 and diff <= 20 then diff else 0 end AS SUM_11TO20,
case when diff > 20 and diff <= 30 then diff else 0 end AS SUM_21TO30,
case when diff > 30 then diff else 0 end AS SUM_DEF,
SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT,
sum(SUM_1TO10) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_1TO10_WF,
sum(SUM_11TO20) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_11TO20_WF,
sum(SUM_21TO30) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_21TO30_WF,
sum(SUM_DEF) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_DEF_WF,
sum(SUM_TOT) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_TOT_WF
FROM
(
SEL * FROM (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM (SEL 'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM (SEL 'E' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-31' AS DT2) A
) A

The you get this screenshot:


Enthusiast

Re: COUNT () OVER (PARTITION BY) and DISTINCT

That should work...shavyani....