Propagating column data for null values from previous rows until change in new value.

Database
Enthusiast

Propagating column data for null values from previous rows until change in new value.

Hello All,

 

I  need urgent help with a scinario,please do suggest me with some approach to solve.

I have data as below where i get nulls if there is no part produced for a certain period of tieme (15 min window).

DateQhourPartnoSeriesECIIDID
201701010    
2017010115    
201701013031083000000745519796152177328749
2017010145    
2017010110031083000000745519871953777624849
20170101115    
2017010113031083000000745519796152177328749
20170101145    
20170101200    
2017010121531083000000745519871953777624849
20170101230    
20170101245    

 

now i need to populate the null values with the exact values from the row above this row  and in the staart of the day there might be no activity of production happen and for the first two instance as well need the vales fetched form the row 3 as below

DateQhourPartnoSeriesECIIDID
20170101031083000000745519796152177328749
201701011531083000000745519796152177328749
201701013031083000000745519796152177328749
201701014531083000000745519796152177328749
2017010110031083000000745519871953777624849
2017010111531083000000745519871953777624849
2017010113031083000000745519796152177328749
2017010114531083000000745519796152177328749
2017010120031083000000745519796152177328749
2017010121531083000000745519871953777624849
2017010123031083000000745519871953777624849
2017010124531083000000745519871953777624849

 

Please suggest me on how to achive this scinario its kind of urgent 

thanks for the oputunity 

 

 


Accepted Solutions
Senior Supporter

Re: Propagating column data for null values from previous rows until change in new value.

There are differnt ways to achive this - which might also show different performance caracteristics.

The below solution might be one of the best to understand.

/* first generate some test data */

CREATE VOLATILE TABLE vt_a AS
(
SELECT '2017-03-01' ( DATE ) AS ref_dt,
day_of _calendar AS id,
CASE WHEN random(0, 3) = 0 THEN random(1000000, 2000000) ELSE NULL END AS val1,
CASE WHEN val1 IS NULL THEN NULL ELSE 49 END AS val2
FROM sys_calendar.calendar
WHERE day_of _calendar BETWEEN 1 AND 20
) WITH DATA UNIQUE PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;

/* calculate a grp identifier and a row number and store the results in a VT table */
CREATE VOLATILE TABLE vt_b
AS
(
SELECT ref_dt,
id,
val1,
val2,
/*one grp id for all records belonging together (exception if fist row has null this makes up a seperate grp which need to be handled separetly*/
sum( CASE WHEN val1 IS NULL THEN 0 ELSE 1 END ) OVER ( ORDER BY ref_dt, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp,
row_number() OVER ( ORDER BY ref_dt, id) AS row_nr
FROM vt_a
) WITH data UNIQUE PRIMARY INDEX (ref_dt, id) ON COMMIT PRESERVE ROWS;

/* get the required info with three different selects which are union all */

-- get the info from the previous not null row (of the same group) for null rows
SELECT
  b2.ref_dt,
  b2.id,
  b1.val1,
  b1.val2
FROM vt_b b1
  LEFT OUTER JOIN
  vt_b b2
    ON b1.grp = b2.grp
WHERE b1.val1 IS NOT NULL
      AND b2.val1 IS NULL
UNION ALL
-- get the info for the not null row in the grp
SELECT
  b1.ref_dt,
  b1.id,
  b1.val1,
  b1.val2
FROM vt_b b1
  LEFT OUTER JOIN
  vt_b b2
    ON b1.grp = b2.grp
WHERE b1.val1 IS NOT NULL
      AND b2.val1 IS NOT NULL
UNION ALL
-- special case first row has null (get the not null value of grp 1
SELECT
  b1.ref_dt,
  b1.id,
  b2.val1,
  b2.val2
FROM vt_b b1
  JOIN
  vt_b b2
    ON b1.grp = 0
       AND b2.grp = 1
       AND b2.val1 IS NOT NULL
WHERE 1 = (SELECT min(row_nr)
           FROM vt_b
           WHERE row_nr = 1 AND val1 IS NULL)
ORDER BY 2;

 

 

1 ACCEPTED SOLUTION
5 REPLIES
Senior Supporter

Re: Propagating column data for null values from previous rows until change in new value.

There are differnt ways to achive this - which might also show different performance caracteristics.

The below solution might be one of the best to understand.

/* first generate some test data */

CREATE VOLATILE TABLE vt_a AS
(
SELECT '2017-03-01' ( DATE ) AS ref_dt,
day_of _calendar AS id,
CASE WHEN random(0, 3) = 0 THEN random(1000000, 2000000) ELSE NULL END AS val1,
CASE WHEN val1 IS NULL THEN NULL ELSE 49 END AS val2
FROM sys_calendar.calendar
WHERE day_of _calendar BETWEEN 1 AND 20
) WITH DATA UNIQUE PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;

/* calculate a grp identifier and a row number and store the results in a VT table */
CREATE VOLATILE TABLE vt_b
AS
(
SELECT ref_dt,
id,
val1,
val2,
/*one grp id for all records belonging together (exception if fist row has null this makes up a seperate grp which need to be handled separetly*/
sum( CASE WHEN val1 IS NULL THEN 0 ELSE 1 END ) OVER ( ORDER BY ref_dt, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp,
row_number() OVER ( ORDER BY ref_dt, id) AS row_nr
FROM vt_a
) WITH data UNIQUE PRIMARY INDEX (ref_dt, id) ON COMMIT PRESERVE ROWS;

/* get the required info with three different selects which are union all */

-- get the info from the previous not null row (of the same group) for null rows
SELECT
  b2.ref_dt,
  b2.id,
  b1.val1,
  b1.val2
FROM vt_b b1
  LEFT OUTER JOIN
  vt_b b2
    ON b1.grp = b2.grp
WHERE b1.val1 IS NOT NULL
      AND b2.val1 IS NULL
UNION ALL
-- get the info for the not null row in the grp
SELECT
  b1.ref_dt,
  b1.id,
  b1.val1,
  b1.val2
FROM vt_b b1
  LEFT OUTER JOIN
  vt_b b2
    ON b1.grp = b2.grp
WHERE b1.val1 IS NOT NULL
      AND b2.val1 IS NOT NULL
UNION ALL
-- special case first row has null (get the not null value of grp 1
SELECT
  b1.ref_dt,
  b1.id,
  b2.val1,
  b2.val2
FROM vt_b b1
  JOIN
  vt_b b2
    ON b1.grp = 0
       AND b2.grp = 1
       AND b2.val1 IS NOT NULL
WHERE 1 = (SELECT min(row_nr)
           FROM vt_b
           WHERE row_nr = 1 AND val1 IS NULL)
ORDER BY 2;

 

 

Junior Contributor

Re: Propagating column data for null values from previous rows until change in new value.

If you don't need the first NULLs populated it's a simple LAST_VALUE:

SELECT ref_dt,
   id,
   Last_Value(val1 IGNORE NULLS) Over (/*partition by ???*/ ORDER BY ref_dt, id  ) AS val1,
   Last_Value(val2 IGNORE NULLS) Over ( ORDER BY ref_dt, id  ) AS val2
FROM vt_a

I really love the IGNORE NULLS option :-)

 

 

But if you actually need those values you must apply multiple steps.

 

Ulrich's answer might be simplified to 

WITH cte AS 
(   SELECT ref_dt,
      id,
      val1,
      val2,
      /*one grp id for all records belonging together, first NULL rows will be assigned 1 */
      Coalesce(Sum(CASE WHEN val1 IS NOT NULL THEN 1 END) 
               Over (ORDER BY ref_dt, id
                     ROWS BETWEEN Unbounded Preceding AND CURRENT ROW ), 1) AS grp
   FROM vt_a
)
SELECT
  b1.ref_dt,
  b1.id,
  b2.val1,
  b2.val2
FROM cte b1
  JOIN
  cte b2
    ON b1.grp = b2.grp
AND b2.val1 IS NOT NULL

 

Or you do nested OLAP functions:

WITH cte AS 
(   SELECT ref_dt,
      id,
      val1,
      val2,
      /*one grp id for all records belonging together, first NULL rows will be assigned 1 */
      Coalesce(Sum(CASE WHEN val1 IS NOT NULL THEN 1 END) 
               Over (ORDER BY ref_dt, id
                     ROWS BETWEEN Unbounded Preceding AND CURRENT ROW ), 1) AS grp
   FROM vt_a
 )
SELECT ref_dt,id,
   Max(val1) Over (PARTITION BY grp),
   Max(val2) Over (PARTITION BY grp)
FROM cte
Enthusiast

Re: Propagating column data for null values from previous rows until change in new value.

Thank you very much for helping me out Ulrich's and dnoeth 

 

Anvesh

 

Enthusiast

Re: Propagating column data for null values from previous rows until change in new value.

I have tried this adding one more key column in the order and when i do so the groupin is not bein done properly 

ideally one group should have same grp id but this is kind of not happening below is the sample data and query due to compliance morphing the data and column quick help is needed

key columns are date,qhour,id all the combination with this key should be in a group

 

Query used 

SELECT date,
qhour,
id,
col1,
col2,
col3,
/*one grp id for all records belonging together, first NULL rows will be assigned 1 */
Coalesce(Sum(CASE WHEN col1 IS NOT NULL THEN 1 END)
Over (ORDER BY date, qhour,id
ROWS BETWEEN Unbounded Preceding AND CURRENT ROW ), 1) AS grp
FROM VT_ECI_MISSING_QHOUR_STG
where id in ('310830000007455','310830214390608')
and date_key>=20170201
and date_key<=20170203;

 

DATEQHOURIDCOL3COL4COL5grp
201702010310830000007455   1
20170201031083021439060813371904552234051
2017020115310830000007455   1
2017020115310830214390608   1
2017020130310830000007455   1
2017020130310830214390608   1
2017020145310830000007455   1
2017020145310830214390608   1
20170201100310830000007455   1
2017020110031083021439060813371904552234052
20170201115310830000007455   2
20170201115310830214390608   2
20170201130310830000007455   2
20170201130310830214390608   2
20170201145310830000007455   2
20170201145310830214390608   2
20170201200310830000007455   2
20170201200310830214390608   2
20170201215310830000007455   2
20170201215310830214390608   2
20170201230310830000007455   2
20170201230310830214390608   2
20170201245310830000007455   2
20170201245310830214390608   2
20170201300310830000007455   2
20170201300310830214390608   2
20170201315310830000007455   2
20170201315310830214390608   2

underlying data  for the combination 2

DateHourQhouridcol1col2col3
20170201003108302143906081337190455223405
2017020111003108302143906081337190455223405
2017020144303108302143906081323878485171408
201702015530310830214390608000
201702011515003108302143906081337190455223405
201702011515453108302143906081337190455223405

underlying data  for the combination 2

DateHourQhourID col1col2col3
2017020116164531083000000745519796152177328749
2017020116163031083000000745519796152177328749
2017020117171531083000000745519796152177328749
2017020117173031083000000745519808056277375250
2017020123230031083000000745519808056277375250

expected result set 

DATEQHOURIDCOL3COL4COL5
20170201031083000000745519796152177328749
2017020103108302143906081337190455223405
201702011531083000000745519796152177328749
20170201153108302143906081337190455223405
201702013031083000000745519796152177328749
20170201303108302143906081337190455223405
201702014531083000000745519796152177328749
20170201453108302143906081337190455223405
2017020110031083000000745519796152177328749
201702011003108302143906081337190455223405
2017020111531083000000745519796152177328749
201702011153108302143906081337190455223405
2017020113031083000000745519796152177328749
201702011303108302143906081337190455223405
2017020114531083000000745519796152177328749
201702011453108302143906081337190455223405
2017020120031083000000745519796152177328749
201702012003108302143906081337190455223405
2017020121531083000000745519796152177328749
201702012153108302143906081337190455223405
2017020123031083000000745519796152177328749
201702012303108302143906081337190455223405
2017020124531083000000745519796152177328749
201702012453108302143906081337190455223405
2017020130031083000000745519796152177328749
201702013003108302143906081337190455223405
2017020131531083000000745519796152177328749
201702013153108302143906081337190455223405
2017020133031083000000745519796152177328749
201702013303108302143906081337190455223405
2017020134531083000000745519796152177328749
201702013453108302143906081337190455223405
2017020140031083000000745519796152177328749
201702014003108302143906081337190455223405
2017020141531083000000745519796152177328749
201702014153108302143906081337190455223405
2017020143031083000000745519796152177328749
201702014303108302143906081323878485171408
2017020144531083000000745519796152177328749
201702014453108302143906081323878485171408
2017020150031083000000745519796152177328749
201702015003108302143906081323878485171408
2017020151531083000000745519796152177328749
201702015153108302143906081323878485171408
2017020153031083000000745519796152177328749
20170201530310830214390608000
2017020154531083000000745519796152177328749
20170201545310830214390608000
2017020160031083000000745519796152177328749
20170201600310830214390608000
2017020161531083000000745519796152177328749
20170201615310830214390608000
2017020163031083000000745519796152177328749
20170201630310830214390608000
2017020164531083000000745519796152177328749
20170201645310830214390608000

 

quick help is need I also tried multiple approaches and not able to succeed

SEL
date
,qhour
, id
,FIRST_VALUE (col1 IGNORE NULLS)
OVER (PARTITION BY CAST(TRIM(id ) AS BIGINT)
ORDER BY date ,qhour desc,CAST(TRIM(id ) AS BIGINT)
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS col1
,FIRST_VALUE (col2 IGNORE NULLS)
OVER (PARTITION BY id
ORDER BY date ,qhour desc,id
ROWS BETWEEN current row AND UNBOUNDED FOLLOWING )
,FIRST_VALUE (col3 IGNORE NULLS)
OVER (PARTITION BY id
ORDER BY date ,qhour desc,id
ROWS BETWEEN current row AND UNBOUNDED FOLLOWING )
FROM VT_col1_MISSING_QHOUR_STG
WHERE id ='310830000007455'
AND date>=20170201 AND date<=20170201
ORDER BY 1,2;

 

Anvesh

Highlighted
Junior Contributor

Re: Propagating column data for null values from previous rows until change in new value.

You need to add the id as a gruing column:

Over (PARTITION BY id -- must be here, not in ORDER BY
      ORDER BY date, qhour
      ROWS ...