Transposing the data - QUERY

Database
Enthusiast

Transposing the data - QUERY

Hello,

We have come across a scenario where we need to update the previous records in a target table based on if the value is NULL upto its last NON NULL occurence. Here is the test scenario we have tried to mock up:

-- Definition of target table

Create table test_target

(ID integer,

 update_time timestamp(6),

 change_log_1 varchar(10),

 change_log_2 varchar(10),

 change_log_3 varchar(10),

 change_log_4 varchar(10),

 change_log_5 varchar(10),

 start_date timestamp(6),

 end_date timestamp(6)

) PRIMARY INDEX (ID);

-- Test values - Existing Target table shows that there are many NULL values

INSERT INTO test_target values (1234,'2013-01-01 00:00.000000','abc','def',NULL,NULL,NULL,'2013-01-01 00:00.000000','2013-02-01 02:00.000000');

INSERT INTO test_target values (1234,'2013-02-01 02:00.000000',NULL,NULL,'cdf','fgd','ref','2013-02-01 02:00.000000','2013-03-01 03:00.000000');

INSERT INTO test_target values (1234,'2013-03-01 03:00.000000',NULL,NULL,'fgr','lkj',NULL,'2013-03-01 03:00.000000','2013-04-01 04:00.000000');

INSERT INTO test_target values (1234,'2013-04-01 04:00.000000','cfg',NULL,'tyh','poi','fdt','2013-04-01 04:00.000000','2013-05-01 05:00.000000');

INSERT INTO test_target values (1234,'2013-05-01 05:00.000000',NULL,NULL,'trd',NULL,NULL,'2013-05-01 05:00.000000','2013-06-01 06:00.000000');

INSERT INTO test_target values (1234,'2013-06-01 06:00.000000','erf',NULL,'hjk',NULL,'iuf','2013-06-01 06:00.000000','2013-07-01 07:00.000000');

INSERT INTO test_target values (1234,'2013-07-01 07:00.000000',NULL,NULL,NULL,'gdf',NULL,'2013-07-01 07:00.000000','3999-12-31 23:99:999999');

-- Load table definition

Create test_load

(ID integer,

 update_time timestamp(6),

 columnname varchar(30),

 change_from varchar(10),

 change_to varchar(10)

) PRIMARY INDEX (ID);

-- The delta coming in overrides what information was previously sent.

-- It shows columns were changed from a value other than NULL which means the earlier information that was sent

-- is now updated. For e.g.change_col_2 was changed from lks to fgc but earlier it had NULL value

INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_1','frs','txh');

INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_2','lks','fgc');

INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_4','gdf','fdr');

INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_5','ytf','tjb');

-- Another definition for the target table this is ideally the same table but it has been created only to see

-- what results are expected. This is ideally the same table test_target

-- NULLs in the history should be updated to the change_from values (but only upto a point where the previous value is NOT NULL. See the difference between change_col_1 & change_col_2)

Create table test_target_expected_res

(ID integer,

 update_time timestamp(6),

 change_log_1 varchar(10),

 change_log_2 varchar(10),

 change_log_3 varchar(10),

 change_log_4 varchar(10),

 change_log_5 varchar(10),

 start_date timestamp(6),

 end_date timestamp(6)

) PRIMARY INDEX (ID);

INSERT INTO test_target values (1234,'2013-01-01 00:00.000000','abc','def',NULL,NULL,NULL,'2013-01-01 00:00.000000','2013-02-01 02:00.000000');

INSERT INTO test_target values (1234,'2013-02-01 02:00.000000',NULL,'lks','cdf','fgd','ref','2013-02-01 02:00.000000','2013-03-01 03:00.000000');

INSERT INTO test_target values (1234,'2013-03-01 03:00.000000',NULL,'lks','fgr','lkj',NULL,'2013-03-01 03:00.000000','2013-04-01 04:00.000000');

INSERT INTO test_target values (1234,'2013-04-01 04:00.000000','cfg','lks','tyh','poi','fdt','2013-04-01 04:00.000000','2013-05-01 05:00.000000');

INSERT INTO test_target values (1234,'2013-05-01 05:00.000000',NULL,'lks','trd',NULL,NULL,'2013-05-01 05:00.000000','2013-06-01 06:00.000000');

INSERT INTO test_target values (1234,'2013-06-01 06:00.000000','erf','lks','hjk',NULL,'iuf','2013-06-01 06:00.000000','2013-07-01 07:00.000000');

INSERT INTO test_target values (1234,'2013-07-01 07:00.000000','frs','lks',NULL,'gdf','ytf','2013-07-01 07:00.000000','2013-08-01 08:00.000000');

INSERT INTO test_target values (1234,'2013-08-01 08:00.000000','txh','fgc',NULL,'fdr','tjb','2013-08-01 08:00.000000','3999-12-31 23:99:999999');

We are trying to achieve what we get in the table test_target_expected_res using a single query or even multiple passes.

Thank You,

Indrajit

6 REPLIES
Senior Supporter

Re: Transposing the data - QUERY

Hi,

your timestamps values are not valid. As you expect specific results it would be good if you can review the data.

timestamp need to be of format 'YYYY-MM-DD HH:Mi:SS.ssssss' - I guess you missed the Hour? And you can't have more then 59.999999 secs...

Enthusiast

Re: Transposing the data - QUERY

-- STAGE TABLE
CREATE TABLE stage_tbl
(ID INTEGER,
change_id INTEGER,
update_time TIMESTAMP(6),
columnname VARCHAR(30),
change_log_from VARCHAR(10),
change_log_to VARCHAR(10)
) PRIMARY INDEX (ID);

INSERT INTO stage_tbl VALUES (1234,5678,'2013-07-01 07:00.00.000000','change_col_1','','erf');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 08:00.00.000000','change_col_2','lks','txh');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_1','erf','fdr');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_2','txh','rep');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_3','','fdy');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_4','rfd','uif');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_5','lop','jhr');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 11:00.00.000000','change_col_5','jhr','puh');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-07-01 07:00.00.000000','change_col_1','rdt','ted');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-08-01 08:00.00.000000','change_col_2','wfs','trh');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-08-01 09:00.00.000000','change_col_3','','');
INSERT INTO stage_tbl VALUES (1234,23654,'2013-08-01 08:00.00.000000','change_col_4','gfd','');
INSERT INTO stage_tbl VALUES (1234,32111,'2013-08-01 09:00.00.000000','change_col_5','red','');

-- WRITE A QUERY ON STAGE TABLE SO THAT WE GET THE RESULTS AS IN expected_res below:

CREATE TABLE expected_res
(ID INTEGER,
change_id INTEGER,
update_time TIMESTAMP(6),
change_col_1 VARCHAR(10),
change_col_2 VARCHAR(10),
change_col_3 VARCHAR(10),
change_col_4 VARCHAR(10),
change_col_5 VARCHAR(10)
) PRIMARY INDEX (ID);

INSERT INTO expected_res VALUES (1234,5678,'2013-07-01 07:00.00.000000','erf','lks',NULL,'rfd','lop');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 08:00.00.000000','erf','txh',NULL,'rfd','lop');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 09:00.00.000000','fdr','rep','fdy','uif','jhr');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 11:00.00.000000','fdr','rep','fdy','uif','puh');
INSERT INTO expected_res VALUES (1234,78965,'2013-07-01 07:00.00.000000','ted','wfs',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,78965,'2013-08-01 08:00.00.000000','ted','trh',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,78965,'2013-08-01 09:00.00.000000','ted','trh',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,23654,'2013-08-01 08:00.00.000000',NULL,NULL,NULL,'gfd',NULL);
INSERT INTO expected_res VALUES (1234,32111,'2013-08-01 09:00.00.000000',NULL,NULL,NULL,NULL,'red');

LOGIC:
- The row in the target table should be at ID/Change_Id/update_time level
- If the value for a particluar column doesnt come in the following delta then we should take the previous value for e.g. 1234 5678 2013-07-01 07:00.00.000000
we got the value for change_column_1 as erf
but in the next days run i.e. 2013-08-01 08:00.00.000000 there was no value for change_column_1 as it had no change so we need to take the previuos days value.
Senior Supporter

Re: Transposing the data - QUERY

Are you looking for something like 

select ID, 
change_id,
update_time,
substr(max(case when columnname = 'change_col_1' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col1,
substr(max(case when columnname = 'change_col_2' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col2,
substr(max(case when columnname = 'change_col_3' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col3,
substr(max(case when columnname = 'change_col_4' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col4,
substr(max(case when columnname = 'change_col_5' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col5
from stage_tbl
qualify row_number() over (partition by id, change_id, update_time order by columnname desc ) = 1

Its not 100% as of your expected results but it is also unclear to me how

INSERT INTO expected_res VALUES (1234,5678,'2013-07-01 07:00.00.000000','erf','lks',NULL,'rfd','lop');

should be generated from your stg data...

Enthusiast

Re: Transposing the data - QUERY

Thank you.

This is generated as the value in change_to column is erf for change_col_1 for date '2013-07-01 07:00.00.000000'
- change_col_2 got changed on 2013-08-01 08:00.00.000000 from lks to txh so its value earlier to 2013-08-01 08:00.00.000000 should be lks
- change_col_3 got changed on 2013-08-01 08:00.00.000000 from SPACE(NO VALUE should be treated as NULL) to fdy so its value earlier to 2013-08-01 09:00.00.000000
should be NULL
- same as change_col_2 for change_col_4 & change_col_5
Senior Supporter

Re: Transposing the data - QUERY

you could do the same with min and the change_log_from column 

build two derived tables (one with the max one with the min) 

and join the results together and coalesce the max, min values. 

Enthusiast

Re: Transposing the data - QUERY

@ulrich - Thank you