Last changed value in the column

Database
Enthusiast

Last changed value in the column

Hi,

Kindly help on this below.

In a particular columns,how to go to the previous values until the value that we look for.

 

KeyCol_DateCol_ValOutput
123456Jan-18AustraliaAustralia
123456Feb-18NAAustralia
123456Mar-18NAAustralia
123456Apr-18SOUTH AFRICASOUTH AFRICA
123456May-18NASOUTH AFRICA
123456Jun-18NASOUTH AFRICA
123456Jul-18NASOUTH AFRICA
123456Aug-18NASOUTH AFRICA
123456Sep-18NASOUTH AFRICA
123456Oct-18ENGLANDENGLAND
123456Nov-18ENGLANDENGLAND
123456Dec-18NAENGLAND
123456Jan-19NAENGLAND

 

Thanks.


Accepted Solutions
Highlighted
Junior Contributor

Re: Last changed value in the column


If I just remove the column with the Last_Value function from the view then it hits the partition... not sure is it becasue of the analytical function usage in the view.

It's because the result can be different when the WHERE against the view is done before LAST_VALUE, e.g.

 

WHERE COL_DATE BETWEEN DATE '2018-08-01' AND DATE '2018-10-31';

would return

 

 

123456 Aug-18 NA NULL
123456 Sep-18 NA NULL
123456 Oct-18 ENGLAND ENGLAND

instead of

 

 

123456 Aug-18 NA SOUTH AFRICA 
123456 Sep-18 NA SOUTH AFRICA 
123456 Oct-18 ENGLAND ENGLAND 

 

 

In this case, I believe this output column should be added into a table rather than in a view so that query performance would be good.

This should be calculated as part of the loading process, assuming that Col_Val is not updated after loading.

But why "added", simply modify Col_Val

 

 

 

1 ACCEPTED SOLUTION
11 REPLIES 11
Junior Contributor

Re: Last changed value in the column

Use LAST_VALUE:

LAST_VALUE(Col_Val INGORE NULLS)
OVER (PARTITION BY Key
      ORDER BY Col_Date)
Enthusiast

Re: Last changed value in the column

Thanks for your quick response on this.

 

I already tried with Last_Val and it works fine but creating an issue while creating a view on top of this select. Getting an error as '3731 The user must use IS NULL or IS NOT NULL to test for NULL values.'

 

My Try was getting the previous value through lag and populate the ouput and remaining as NULL. then use last vaule function to ignore nulls and carry forward the same output.

 

is there any other option to travel in a single column values until we get the match.

For Example, for sep the value is NA. In this case, query should go to the last value until NA is changed to SOUTH AFRICA and populate it.

 

Thanks.

Junior Contributor

Re: Last changed value in the column

You can change NA to NULL:

LAST_VALUE(NULLIF(Col_Val, 'NA') INGORE NULLS)
OVER (PARTITION BY Key
      ORDER BY Col_Date)
Enthusiast

Re: Last changed value in the column

Hi,

 

It worked well... Thank you so much for your inputs.

But when we querying the view (with this select using LAST_VALUE)  for a particular date range, it does not hit the partition. i.e static partition elimination didnt occur.

what is happening in the explain is 'All AMP STAT function step by the way of an all-rows scan...

Could you please help to clarify why it is not hitting the partition if possible.

 

Thanks again.

 

Junior Contributor

Re: Last changed value in the column

Why partition elimination, what's your actual Select?

 

A WHERE-condition against this view can probably not be pushed into the view, thus it's applied after the LAST_VALUE calculation.

Does Explain change when you add the condition to the base Select?

 

And how is the original result created?

Enthusiast

Re: Last changed value in the column

Why partition elimination, what's your actual Select?

 

REPLACE VIEW TEST

AS SEL KEY1, COL_DATE, COL_VAL, 

Last_Value(NullIf(COL_VAL,'NA') IGNORE NULLS) Over(PARTITION BY KEY1 ORDER BY COL_DATE) AS OUTPUT1

FROM TEST_T;  --(TEST_T is monthly partitioned table.)

 

SEL KEY1, COL_DATE, COL_VAL, Last_Value(NullIf(COL_VAL,'NA') IGNORE NULLS) Over(PARTITION BY KEY1 ORDER BY COL_DATE) AS OUTPUT1 FROM TEST_T

WHERE COL_DATE BETWEEN 1180131 AND 1190131;  --- This query is hitting the specific partition range on the big table (assume table has data since 2005).

 

SEL * FROM TEST (view created on the partitioned table)

WHERE COL_DATE BETWEEN 1180131 AND 1190131;  -- This query is not hitting the partition instead it goes for ‘'All AMP STAT function step by the way of an all-rows scan’

 

A WHERE-condition against this view can probably not be pushed into the view, thus it's applied after the LAST_VALUE calculation.

Does Explain change when you add the condition to the base Select?

 

Yes. Explain changes when we add the WHERE-condition to the base select… It is hitting the partition… for ex. Where col_date between 1180131 and 1190131.

 

And how is the original result created?

SEL KEY1, COL_DATE, COL_VAL, Last_Value(NullIf(COL_VAL,'NA') IGNORE NULLS) Over(PARTITION BY KEY1 ORDER BY COL_DATE) AS OUTPUT1

FROM TEST_T;

 

Thanks.

Junior Contributor

Re: Last changed value in the column

As I said, the WHERE can't be pushed into the view (otherwise it might change the result).

 

TEST_T is actually a table with a row per month and NULLs?

 

Enthusiast

Re: Last changed value in the column

Hi Dnoeth,

 

I understand.

 

If I just remove the column with the Last_Value function from the view then it hits the partition... not sure is it becasue of the analytical function usage in the view.

 

Instead of having this output stored in a table, just thought of adding this in a view. But query performance after adding in the view is not good bcse of full table scan.

 

In this case, I believe this output column should be added into a table rather than in a view so that query performance would be good.

 

Please correct me if anything is wrong.

 

I would say thank you so much for your kind response on this issue.

 

Thanks.

Highlighted
Junior Contributor

Re: Last changed value in the column


If I just remove the column with the Last_Value function from the view then it hits the partition... not sure is it becasue of the analytical function usage in the view.

It's because the result can be different when the WHERE against the view is done before LAST_VALUE, e.g.

 

WHERE COL_DATE BETWEEN DATE '2018-08-01' AND DATE '2018-10-31';

would return

 

 

123456 Aug-18 NA NULL
123456 Sep-18 NA NULL
123456 Oct-18 ENGLAND ENGLAND

instead of

 

 

123456 Aug-18 NA SOUTH AFRICA 
123456 Sep-18 NA SOUTH AFRICA 
123456 Oct-18 ENGLAND ENGLAND 

 

 

In this case, I believe this output column should be added into a table rather than in a view so that query performance would be good.

This should be calculated as part of the loading process, assuming that Col_Val is not updated after loading.

But why "added", simply modify Col_Val