Outlier Detection

Database
Fan

Outlier Detection

Hello,

I was hoping to get a little help on some sql. I have a table that contains account number which are created in a sequential order, thus there is a dependancy between date it is created and the account number. Some of these accounts have incorrect creation date, which i need to identify and correct. ex.

Date    Acct

7/1      108

7/1      100

7/1      101

7/1      102

7/2      105

7/2      106

7/4      103

7/4      104

7/4      107

I need to identify acct # 108 and change its date to 7/4 and acct ## 103 and 104 and change their dates to either 7/1 or 7/2 (not significant which way to go)

Thanks in advance for any help.

Chris

Tags (1)
1 REPLY
Fan

Re: Outlier Detection

My issue is that when comparing to previous or following records in order to identify an outlier, I don't know how to exclude those previous records that are already identified as an outlier.  if there are more than one consecutive bad records, my logic doesn't work.

**

select

date,

max(minacct) over (order by date rows between 1 preceding and 1 preceding) as prev_min,

max(maxacct) over (order by date rows between 1 preceding and 1 preceding) as prev_max,

min(acct_num) as minacct,

max(acct_num) as maxacct,

max(minacct) over (order by date rows 1 following and 1 following) as next_min,

max(maxacct) over (order by date rows 1 following and 1 following) as next_max,

(case

when maxacct < prev_max then maxacct

when maxacct > next_min and maxacct > next_max then maxacct

when minacct < prev_max and minacct < prev_min then minacct

when minacct > next_min then minacct

end) as outlier

from tableX

**

doesn't work where there are bad records one after another.

when I try something like

...

min(case when outlier1 = 0 then minacct end) over (order by date rows unbounded preceding) as prev_min,

max(case when outlier1 = 0 then maxacct end) over (order by date rows unbounded preceding) as prev_max,

....

this gives me an error that ordered analytical functions can not be nested. Also can't use where or having clauses with the ordered analytical functions.

Thanks again