Validate column data

Enthusiast
Hi,

I'm trying to validate that every character in a varchar(100) column is a digit (0-9) on a 250M-row table. Any rows that are not valid will be skipped in an insert statement into a base table. However, this is giving me a hard time. I tried using a tally table with a substring comparison in an EXISTS clause, but the table is just too large for that to perform well. Is there an efficient way to validate my data?
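For reference, the check I tried looked roughly like this (tally is my numbers table with a column n covering 1 to 100; staging_table and base_table are placeholders for the real names):

-- Keep only rows where no character position holds a non-digit.
-- tally(n) is a numbers table spanning 1..100, the max varchar length.
INSERT INTO base_table (order_number)
SELECT s.order_number
FROM staging_table s
WHERE NOT EXISTS (
    SELECT 1
    FROM tally t
    WHERE t.n <= CHARACTER_LENGTH( s.order_number )
      AND SUBSTRING( s.order_number FROM t.n FOR 1 ) NOT BETWEEN '0' AND '9'
)
;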

Thanks,
David
4 REPLIES
Enthusiast

Re: Validate column data

Check if you have the Oracle UDFs installed - you can get them from the Teradata download center.
There is an otranslate function - Oracle's TRANSLATE - which can check them for you.

See:
http://teradataquestions.stackexchange.com/questions/16/the-format-or-data-contains-a-bad-character
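For example, a rough sketch, assuming the UDFs are installed (mytable and mycol are placeholders): otranslate maps each character found in the second argument to the matching character of the third, so translating every digit to nothing leaves only the bad characters behind.

-- Sketch only: strip the digits; anything left means the row is invalid.
-- mytable and mycol are placeholders for your table and column.
SELECT *
FROM mytable
WHERE CHARACTER_LENGTH( otranslate( mycol, '0123456789', '' ) ) > 0
;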
Enthusiast

Re: Validate column data

I'm a downstream user with few permissions, and I see no UDFs in the metadata. My work is pretty much limited to pure SQL.
Enthusiast

Re: Validate column data

I'm not sure I understand your question.
Do you want to keep only the valid records, or do you want to see the invalid records?
If you want to see the invalid records (records containing letters A to Z):
Log in through BTEQ,
using .SET SESSION TRANSACTION ANSI ; before logging on (in ANSI mode character comparisons are case-specific, which this check relies on).

create volatile table test_acct
(acct_num varchar(20) ,
acct_num2 varchar(20) )
on commit preserve rows ;

insert into test_acct
sel upper(acct_num) , lower(acct_num)
from accounts ;

Now you can pull out the records that contain letters:
sel * from test_acct
where acct_num <> acct_num2 ;

Hope this works
:)
Enthusiast

Re: Validate column data

That's interesting as far as alphanumeric data goes, but these strings also include various non-alpha characters. However, I solved the problem through direct analysis of each character position by grouping:

-- n is the character position being profiled; run once per position of interest
SELECT SUBSTRING( order_number FROM n FOR 1 ), COUNT(*) AS Total
FROM Orders
GROUP BY SUBSTRING( order_number FROM n FOR 1 )
ORDER BY SUBSTRING( order_number FROM n FOR 1 )
;

In the end I found there were repeated patterns which could be filtered easily. If new patterns appear in the future, the process will break and the script will have to be modified, but the data are pretty regular, so it's not too bad really.
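The filtering step ended up looking something like this (the LIKE patterns shown are made up for illustration; the real list came out of the per-position profiling above, and base_table is a placeholder):

-- Exclude rows matching the known bad patterns; the patterns below
-- are hypothetical examples, not the actual ones from my data.
INSERT INTO base_table (order_number)
SELECT order_number
FROM Orders
WHERE order_number NOT LIKE 'ORD%'   -- e.g. a recurring alpha prefix
  AND order_number NOT LIKE '%-%'    -- e.g. embedded dashes
;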

Thanks for the suggestions!