I'm trying to validate that each character in a varchar(100) row is between 0 and 9 on a 250M row table. Any rows that are not valid will be ignored in an insert statement into a base table. However, this is giving me a hard time. I tried using a tally table with substring comparison in an exists clause, but the data are just too huge. Is there an efficient way to validate my data?
I have not understood your doubt? What you want to do , either take the valid records only or you want to see the invalid records? If you want to see the invalid records(records with character from Ato Z). Login through bteq use .Set Session transaction ANSI ; before login
That's interesting as far as alpha-numeric data, but these strings also include various non-alpha characters. However, I solved the problem through direct analysis of each character column by grouping:
SELECT SUBSTRING( order_number FROM n FOR 1 ), COUNT(*) AS Total FROM Orders GROUP BY SUBSTRING( order_number FROM n FOR 1 ) ORDER BY SUBSTRING( order_number FROM n FOR 1 ) ;
In the end I found there were repeated patterns which could be filtered easily. If new patterns appear in the future, then the process will break and the script will have to be modified. The data are pretty regular, so not too bad really.