Hi - I have a requirement to do a data quality check on the data entered by the Users, the data quality check should make sure the data does not contains spaces (leading/trailing),special characters and only cotains numbers and alphabets....the Volume of data ranges from Minimum few thousands to 100 millions ...so the data quality check should runs with a good perfomance ..any Ideas please much appreciated
I had a few idea on the data quality check on the Spaces alone (Trailing/leading) and also it did a good perfomance
select sum(char_length(name)) - sum(char_length(trim both from name ) from X
Say if above the query retruns more than 0, then i consider the data has some apces and fail the process...
Simililary does anyone has any data quality ideas for Special characters,spaces etc ?
If is is for a small and one-off situation , you can develop scripts. Teradata has Miner's Profiler, DQRM, if you have them. You can check. But many advanced features are also available in Ataccama's solutions for DQ, including MDM....
Tools are more convenient and faster.
I referred to one of your another post to use the otranslate
CHAR_LENGTH(oTranslate(col, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', '')
However my sql query returns an error like 3706 column name does not match a defined type name ...
OTranslate plus regular expressions are implemented in TD14, before there's only oTranslate as part of the Oracle UDFs, ask your TD support for the latest release.
Thanks Dieter, Any ideas for TD13 is much apppreciated, currently the production is still in TD13 and im not sure when the whole company might upgrade to TD14