Data Quality checks on the user data

General

Data Quality checks on the user data

Hi - I have a requirement to do a data quality check on the data entered by the Users, the data quality check should make sure the data does not contains spaces (leading/trailing),special characters and only cotains numbers and alphabets....the Volume of data ranges from Minimum few thousands to 100 millions ...so the data quality check should runs with a good perfomance ..any Ideas please much appreciated

I had a few idea on the data quality check on the Spaces alone (Trailing/leading) and also it did a good perfomance

select sum(char_length(name)) - sum(char_length(trim both from name ) from X

Say if above the query retruns more than 0, then i consider the data has some apces and fail the process...

Simililary does anyone has any data quality ideas for Special characters,spaces etc ?

Tags (1)
6 REPLIES
Enthusiast

Re: Data Quality checks on the user data

If is is for a small and one-off situation , you can develop scripts. Teradata has Miner's Profiler, DQRM, if you have them. You can check. But many advanced features are also available in Ataccama's solutions for DQ, including MDM....

Tools are more convenient and faster.

Junior Contributor

Re: Data Quality checks on the user data

Your requirements should be covered by some simple regular expression using REGEXP_SIMILAR

Re: Data Quality checks on the user data

Hi dieter,

I referred to one of your another post to use the otranslate

CHAR_LENGTH(oTranslate(col, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', '')

However my sql query returns an error like 3706 column name does not match a defined type name ...

Re: Data Quality checks on the user data

And the teradata version i use is 13.10.07.18

Junior Contributor

Re: Data Quality checks on the user data

OTranslate plus regular expressions are implemented in TD14, before there's only oTranslate as part of the Oracle UDFs, ask your TD support for the latest release.

Re: Data Quality checks on the user data

Thanks Dieter, Any ideas for TD13 is much apppreciated, currently the production is still in TD13 and im not sure when the whole company might upgrade to TD14