Acceptable Duplicate Row Count

Hi all,

I have a database with many tables (around 60-70), and about half of them, or slightly more, contain a small number of duplicate rows (between 0% and 3% of rows per table, on average). I know the ideal is 0% duplicate rows, but I was wondering whether there is a rule of thumb or industry standard for an acceptable number of duplicate rows in a table.

Would 2% duplicate rows be considered a big problem, or would this be somewhat acceptable in most cases in the industry?
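For reference, the duplicate percentage I'm talking about is measured like this (a sketch using Python/sqlite3 rather than my actual database; the `orders` table and its data are made up for illustration):

```python
import sqlite3

# Hypothetical table with some fully identical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 100), (2, 200), (2, 200), (3, 300), (3, 300)])

total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
# SELECT DISTINCT * collapses rows that are identical in every column.
distinct = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM orders)").fetchone()[0]
dup_pct = 100.0 * (total - distinct) / total
print(f"{total} rows, {distinct} distinct, {dup_pct:.1f}% duplicates")
```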

Thanks all.
Junior Contributor

Re: Acceptable Duplicate Row Count

I would consider any duplicate row (outside of the staging area) to be unacceptable.

Not only because it contradicts the relational model (no PK), but also because it can skew your queries/results.
How do you join to a table without uniqueness?
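To make the join problem concrete, here is a minimal sketch (Python/sqlite3, made-up `dim` and `fact` tables): a single duplicated row on the join side fans out every matching row and silently inflates aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE fact (id INTEGER, amount INTEGER)")
# The dimension row for id=1 was accidentally loaded twice.
conn.executemany("INSERT INTO dim VALUES (?, ?)", [(1, "a"), (1, "a")])
conn.execute("INSERT INTO fact VALUES (1, 100)")

rows = conn.execute(
    "SELECT f.amount FROM fact f JOIN dim d ON f.id = d.id").fetchall()
# The one fact row matches BOTH duplicate dim rows, so amount 100
# appears twice and SUM(amount) comes out as 200 instead of 100.
print(len(rows), sum(r[0] for r in rows))
```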

This must be handled during load.
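One simple way to handle it during load (again a sqlite3 sketch with made-up `staging` and `target` tables) is to collapse duplicates on the way out of staging, so the target table never receives them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE target (id INTEGER, amount INTEGER)")
# Staging may legitimately contain duplicates from the source feed.
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, 100), (1, 100), (2, 200)])

# SELECT DISTINCT de-duplicates during the load into the target table.
conn.execute("INSERT INTO target SELECT DISTINCT * FROM staging")
loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(loaded)
```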

If 2% duplicate rows are in the payroll application, I'd like to be one of those duplicates :-)

Teradata Employee

Re: Acceptable Duplicate Row Count

I agree with Dieter; however, in some cases (I have seen them in the telco industry) dups are functionally required. The acceptable number of dups in such a case depends on the number of hash collisions you may have on your PI: if the number of hash collisions on your PI is less than 100, then I would accept a maximum of 100 dups in the multiset table.