Comparison of two Teradata Tables which is having huge data

Database
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.

Comparison of two Teradata Tables which is having huge data

Hi All,

 

we have requirement of comparing two diffrentdatabase tables with same structure after migration.

 

Requirement : we have to make sure that data before and after migration is same.

we are facing below challenges while doing comparison of two tables.

 

1. Tables having huge number of records and if you do minus ,union like below  its taking longer time to check. 

2. we have nearly 25 k objects to check in this migration

3.is ther better way to check the counts and data level check to perfom except "MINUS" to save time. Any other way in terms of performance tuneing

 

sel * from table-A
minus
sel * from table-B 
union 
sel * from table-B
minus
sel * from table-A

 

its a banking project, in this migration we have nearly 25 K objects will load so we need to check the data quality to confirm after the migration data is same or not?

 

is there any better way to compare? 

 

2 REPLIES 2
Ambassador

Re: Comparison of two Teradata Tables which is having huge data

See this blog from Ulrich Arndt regarding hashing for comparison.

 

Performance of your MINUS might be improved a bit by switching to MINUS ALL, but both approaches will probably spool the tables.

 

Way better than MINUS (if both tables share the same PI) is a Correlated NOT EXISTS over all columns. If columns are NULLable you must use

WHERE (a.col = b.col or (a.col is null and b.col is null)

A lot code (but it's done programatically anyway), but then it's AMP-local joins.

Teradata Employee

Re: Comparison of two Teradata Tables which is having huge data

There is this method I use when I have to compare big tables (inside the same system) that performs quite well :

with cte_union as
(
select 1 as src, t1.* from table1 as t1
 union all
select 2 as src, t2.* from table2 as t2
)
  select <all_columns_except_src>
       , max(case src when 1 then 1 else 0 end) as tab1
       , max(case src when 2 then 1 else 0 end) as tab2
    from cte_union
group by <all_columns_except_src>
  having min(src) < max(src)
order by <all_columns_except_src>, tab1

It's quite easy to write and it's "null friendly".