Advantage of Data Deduplication

If we have BAR with an EMC Data Domain, will the next backup record only the changes once one full backup exists, or how does it work? We are facing a long backup window: 15 to 20 hours for just 3 TB. How can that be reduced, and how do companies set up their backups for 10, 20, or 30 TB? Please help in this regard.

Teradata Employee

Re: Advantage of Data Deduplication



The EMC Data Domain appliance is a backup storage target, i.e.:

  1. A Data Domain solution uses deduplication technology, i.e. it looks for duplicate segments (variable length, 8 to 12 KB) in the backup output streams on the fly.
  2. Data Domain deduplication is a storage-side solution, i.e. deduplication works most effectively when you send the same unchanged data in the same way over and over again.
  3. A Data Domain records all data: unique segments are written to disk, and duplicate segments are stored as links to the matching segment already on disk.
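To make the three points above concrete, here is a minimal Python sketch of a storage-side deduplicating target. It is a toy model, not how Data Domain is implemented: the class name `DedupStore`, the fixed 8 KiB segment size (the real appliance uses variable-length segments), and the SHA-256 fingerprints are all assumptions chosen for illustration.

```python
import hashlib

# Assumption: fixed 8 KiB segments, a simplification of variable-length segmenting.
SEGMENT_SIZE = 8 * 1024


class DedupStore:
    """Toy storage-side deduplicating backup target (illustrative only)."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (unique segments "on disk")
        self.backups = {}   # backup name -> ordered list of fingerprints

    def write_backup(self, name, stream):
        """Ingest a full backup stream and return the number of new bytes stored."""
        new_bytes = 0
        recipe = []
        for offset in range(0, len(stream), SEGMENT_SIZE):
            segment = stream[offset:offset + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # Unique segment: write it to "disk".
                self.segments[fingerprint] = segment
                new_bytes += len(segment)
            # Duplicate or not, the backup only keeps a link to the segment.
            recipe.append(fingerprint)
        self.backups[name] = recipe
        return new_bytes

    def read_backup(self, name):
        """Rebuild a backup from its list of segment links."""
        return b"".join(self.segments[fp] for fp in self.backups[name])


# Two full backups of 64 segments each; the second differs in one segment only.
full_1 = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(64))
full_2 = (full_1[:10 * SEGMENT_SIZE]
          + bytes([200]) * SEGMENT_SIZE
          + full_1[11 * SEGMENT_SIZE:])

store = DedupStore()
first_written = store.write_backup("full_1", full_1)
second_written = store.write_backup("full_2", full_2)
print(first_written, second_written)
```

In this model the appliance still receives the whole second backup over the wire; only the amount written to disk shrinks. That is why sending the same unchanged data in the same way matters: if the stream layout shifts between runs, the segments no longer line up and far fewer duplicates are found.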


Like all layered solutions, these need to be viewed holistically: if your current backup and restore solution is constrained by other factors, adding or increasing the backend storage will probably make little to no difference.


Backups from databases to Data Domain usually have fairly low deduplication rates, and Teradata's MPP architecture can work against the deduplication process. This is why real-world performance numbers can vary wildly from the Teradata estimated values for Teradata Advocated BAR solutions with Data Domain.


Highly volatile or new data and/or non-compressible data (e.g. BLC with DSA) work against the Data Domain process.


Client-side incremental solutions would be the worst fit for deduplication devices since, in theory, the majority of each backup would be made up of unique data.


Two of the major variables in Teradata BAR that can directly affect performance:

  • backup and restore against systems that are not quiescent
  • data demographics of the backup set






Re: Advantage of Data Deduplication

If the question is BAR performance with Data Domain (where do I start), here's something you could try:

Check for any blocking (the thing with DD: they do the dictionary first, and there might be some contention there).

Backup and restore performance is also influenced by a variety of other factors.

Number of streams, how you created the jobs, and how many jobs you are running in parallel (I heard that in practice you can't run more than one :-)).

Lastly, the hardware itself.


15 to 20 hours for 3 TB is very bad throughput.
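To put a number on that, a quick back-of-the-envelope calculation in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the original post does not specify, and the function names are just for illustration.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes


def throughput_mb_per_s(size_bytes, hours):
    """Average throughput in MB/s for a backup of size_bytes taking `hours`."""
    return size_bytes / (hours * 3600) / MB


def hours_at_rate(size_bytes, mb_per_s):
    """Hours needed to move size_bytes at a sustained rate of mb_per_s."""
    return size_bytes / (mb_per_s * MB) / 3600


slow = throughput_mb_per_s(3 * TB, 20)  # roughly 41.7 MB/s
fast = throughput_mb_per_s(3 * TB, 15)  # roughly 55.6 MB/s
print(f"3 TB in 15 to 20 hours: {slow:.1f} to {fast:.1f} MB/s")

# At the slower rate, a 30 TB backup would take about 200 hours.
print(f"30 TB at {slow:.1f} MB/s: {hours_at_rate(30 * TB, slow):.0f} hours")
```

So the whole system is averaging roughly 42 to 56 MB/s, and at that rate 30 TB would need well over a week.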

I suggest you talk to your Teradata rep as soon as possible.


Re: Advantage of Data Deduplication

Subsequent backups are faster because of deduplication: they write less data (subject to the rate of change in the data).


Re: Advantage of Data Deduplication

How can subsequent backups be performed without a partitioned by clause in the archive script? Any help?