Update on copying / moving data between systems

Senior Supporter

Some months ago I shared a demo tool to copy data between two Teradata systems using TPT (http://developer.teradata.com/blog/ulrich/2011/02/tpt-plain-copy-tables-demo). The tool generates and executes the TPT scripts and enables fast testing of throughput and of the best parameter settings.

In the meantime we have learned some lessons which I would like to share here.

The first version of the demo tool was not able to copy UNICODE data between two systems. We have extended the tool to handle this correctly now, which is much more complicated than expected. The "challenge" is that a SCHEMA has to be defined, and if you have a mixture of Latin and Unicode character columns you need to define the Unicode schema character lengths as three times the table VARCHAR field length. The disadvantage is that the overall row length limit remains at 64K. A bit complicated, and it finally means that you cannot copy all valid Unicode character set tables with TPT: if the field lengths are too long the resulting schema will be rejected. I was not too happy about this - but maybe I just missed something. I would be happy to learn otherwise.
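To illustrate the sizing rule with a made-up example (table and column names and lengths are placeholders, not taken from the tool): for a source table with txt_latin VARCHAR(100) CHARACTER SET LATIN and txt_unicode VARCHAR(2000) CHARACTER SET UNICODE, the generated schema has to look roughly like this:

DEFINE SCHEMA src_tab_schema
( txt_latin   VARCHAR(100)   /* Latin column: length as defined in the table */
, txt_unicode VARCHAR(6000)  /* Unicode column: 2000 characters * 3 */
);

With a few wide Unicode VARCHARs these tripled lengths quickly push the row over the 64K limit, and then the schema is rejected.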

Also, some newer data types like geospatial data cannot be copied between TD systems using TPT, due to their UDT implementation.

This led us to implement a simple copy solution using ARCMAIN. It follows the same principle: you specify "only" a basic parameter script, and the related backup and restore scripts are generated and executed by the tool. To avoid landing the data in a file we implemented it with a named pipe solution, which means the backup writes to a named pipe and the restore reads from this named pipe. Both processes run in parallel. We integrated this into the existing demo script, which is attached to this post. Some new parameters are defined and allow you to specify which tool to use.
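The named pipe principle, reduced to a rough sketch (the real scripts are generated by the tool; system names, users, and tables are placeholders, and it assumes arcmain treats the FILE = name as a plain path in the working directory):

mkfifo COPYPIPE

# backup on the source system writes into the pipe, started in the background
arcmain <<'EOF' > backup.log 2>&1 &
LOGON source_tdpid/backup_user,backup_password;
ARCHIVE DATA TABLES (MY_DB.MY_TABLE), RELEASE LOCK, FILE = COPYPIPE;
LOGOFF;
EOF

# restore on the target system reads from the same pipe in parallel
arcmain <<'EOF' > restore.log 2>&1
LOGON target_tdpid/restore_user,restore_password;
COPY DATA TABLE (MY_DB.MY_TABLE) (FROM (MY_DB.MY_TABLE)), RELEASE LOCK, FILE = COPYPIPE;
LOGOFF;
EOF

wait    # make sure the backup side has finished as well
rm COPYPIPE

No intermediate file is written; the data only flows through the pipe.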

The main limitation is that the two TD systems need to be on the same release (or the source system can be one release lower - no ARCMAIN copy from a higher to a lower release). In addition, the ARCMAIN solution can "only" copy complete tables and not subsets like the TPT solution. But the nice thing is that every table can be copied (at least we didn't experience problems in our test cases).

It seems worth checking the ARCMAIN solution, as it was two times faster than the best parameter sets for TPT.

For the main setup of the attached demo tool, check the other blog entry and the documentation it contains.

So keep evaluating TPT and arcmain to move data between systems.

Update: There was a bug in the handling of multiset tables. This is fixed in the currently attached version.

23 Comments
Enthusiast
Ulrich,
I have been playing around with ARCMAIN/named pipes as well. I'm running my scripts
from a LINUX box that sits between 2 Teradata boxes. Have you discovered any tuning
parameters for ARCMAIN? I seem to be running slower with ARCMAIN than with TPT ...
Most of the time when I use ARCMAIN to move data between two Teradata boxes it is much faster than any other tool. If ARCMAIN is slower for you, please check the network, or check whether there is any lock while ARCMAIN is running.

Senior Supporter
Hi WOJO,
same experience as Dharmendra. Since I implemented the named pipe version, ARCMAIN was always faster, and with no special parameter settings.
Enthusiast
Hi,

I just wanted to thank you for this tool. We are considering giving it a go for the migration of some of our data - the first tests were successful. Any (parameter) suggestions for moving a COMPLETE DATABASE, i.e., all tables? (Is it possible to specify that, as there is a tag?)

Cheers!
Senior Supporter
Sorry for the late reply, didn't get an email notification...
This demo version doesn't have the option to move a whole DB, but it would not be a problem to implement this. Send me an email if you want to discuss this offline. Email details can be found on our website.
Enthusiast
Hi Ulrich,

I have started to use this tool to move data from our production system to our test system. I realized that it acquires a read lock (rather than an access lock) during the TPT export. This can cause a problem (writer jobs against the same table will be blocked by the read lock) if we are transporting a very large table (12 billion records) from production to test.

Can it be changed to perform an access lock rather than a read lock?

Regards,
Suhail
Senior Supporter
The current tool doesn't support this option.
I might check the possibilities in the next days, but I can't promise anything.
Enthusiast
Thanks Ulrich! In the meantime, is there a possibility you can provide me with the .java file?

Currently the attachment you have only contains the .jar.

I'll try to make this change myself.
Senior Supporter
Haha, nice try - no, no sources for free.
But I uploaded a new version, now with a lock_for_access parameter. The default is N; you would need to set it to Y.
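For reference: with lock_for_access = Y the export uses an access lock instead of the default read lock. In plain SQL terms the difference is roughly the following (the table name is just an example, not the exact statement the tool generates):

/* default (N): the export SELECT holds a read lock, writers against the table are blocked */
SELECT * FROM MY_DB.MY_TABLE;

/* Y: access lock, writers are not blocked, but the export may read uncommitted ("dirty") rows */
LOCKING ROW FOR ACCESS
SELECT * FROM MY_DB.MY_TABLE;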
Enthusiast
:)....Just trying my luck!!!....Thanks a lot for the updated new version.
Senior Supporter
you owe me a drink in case you join Partners next week ;-)
Enthusiast
Sorry...i'm not coming to partners...but feel free to ask for a drink from the Charles Schwab contingent!!!
Enthusiast
Ulrich, correct me if I'm wrong. The tool works only if the target table is empty, right?

Senior Supporter
Yes, that's right. It is using the load operator.
Enthusiast
ok thanks!...I'm trying to copy an 11 billion record table from my production to my test system. With 1 reader, 1 writer, 5 sessions for export and 20 sessions for load, I'm getting the job done in 3-4 hours.

I need to see if this job can be tuned further, probably by increasing the number of reader/writer instances or changing the number of export/load sessions. For this, I need to see the TPT run statistics that show me whether any instance is overloaded or underutilized.

I'm trying to find the TPT run statistics in the log files that the tool creates: ctp_YYYYMMDD_HHMMSS.log and ctp_tpt_YYYYMMDD_HHMMSS.log, but I can't find them there.

I also tried ctp20121021_173435-27.out under /opt/teradata/client/13.10/tbuild/logs, but the content of the file is not very readable... I'm unable to understand what's going on.

Any other place I should look?

Enthusiast
Never mind. I think I got my answer. I need to use the following commands to get detailed logs:

tlogview -l ctp20121021_173435-27.out

tlogview -j ctp20121021_173435-27 -p

tlogview -j ctp20121021_173435-27 -v "%M" -f ' * ' -g

Senior Supporter
Hi,
yes, tlogview is the source for the info you need.
But did you also check the ARCMAIN statements above?
So far ARCMAIN should give you the highest transfer rate...
Enthusiast
For really large tables, I'm afraid to use ARCMAIN because it will cause a HUT lock on the production table (please correct me if I'm wrong).

During the time the HUT lock is present, the table cannot be accessed by anyone.

This is the reason I requested the "lock_for_access" parameter.

Going forward, I expect this tool to be heavily used by our development team. Hence I have 2 approaches:

1. For large tables, say > 500 million rows, the dev team will contact the DBA team to restore the data from the weekly production backup tapes we create on weekends. So here we won't be touching the production database at all. We will use the tapes on the BAR server and restore the data into our test system.

2. For small tables, say < 500 million rows, we will use the TPT option with the lock_for_access parameter set to "yes".
-Suhail
Senior Supporter
You should be able to read the table while ARCMAIN is extracting the data. You won't be able to modify the table. But just try it on your system.
Enthusiast

Hi Ulrich,

Quick question: the ARCMAIN option in this tool uses pipes, right? I need to move a 7 TB table from one system to another and I don't have the disk space for it.

Regards,

Suhail

Senior Supporter

Hi Suhail,

yes, as stated in the text ;-)

"To avoid data landing on file we implemented it with a named pipe solution which means the backup writes to a named pipe and the restore read from this named pipe."

Ulrich

Teradata Employee

Hey,

I'm facing an issue copying data from one machine to another using the ARCMAIN copy command:

ARCHIVE DATA TABLES (UT_TAB.EVENT) (PARTITIONS WHERE (!EVENT_START_DT BETWEEN '2013-10-01' AND '2013-10-31'!))   , ABORT, RELEASE LOCK, FILE = EV13_OCT;
COPY DATA TABLE (DP_TAB.EVENT_OCT) (FROM (UT_TAB.EVENT)), RELEASE LOCK, FILE = EV13_OCT;

but this archive script archives more rows from the source machine. For example:

The source machine's event table for the month of October has 14,602,977 rows, but the archive script archives 17,444,874 rows. Can anyone please help me with this?

Senior Supporter

Do you use native archive? Or the tool mentioned above?