Using a named pipe archive to copy a database from one system to another

jim
Teradata Employee

Recently I needed to move several terabytes of data from one machine to another, so FastExport/FastLoad really wasn't feasible: I didn't have the disk space to hold flat files that large. I then tried to archive the database, but again, the archive file was too big. Digging a little deeper, I found that you can move a database from one machine to another using a named pipe archive. This lets you archive one database and restore it onto another at the same time, and the "archive file" only ever fills up to a buffer, so it never gets very big. I am not a Linux guru, but I was able to get this done pretty quickly.
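If named pipes are new to you, here is the general idea in a minimal shell sketch; nothing Teradata-specific, and the file name is made up. A FIFO lets a writer and a reader run at the same time, with the data streaming through a small in-memory buffer instead of ever landing on disk:

# create a named pipe (FIFO); it takes up no data space on disk
mkfifo DEMOPIPE

# writer: blocks until a reader attaches, then streams into the pipe
cat /etc/services > DEMOPIPE &

# reader: consumes the stream as it is produced
wc -l < DEMOPIPE

# remove the pipe when done
rm DEMOPIPE

That is exactly the trick used below: ARCMAIN in one session writes the archive into the pipe while ARCMAIN in a second session restores from it.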

What you will need before embarking on this path: logon credentials to a Linux machine with ARCMAIN installed that has network access to both Teradata systems (either the source machine or the destination will do; I don't think it matters which), logon credentials to both the source database and the destination database, and the sizes of the source databases. You can get the size information from Teradata Administrator: right-click on a database/user and select the option that best shows the space used by what you intend to copy.
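If you prefer SQL to the GUI, a query along these lines against the DBC.DiskSpace view gives the same sizing numbers (DB1 is a placeholder; view and column names can vary slightly between releases):

SELECT DatabaseName,
       SUM(MaxPerm)     AS AllocatedPerm,
       SUM(CurrentPerm) AS UsedPerm
FROM   DBC.DiskSpace
WHERE  DatabaseName = 'DB1'
GROUP  BY 1;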

  1. Create database placeholders on the destination machine with enough space to hold the data.
    1. If the source machine has a database called DB1 that is 1 TB in size, you need to create a database on the receiving Teradata system with the same name that also has 1 TB of space allotted.
    2. This can be done using SQL Assistant or Teradata Administrator. You can also use BTEQ, but I am a GUI cripple, so if a tool exists, I will use it. (A consolidated end-to-end example follows these steps.)
  2. Here is where it gets scary for us Windows guys. Using the credentials above, log onto the Linux machine.
    1. I use a Telnet/SSH client called PuTTY; use whatever you normally use, or download PuTTY for free. Just search Google for "putty" and "ssh" and you should see links to the download.
    2. Open your ssh client and point it to the Linux machine you intend to use.
    3. It will prompt you for your logon credentials.
  3. You need to create the "pipe", the special file that will be used as the buffer.
    1. Type in at the command prompt:
      1. mkfifo ARCPIPE (Press the Enter key)
      2. ARCPIPE is the name of the pipe. It can be any name you like, up to 8 characters, in uppercase (see the note on this ARCMAIN limitation in the comments).
  4. If you want to make sure the file was created:
    1. Type in ls at the prompt and a directory listing should appear in the terminal window. You should see your ARCPIPE file there.
  5. Start up ARCMAIN.
    1. Type in arcmain (Press Enter). Linux is case-sensitive, so match the case of the installed binary.
    2. You should see about 15 lines of "ARCMAIN" banner information appear on the screen.
  6. Log on to the source:
    1. Type in the logon information for the source system:
      1. logon mysourcedb/user,password; (then press Enter)
      2. mysourcedb is the machine name of the source system. You can use the machine's IP or a DNS name, followed by a forward slash.
      3. user is the user's logon name, followed by a comma.
      4. password is the user's password, followed by the semicolon.
  7. Start the archive:
    1. Type in the following command:
      1. archive data tables (mydb) all, abort, release lock, file=ARCPIPE; (Press Enter)
        1. mydb is the name of the database to archive, in our case the database to move.
    2. ARCMAIN will process the statement, show you some information like the event number and the number of sessions, and then sit and wait for the restore side to attach to the pipe.
  8. Open up another session of PuTTY and log onto the same Linux machine.
    1. Just as in step 2 above, open a new PuTTY session and log in. You will now have a second window open to the same Linux machine.
  9. Start ARCMAIN in the second PuTTY window:
    1. Type in arcmain, then enter the logon information for the destination system:
      1. logon mydestinationdb/user,password; (then press Enter)
      2. mydestinationdb is the destination system. You can use the machine's IP or a DNS name, followed by a forward slash.
      3. user is the user's logon name, followed by a comma.
      4. password is the user's password, followed by the semicolon.
    2. NOTE: The user must have the correct privileges to be able to create tables, etc.
  10. Start the restore step:
    1. Type in the following command:
      1. restore data tables (mydb) all, abort, release lock, file=ARCPIPE; (Press Enter)
        1. mydb is the name of the destination database to restore to.
    2. Once you press the Enter key, both PuTTY windows will begin to output information about which table is being processed, how many rows were transferred, and so on.
  11. Log off from both sessions.
    1. Once the transfer is complete, type the following commands at each prompt:
      1. logoff; (make sure you put the semicolon after the logoff)
      2. exit (Press Enter). This will close the PuTTY session.
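To recap, here is the whole flow in one place, using the placeholder names from the steps above (mysourcedb, mydestinationdb, mydb, ARCPIPE). Substitute your own machine names, credentials, and sizing.

First, on the destination system (SQL Assistant, Teradata Administrator, or BTEQ), create the placeholder database:

CREATE DATABASE mydb FROM dbc AS PERM = 1099511627776; /* 1 TB; size it to your source */

Then, in the first PuTTY session on the Linux client:

mkfifo ARCPIPE
arcmain
logon mysourcedb/user,password;
archive data tables (mydb) all, abort, release lock, file=ARCPIPE;

The archive now sits and waits. In the second PuTTY session on the same machine:

arcmain
logon mydestinationdb/user,password;
restore data tables (mydb) all, abort, release lock, file=ARCPIPE;

When both sides report completion, issue logoff; in each ARCMAIN session and exit at each shell prompt.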

That’s it, pretty simple even for a Windows guy!

25 Comments
Enthusiast
Nice tidy explanation of how to use a named pipe. Thank you.
Enthusiast
Good, discrete, simple steps!! To be able to do a restore, the object must already exist on the target. If you were to use this method to set up a new system/environment, you could leverage the TSET tool: do an export on the database, use the DDL from TSET to create the objects on the target system, and then run the restore.
All is well and seamless(? ;) !!
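If you don't have TSET handy, plain BTEQ can capture the DDL as well. A minimal sketch with placeholder names (mysourcedb, mydb.tab1): run SHOW TABLE for each object on the source, then replay the captured output on the target before restoring.

.logon mysourcedb/user,password
.export report file = mydb_ddl.sql
show table mydb.tab1;
show table mydb.tab2;
.export reset
.logoff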
jim
Teradata Employee
You can also use this to do an entire system. Instead of "archive/restore data tables (mydb) all", you can use "archive/restore data tables (dbc) all", which would copy the entire system. ARCMAIN has several options, like EXCLUDE, where you can archive everything except what you exclude. This blog was not meant to teach everything about ARCMAIN; it was meant to give a simple example to get you going. For me, it was a place to put an example so that in 6 months, when I have to do it again, I know where to look. Thanks for the comments.
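For example, a full-system archive that skips one database might look something like this; scratchdb is a made-up name, and you should check the ARC manual for the exact EXCLUDE syntax on your release:

archive data tables (dbc) all, exclude (scratchdb), abort, release lock, file=ARCPIPE;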
Enthusiast
Teradata's NPARC does this (with multiple named pipes in between). Older tools like CMS (conterminous migration tool) did it too.

Named pipes do have one limitation: everything goes through a single Unix box, so that one box's network in/out becomes the bottleneck. I am wondering, with the TARA infrastructure, whether we can do something with multiple named pipes on multiple boxes to migrate data. That would be awesome!
Enthusiast
Emilwu, you can run this process on multiple Unix boxes now, assuming each "stream" works with a different set of objects. We've been using this process for quite some time to move data between environments for our application releases. Also, if you have the network bandwidth on the clients, you could run multiple jobs using different pipe files on the same client machine.
Enthusiast
JJcrum, thanks for the suggestion. I understand that multiple objects can be separated into multiple jobs across multiple Unix boxes; I have been doing that for quite a while. The difficulty comes with very large objects (> 10 TB, no fallback). Think about how TARA works today: multiple ARCMAIN processes with a single control session, leveraging the network bandwidth across multiple nodes. That is the key to its parallelism and improved throughput. As a matter of fact, a restore can run the same way. Teradata has not revealed how to launch such an ARCMAIN from the command-line interface; I believe TARA does it by passing a specific script/command-line structure to ARCMAIN. Can someone from Teradata enlighten us on how to launch ARCMAIN in "spawn child" mode?
Enthusiast
Emilwu, you make a good point. I'm not sure how TARA does it now. The way I used to get around these large tables was to break them up into multiple cluster backups. Then, after restoring the multiple pieces, I'd put them back together with a BTEQ script.
Enthusiast
Yup, cluster backup is the way to go in that case. I am trying to use an all-AMP backup, which can take advantage of the AMP-local archive features available today to save some BYNET traffic and lessen the impact on the system. Usually we only do such an operation to migrate data from prod to dev/qa. Hogging the BYNET is no fun when the system is already 95% busy... :)
This is a good start for using named pipes and arcmain.

While allocating space on the target system, make sure the skew in the source database's space is taken into account. If the target system has more AMPs than the source system, the skew's effect is multiplied by that ratio, so you need to allocate more space accordingly.
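One way to estimate the worst case is to look at the per-AMP rows in DBC.DiskSpace. Perm limits are enforced per AMP, so if the skew carries over to the target, the space you need is roughly the busiest source AMP's usage times the target AMP count. A sketch, with DB1 and the 100-AMP target as placeholders:

SELECT DatabaseName,
       SUM(CurrentPerm)       AS TotalPerm,
       MAX(CurrentPerm)       AS BusiestAmpPerm,
       MAX(CurrentPerm) * 100 AS WorstCaseOnTarget /* assuming a 100-AMP target */
FROM   DBC.DiskSpace
WHERE  DatabaseName = 'DB1'
GROUP  BY 1;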
A few years back (in 2008, I guess), during a database upgrade, I landed in the same situation (not enough disk space, remote databases...).
So I created the following script snippets to perform a piped archive/restore.

Sample to Arch/Restore at a Database Level:
======================================================
#Generic Variables
pipename=CAMPFIFO

#Archive Parameters
arch_session_count=6
arch_logon_str=".logon k2db/dwjxp01,pass;"

#Restore Parameters
rest_session_count=13
rest_logon_str=".logon a7db/dwtst01,pass;"

#Misc Variables

databasename=DWCAMPDB0P

archive ()
{
arcmain sessions=${arch_session_count} <<EOD
${arch_logon_str}
ARCHIVE DATA TABLES
(${databasename}) ALL
,INDEXES
,RELEASE LOCK
,FILE=${pipename};
LOGOFF;
EOD
}

restore ()
{
arcmain sessions=${rest_session_count} << EOD
${rest_logon_str}
COPY DATA TABLE
(${databasename}) (FROM (${databasename}) ,NO JOURNAL, NO FALLBACK)
,RELEASE LOCK
,FILE=${pipename};
LOGOFF;
EOD
}

analyze ()
{
arcmain << EOD
ANALYZE ALL
,DISPLAY LONG
,FILE=${pipename};
LOGOFF;
EOD
}

#--------------------------------------

rm -f ${pipename}
mkfifo ${pipename}

echo "INITIATED ARCHIVE PROCESS"
archive &
echo "WAIT FOR A FEW SECONDS BEFORE INTIATING THE RESTORE PROCESS"
sleep 10
echo "INITIATED RESTORE PROCESS"
restore
#analyze

Sample Functions to Arch/Restore a Fixed Set of Tables:
================================================================

#Generic Variables
pipename=EVALFIFO

#Archive Parameters
arch_session_count=6
arch_logon_str=".logon k2db/dwGXS01,pass;"

#Restore Parameters
rest_session_count=2
rest_logon_str=".logon a7db/dwtst01,pass;"

#Misc Variables

databasename=DWEVALDB03

archive ()
{
arcmain sessions=${arch_session_count} <<EOD
${arch_logon_str}
ARCHIVE DATA TABLES
(DWEVALDB0P.SIMLT_STAT),
(DWEVALDB0P.ACCT_XREF),
(DWEVALDB0P.ACCT_STAT),
(DWEVALDB0P.LGL_ENTY_STAT),
(DWEVALDIVP.DUN_CUST),
(DWEVALDB0P.SVC_ORD_RQST_XREF),
(DWEVALDB0P.CR_ACCT_DTL),
(DWEVALDB0P.EVAL_PRCSS),
(DWEVALDB0P.EVAL),
(DWEVALDB0P.SVC_ORD_CR_CASE),
(DWEVALDB0P.STAT_LOG),
(DWEVALDB0P.SVC_ORD_CR_CASE_ACCT),
(DWEVALDB0P.SVC_ORD_CR_CASE_AGNG),
(DWEVALDB0P.SVC_ORD_CR_CASE_AUDT_HIST),
(DWEVALDB0P.SVC_ORD_PAST_DUE_CNTCT_LIST),
(DWEVALDB0P.SVC_ORD_CR_CASE_ACCT_ACSS),
(DWEVALDB0P.TRNS_LOG),
(DWEVALDIVP.ACCT_STAT),
(DWEVALDIVP.LGL_ENTY_STAT)
,INDEXES
,RELEASE LOCK
,FILE=${pipename};
LOGOFF;
EOD
}

restore ()
{
arcmain sessions=${rest_session_count} << EOD
${rest_logon_str}
COPY DATA TABLE
(DWEVALDB0P.SIMLT_STAT) (FROM (DWEVALDB0P.SIMLT_STAT) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.ACCT_XREF) (FROM (DWEVALDB0P.ACCT_XREF) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.ACCT_STAT) (FROM (DWEVALDB0P.ACCT_STAT) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.LGL_ENTY_STAT) (FROM (DWEVALDB0P.LGL_ENTY_STAT) ,NO JOURNAL, NO FALLBACK),
(DWEVALDIVP.DUN_CUST) (FROM (DWEVALDIVP.DUN_CUST) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_RQST_XREF) (FROM (DWEVALDB0P.SVC_ORD_RQST_XREF) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.CR_ACCT_DTL) (FROM (DWEVALDB0P.CR_ACCT_DTL) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.EVAL_PRCSS) (FROM (DWEVALDB0P.EVAL_PRCSS) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.EVAL) (FROM (DWEVALDB0P.EVAL) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_CR_CASE) (FROM (DWEVALDB0P.SVC_ORD_CR_CASE) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.STAT_LOG) (FROM (DWEVALDB0P.STAT_LOG) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_CR_CASE_ACCT) (FROM (DWEVALDB0P.SVC_ORD_CR_CASE_ACCT) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_CR_CASE_AGNG) (FROM (DWEVALDB0P.SVC_ORD_CR_CASE_AGNG) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_CR_CASE_AUDT_HIST) (FROM (DWEVALDB0P.SVC_ORD_CR_CASE_AUDT_HIST) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_PAST_DUE_CNTCT_LIST) (FROM (DWEVALDB0P.SVC_ORD_PAST_DUE_CNTCT_LIST) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.SVC_ORD_CR_CASE_ACCT_ACSS) (FROM (DWEVALDB0P.SVC_ORD_CR_CASE_ACCT_ACSS) ,NO JOURNAL, NO FALLBACK),
(DWEVALDB0P.TRNS_LOG) (FROM (DWEVALDB0P.TRNS_LOG) ,NO JOURNAL, NO FALLBACK),
(DWEVALDIVP.ACCT_STAT) (FROM (DWEVALDIVP.ACCT_STAT) ,NO JOURNAL, NO FALLBACK),
(DWEVALDIVP.LGL_ENTY_STAT) (FROM (DWEVALDIVP.LGL_ENTY_STAT) ,NO JOURNAL, NO FALLBACK)
,RELEASE LOCK
,FILE=${pipename};
LOGOFF;
EOD
}

analyze ()
{
arcmain << EOD
ANALYZE ALL
,DISPLAY LONG
,FILE=${pipename};
LOGOFF;
EOD
}

#--------------------------------------

rm -f ${pipename}
mkfifo ${pipename}

echo "INITIATED ARCHIVE PROCESS"
archive &
echo "WAIT FOR A FEW SECONDS BEFORE INTIATING THE RESTORE PROCESS"
sleep 20
echo "INITIATED RESTORE PROCESS"
restore
#analyze

Don't forget: due to an ARCMAIN limitation, the pipe name needs to be all uppercase and no more than 8 characters! We've also been using ARCMAIN from the command line to write to disk, TSM, and pipes.
Teradata Employee
Are there any other limitations to this approach?
I mean:

1. Are there table or database size restrictions with respect to throughput and performance?
2. Can a desktop user on a Windows client use this method to move data across multiple database environments?
3. What considerations should a user make when choosing a Windows-based client for this approach to moving data across multiple database environments?
Great share!
jim
Teradata Employee
The configuration is two Teradata systems (two physical machines, each comprising 1 to many nodes) that are accessible to each other via a network. Does that answer your question?
Enthusiast
@sendhil : Thanks!
Junior Supporter

Hi Jim,

I didn't submit it from the command line as you described. Instead, I created a small script (with the same instructions as yours) and then submitted it. This was for both the archive part and the restore.

A few questions:

1. You mentioned a step in which the source archive waits until you start the target restore. That didn't happen for me. The archive ran fully and finished first. The ARCPIPE file was of size 0, though!!

2. Then I submitted the restore job. The restore also ran fine and data was transferred.

My question is: did it really use the pipe to transfer the data?

3. When I submit both scripts through the command line as you described, the archive script sits and waits for the restore to start. Now it seems that it uses the pipe.

As mentioned by Kevin above:

"Don't forget, due to an ARCMAIN limitation, the pipe name needs to be all uppercase as well as 8 characters! We've also been using ARCMAIN from the command line, to disk, TSM and pipes." Does this hold true?

Thanks!

Junior Supporter

Generally, how fast is arcmain? I am trying to move 300 GB of data, but it has been running for more than 2 hours!

The archive is logged on with 4 sessions, and the restore is also logged on with 4 sessions.

Junior Supporter

Hi,

There is a .RLG file created when we do an archive and restore. Is there a method to find how big this file can get, depending on the table size being archived?

1. If I have to archive a table of, say, 700 GB to 1 TB, what file size should one expect? The reason I am asking is that my Unix installation has a maximum file size limit, past which we need to notify the admins or the server goes for a toss.

2. Is there a way to disable creation of this file if I am archiving huge tables?

-thx

Junior Supporter

-Anybody ??

jim
Teradata Employee

Hi,

I am not an ArcMain expert, which is part of the reason I wrote this article. The article was meant for the novice who occasionally needs to move some data and can follow the steps outlined. When the steps are followed in order, the outcome so far has been successful. Most of your questions are ArcMain specific, and I just don't have that kind of expertise. But I will give you some insight into some of your other questions.

You are using 4 sessions; you need to increase them. Each session sends and then waits for a response. With multiple sessions, ARCMAIN can send, send, send, and so on, and by the time the nth send is on its way, the oldest has replied, so it can send again on that session. Because you only have 4 sessions, you are most likely seeing some delay from all the waiting. I can't tell you how many to add, as I don't know the specifics of your situation. If this is something you do all the time, keep doubling the sessions until you don't see any further increase, and continue forward with that.
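If you want to find the sweet spot empirically, a loop like this can time a few session counts. This is just a sketch: archive_commands.arc is a made-up file holding the logon/archive/logoff statements, and for a timing test you would archive to a throwaway disk file rather than a pipe.

for s in 8 16 32 64
do
  echo "trying sessions=$s"
  time arcmain sessions=$s < archive_commands.arc
done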

As far as the 8-character uppercase limit goes, I really haven't deviated from the norm, so I can't answer.

The RLG file is a restart log file and only comes into play if you need to restart. I have never checked the size of that file, but I have also never had an issue with it getting too big. To my understanding, it only logs the information ArcMain would need to pick up where it left off in the event of a restart, so I can't imagine it being that big. But again, this is not my field of expertise.

I am sorry I can't be of more help, but your questions are ArcMain specific and I am merely an ArcMain user.

Very useful article for a novice in ARCMAIN.

When I used the RESTORE DATA TABLES command on a table 'TAB', I got the error below, so I went ahead and used the COPY DATA TABLES command instead.

09/10/2013 20:00:38 *** Error ARC1202: Database id for "ADW"."TAB" doesn't match DBS.
09/10/2013 20:00:39 "ADW"."TAB" - TABLE SKIPPED.
09/10/2013 20:00:39 "ADW"."TAB" - LOCK RELEASED

Enthusiast

@vineeta.v

This is because you are using RESTORE, which expects the same object with the same internal id on the destination.

You can use COPY instead:

COPY DATA TABLES (DB1.tab1) (FROM (DB1.tab1)), abort, release lock, file=ARCPIPE;

Enthusiast

On the destination system I've allocated sessions = number of AMPs + 3; however, only ONE session is active and the other sessions are idle.

Any ideas on how to make the operation faster by bringing more of the sessions into action?