Introducing TDM into a production Dual System environment

The UDA channel is for Teradata’s Unified Data Architecture including the Analytical Ecosystem and other UDA influences. This channel provides information specific to the integration and co-existence of multiple systems, in particular when a mix of Aster, Teradata, and Hadoop are present. It is also meant to support information around the UDA enabling technologies so products like Viewpoint, Data Mover, Connectors, QueryGrid, etc.
Teradata Employee

Introducing TDM into a production Dual System environment

Teradata Data Mover can be a valuable product to add into a dual system production environment either as a scheduled / triggered step in your load process or for ad-hoc re-synchronization of tables.

Teradata Data Mover introduction

With Teradata Data Mover, jobs can be defined to copy specified database objects, such as tables and statistics, from one Teradata Database system to another.  In addition, Teradata Data Mover can copy complete databases.

In addition to tables and statistics, the following database objects associated with tables can be copied with the tables, but not by themselves:

  • hash indexes
  • join indexes
  • journals
  • triggers

Dual-system support with active replication between two systems is provided through Teradata Data Mover's command-line interface, which allows the user to:

  • Enable regular full or partial table copies as part of a dual-system synchronization solution
  • Recover databases or tables in response to unexpected system failures or other data loss
  • Copy partial production data to a QA system to provide necessary data for accurate pre-production testing

Architecture considerations

The DMAgent can run on the server with the DMDaemon and the DMCommandLine. However you may achieve better performance by running the DMAgent on a second server.  Remember that all data transferred is handled by the DMAgent and the system it resides on.

As you would expect, using multiple DMAgents can result in increased throughput, but remember you cannot run more than one DMAgent on a single server.  Plus the actual number of requested concurrent tasks and the agent.maxConcurrentJobs setting will affect the ability to effectively use multiple DMAgents.

The network configuration is critical for good performance of large data copying.  Two areas of network configuration are -

(a) the network switch used should not be a shared port switch.  Sharing switch resources can have drastic effects on the effective bandwidth available at any given time based on the other systems that are also using that switch

(b) additional performance gains can be achieved by using multiple network connections between the Teradata Data Mover server(s) and each of the Teradata database system, such that one set of connections supports traffic to the primary Teradata system and the other set supports traffic to the secondary Teradata system.

Monitoring TDM components

The Teradata Data Mover components include not only hardware components but also processes should be monitored.  Teradata Data Mover has two daemon processes, DMDaemon and DMAgent that manage and control Data Mover jobs.  These daemon processes are expected to be continuous running processes and should be registered in TMSM as components.  On the Teradata Data Mover server(s), these processes, DMDaemon and DMAgent, are typically configured as services that begin executing when the server is booted and stop executing when the server is shut down.

The following monitor script is an example of what you can place on each Teradata Data Mover server, and can be called by cron every few minutes -  

if [ $system == "TDMprimary" ]
    then ECO="Region1"
    else ECO="Region2"
SENDEVENT="/usr/bin/sendevent -e $ECO -t Heartbeat -s $system"
# Teradata Data Mover daemon health check section
$DMroot/commandline/datamove list_agents > /tmp/DMcheck_$$.log 2>&1
daemon=`grep -c "Connected to Daemon" /tmp/DMcheck_$$.log`
if [ $daemon -eq 1 ]
then # daemon is running
    $SENDEVENT --et HCHK -r $resource -m "DMdaemon is running"
# Teradata Data Mover agents health check section
agentname=`grep $DMroot/agent/|cut -d'=' -f2`
agent=`grep -c "$agentname found to be communicating" /tmp/DMcheck_$$.log`
if [ $agent -eq 1 ]
then # an agent is active
    $SENDEVENT --et HCHK -r $resource -m "DMagent is running"
rm -f /tmp/DMcheck_$$.log

Since a process can be seen in the process stack but be non-responsive, the above script method for monitoring is preferred since it is a high-level acknowledgement that the process is alive on the server.  For this reason, TDM is used to verify that the TDM processes are alive and functioning.

This article will not cover the details of setting any desired TMSM thresholds or actions, but the Teradata Data Mover components should be monitored, with appropriate thresholds and alert actions for a critical production sub-system.

Static vs dynamic jobs

There are two approaches for defining and using Data Mover jobs:

  • Static jobs that are pre-defined for copying specific database objects from a specific source system to a specific target system.
  • Dynamic jobs that are built on an as-needed basis and can target any database object from any appropriate source system to any appropriate target system.

Each approach has its benefits and drawbacks. 

Static approach drawbacks: requires the objects, source and target systems, the direction of copy and the type of copy (full verses partial) to be pre-defined. 

Static approach advantages:  a job can include a set of tables that may be copied in parallel, and the job startup time is minimal. 

Dynamic approach drawbacks: each job handles just one table and the job startup involves more steps, so it takes more time (this extra overhead time may, or may not, be significant for any specific customer environment or SLA needs). 

Dynamic approach advantages: highly flexible and provides for easily creating jobs as needed in a production cycle – as the table load order changes (as tables are, or are not, loaded) the dynamic approach does not require any special logic to handle these changes.

We will look at both approaches.

Triggering TDM jobs

By schedule, such as from an ETL script at the successful completion of a load job, or directly from an enterprise scheduler following the successful load of a table, or trigged from TMSM when a load job is recognized as completing.

As needed, on an ad-hoc basis or from TMSM as part of a table sync/re-sync action following the detection of an out-of-sync condition for a pair of tables.

Teradata Data Mover table validation

Teradata Data Mover UOWhealthAmtvalue may not match the ETL UOWhealthAmt value from the source load due to any deletes executed as part of the load and, if ARC is used, ARC moving complete partition or table.  Because of this, table validation may be desired for every Teradata Data Mover job.

You may wish to validate tables using a custom method that does a checksum on all but LOBs and timestamp fields of a table, and if it is a partial table copy that was done, the records included could be just those for the current partition or the last 30 days, or whatever seems appropriate.

Since it is desirable to do table validation for most table copies, you may wish to do it for every copy and execute the table validation at the successful completion of every job.

The dynamic table copy script calls the table validation script for the table copied against both the source and the target database systems.

Both and scripts are provided as examples at the end of this article.

Monitor TDM jobs

Of course you want to be actively monitoring any Teradata Data Mover jobs regardless of how they might have been triggered.  Teradata Data Mover will send TMSM events if configured to do so and this is controlled by the value of the tmsm.modekey – the default value is NONE and should be set to ONLY_REAL_TMSM.  Additionally, to more easily identify which TMSM event is associated with which trigger action, you can pass a Unit-Of-Work id (UOWid) to the datamover command so all events generated by the datamover job will match up with the external UOWid.  This is accomplished by either specifying -uowid $UOW as a parameter to the start, re-start or move command,  or adding a <uowid>$UOW</uowid>as the penultimate line of the xml file used for the job definition (assuming that you have set UOW to be the desired UOWid to use).  If using the script example, the UOWidis a required parameter and it is passed to datamover for you.

Observations and best practices

  • Teradata Data Mover cannot handle deleted records, so any load process that includes the possibility of deleting rows is not a good candidate for Teradata Data Mover unless full table copy mode is used.

    Note: For a partial table copy, if data has been deleted from the source system removed rows will still exist in the target.
  • When using Data Mover’s Command Line Interface (CLI) always invoke commands using a single user id, such as the dmuser, to avoid log file write permissions. CLI always uses the current user and associated permissions to write to the dmCommandLine.log.  Set the umask for dmuser to 002 to allow other members of the same group (defaults to users) to also write to the dmCommandLine.log, and public to be able to read it.
  • Create multiple dedicated Teradata user accounts to facilitate multiple ARC jobs when running concurrent Data Mover jobs.  Each Teradata Data Mover job using ARC requires sole userid access (no other session for this userid can exist while the job runs. Teradata Data Mover will default to ARC when possible.  The sample script utilizes userids from a dedicated pool and <use_userid_pool>true</use_userid_pool>is set so that Teradata Data Mover will manage the pool of users itself (using each available id as needed and waiting for one to become available when they are all in use).  See section of configuration.xmlbelow:

                <system name="ASTRO">
            <description>Purpose: Use a target user from the pool of users.
            This enables running multiple arc tasks at the same time</description>

  • Define a Teradata Data Mover job naming convention that identifies the direction of flow (e.g. TBL_A-Primary_Secondary).
  • When creating static jobs, for each job created, create a second job that reverses the flow to allow for the situation where a failover to the Secondary system is necessary, and loads are re-directed to the Secondary system (e.g. TBL_A-Primary_Secondaryand TBL_A-Secondary_Primary).
  • When doing partial copies on a PPI table, either with static or dynamic jobs, ARC can be used for faster copies when the constraint field is also the partitioning field (such as a ActivityDate)

Example Scripts

Note: While these scripts have been verified, they are provided “as is” with no support of any type.

# template for dynamically creating TDM cli jobs
# assumes user id dmuser exists and has log and xml directory
# with tdm_template.xml file in the xml directory
if [ $# -lt 6 ]
    echo "Usage: $0 Source_system Target_system Database Table (F|P) UOW"
    exit 1
cmd="$0 $@"
streams=4  # this should be increased if table size is greater than 50Gb
Tsess=`expr $Ssess \* 2`
rm -f $LOGDIR/$job-*.log  # cleanup from any prior successful run
LOGFILE=$LOGDIR/$job-`date +"%Y%m%d%H%M"`.log
ERRFILE=$LOGDIR/$job-`date +"%Y%m%d%H%M"`.err
touch $LOGFILE
chmod 666 $LOGFILE
if [ $system == "dirac" ]
    then ECO="einstein"
    else ECO="innovation"
SENDEVENT="/usr/bin/sendevent –app DataMover -j $job -r $resource -t $type -e $ECO -s $system -w $UOW"

echo "source=$source" > $LOGFILE
echo "target=$target" >> $LOGFILE
echo "db=$db" >> $LOGFILE
echo "table=$table" >> $LOGFILE
echo "mode=$mode" >> $LOGFILE
echo "UOW=$UOW" >> $LOGFILE
echo "Job $job called `date`" >> $LOGFILE
rm -f $XML/$UOW.xml
cp $XML/tdm_template.xml $XML/$UOW.xml 2>>$LOGFILE
if [ $? -ne 0 ]
    msg="ERROR: $XML/tdm_template.xml does not exist or is not readable"
    echo $msg
    $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
    exit 1

chmod 666 $XML/$UOW.xml
echo "<job_name>$job</job_name>" >> $XML/$UOW.xml
echo "<source_tdpid>$source</source_tdpid>" >> $XML/$UOW.xml
echo "<source_user>$source_user</source_user>" >> $XML/$UOW.xml
echo "<source_password>$source_pw</source_password>" >> $XML/$UOW.xml
echo "<target_tdpid>$target</target_tdpid>" >> $XML/$UOW.xml
echo "<target_user></target_user>" >> $XML/$UOW.xml
echo "<target_password></target_password>" >> $XML/$UOW.xml
echo "<use_userid_pool>true</use_userid_pool>" >> $XML/$UOW.xml
echo "<data_streams>$streams</data_streams>" >> $XML/$UOW.xml
echo "<source_sessions>$Ssess</source_sessions>" >> $XML/$UOW.xml
echo "<target_sessions>$Tsess</target_sessions>" >> $XML/$UOW.xml
echo "<overwrite_existing_objects>true</overwrite_existing_objects>" >> $XML/UOW.xml
echo "<log_level>1</log_level>" >> $XML/$UOW.xml
echo "<online_archive>false</online_archive>" >> $XML/$UOW.xml
echo "<database selection=\"unselected\">" >> $XML/$UOW.xml
echo "<name>$db</name>" >> $XML/$UOW.xml
echo "<table selection=\"included\">" >> $XML/$UOW.xml
echo "<name>$table</name>" >> $XML/$UOW.xml
echo "<compare_ddl>false</compare_ddl>" >> $XML/$UOW.xml
if [ "$mode" == "P" ]
then # this section assumes a common timestamp field is available for last modified date/time
    echo ".logon $target/$Suser,$Spw" > $XML/$table.btq
    echo ".set WIDTH 128" >> $XML/$table.btq
    echo "select max(last_update_dttm) from $db.$table;" >> $XML/$table.btq
    echo ".quit" >> $XML/$table.btq
    bteq < $XML/$table.btq > $XML/$table.ts
    TS="`tail -7 $XML/$table.ts|head -1`"
    TS=`echo $TS|sed "s/ //g"`  # strips any spaces
    first=`echo "$TS"|cut -c1`
    if [ "$first" == "2" ]
    then # first character matches expected year start, so prepare partial copy code
        dt=`echo $TS|cut -c1-10`
        tm=`echo $TS|cut -c11-`
        TS=$dt" "$tm
        echo ".logon $source/$Suser,$Spw" > $XML/$table.btq
        echo ".set WIDTH 128" >> $XML/$table.btq
        echo "show table $db.$table;" >> $XML/$table.btq
        echo ".quit" >> $XML/$table.btq
        bteq < $XML/$table.btq > $XML/$table.pi
        # get first unique index definition
        UI=`grep "UNIQUE" $XML/$table.pi|head -1`
        if [ -z "$UI" ]
        then  # if no unique index defined, get all column names
            echo ".logon $source/$Suser,$Spw" > $XML/$table.btq
            echo ".set WIDTH 128" >> $XML/$table.btq
            echo "select 'COL:'as \"col\",ColumnName from dbc.\"columns\" where DatabaseName = '${db}' and TableName = '${table}';" >> $XML/$table.btq
            echo ".quit" >> $XML/$table.btq
            bteq < $XML/$table.btq > $XML/$table.col
            PI=`grep ^COL: $XML/$table.col|grep -v LAST_UPDATE_DTTM|sed "s/COL://"`
            PI=`echo $UI|cut -d'(' -f2|cut -d')' -f1|sed "s/,//g"`
        echo "<sql_where_clause>" >> $XML/$UOW.xml
        echo "<![CDATA[" >> $XML/$UOW.xml
        echo "WHERE Last_Update_Dttm > '$TS' ]]>" >> $XML/$UOW.xml
        echo "</sql_where_clause>" >> $XML/$UOW.xml
        echo "<key_columns>" >> $XML/$UOW.xml
        for key in $PI
           echo "<key_column>$key</key_column>" >> $XML/$UOW.xml
        echo "</key_columns>" >> $XML/$UOW.xml
    rm -f $XML/$table.btq $XML/$table.ts $XML/$table.pi $XML/$table.col
echo "</table>" >> $XML/$UOW.xml
echo "</database>" >> $XML/$UOW.xml
echo "<uowid>$UOW</uowid>" >> $XML/$UOW.xml
echo "</dmCreate>" >> $XML/$UOW.xml
echo "cleaning up and removing any earlier version of job $job" >> $LOGFILE
$DM stop -job_name $job >> $LOGFILE 2>&1
$DM cleanup -job_name $job >> $LOGFILE 2>&1
$DM delete_job -job_name $job -all -skip_prompt >> $LOGFILE 2>&1
echo "creating tdm job $job" >> $LOGFILE
while true
    if [ "$utility" != "default" ]
        $DM create -force_utility $utility -f $XML/$UOW.xml >> $LOGFILE 2>&1
        $DM create -f $XML/$UOW.xml >> $LOGFILE 2>&1
    if [ $? -eq 0 ]
        err=`tail $LOGFILE|grep -c "Error: cannot connect to Daemon"`
        if [ $err -eq 1 -a $try -lt 3 ]
            try=`expr $try + 1`
            echo "Create failed: re-trying [$try]" >> $LOGFILE
            sleep 180
            msg="Job create error for $job - see $ERRFILE"
            echo $msg
            $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
            mv $LOGFILE $ERRFILE
            echo " " >> $ERRFILE
            echo "To re-run this job, execute the following command:" >> $ERRFILE
            echo "$cmd" >> $ERRFILE
            chmod 666 $ERRFILE
            rm -f $XML/$UOW.xml $LOGDIR/$UOW.status
            exit 1
while true
    $DM start -job_name $job -sync >> $LOGFILE
    if [ $? -eq 0 ]
        $DM status -job_name $job -output_level 3 > $LOGDIR/$UOW.status
        rows=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f7`
        bytes=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f8`
        agent=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f9`
        if [ $agent == "ARC" ]
            agent=`grep COPY $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f9`
            util=`grep MOVING_DATA $LOGDIR/$UOW.status|tail -1|sed "s/  */ /g"|cut -d' ' -f10`
        duration=`grep ^Job: $LOGDIR/$UOW.status|sed "s/  */ /g"|cut -d' ' -f8`
        sec=`echo $duration|cut -d':' -f3`
        min=`echo $duration|cut -d':' -f2`
        hr=`echo $duration|cut -d':' -f1`
        hsec=`expr $hr \* 3600`
        msec=`expr $min \* 60`
        secs=`expr $hsec + $msec`
        secs=`expr $secs + $sec`
        rate=`expr $bytes / $secs`
        rate=`expr $rate \* 3600`
        msg="Success | $job | rows=$rows | bytes=$bytes | duration=$duration | rate=$rate-bytes/hr | agent=$agent | utility=$util"
        echo $msg
        cat $LOGDIR/$UOW.status >> $LOGFILE
        $HOME/bin/ $db $table $source $user $pw >> $LOGFILE 2>&1
        $HOME/bin/ $db $table $target $user $pw >> $LOGFILE 2>&1
        $SENDEVENT --et END -v $rows -m "$msg"
        rm -f $XML/$UOW.xml $LOGDIR/$UOW.status $LOGDIR/$UOW.log
        exit 0
        sleep 60
        err=`tail $LOGFILE|grep -c "Daemon cannot find any Agents"`
        if [ $err -gt 0 -a $try -lt 2 ]
        then  # try again
            try=`expr $try + 1`
            echo "Unable to find Agent: re-trying [$try]" >> $LOGFILE
            UserErr=`$DM status -job_name $job -output_level 4|grep -c ARC0700`
            if [ $UserErr -gt 0 -a $try -lt 3 ]
                try=`expr $try + 1`
                echo "User already logged on: re-trying [$try]" >> $LOGFILE
                $DM cleanup -job_name $job > /dev/null
                sleep 180
                msg="Failed - see $ERRFILE"
                echo $msg
                $SENDEVENT --et ALERT -l 10 -a 1012 -m "$msg"
                mv $LOGFILE $ERRFILE
                echo " " >> $ERRFILE
                echo  "To get status details execute the following command:" >> $ERRFILE
                echo "$DM status -job_name $job -output_level 4" >> $ERRFILE
                echo " " >> $ERRFILE
                echo "To re-run this job, execute the following command:" >> $ERRFILE
                echo "$cmd" >> $ERRFILE
                chmod 666 $ERRFILE
                rm -f $XML/$UOW.xml $LOGDIR/$UOW.status
                exit 1


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<dmCreate xmlns="" xmlns:xsi="" xsi:schemaLocation="">

# File: - creates checksum on all fields in a table
#       except for LOB and timestamp fields
Usage="Usage: $0 DataBase TableName System User Pw"
if [ $# -ne 5 ]
    echo $Usage
    exit 1
bteq <<EOF > /tmp/$tab$DATE.columns
.logon $sys/$user,$pw
select 'COL:',columnname,columntype from dbc.columns
where databasename='$db' and tablename='$tab' and columntype not in ('CO','BO','TS');
grep ^COL: /tmp/$tab$DATE.columns|sed "s/  */ /"|cut -d' ' -f2 > /tmp/$tab$DATE.col
for col in `cat /tmp/$tab$DATE.col`
Columns=`echo $Columns|sed "s/,//"`
bteq <<EOF > /tmp/$tab$DATE.sum
.logon $sys/$user,$pw
select 'CHECK:',SUM(CAST(HASHBUCKET(HASHROW($Columns)) AS DECIMAL(38,0))) from $db.$tab;
CheckSum=`grep ^CHECK: /tmp/$tab$DATE.sum|sed "s/  */ /"|cut -d' ' -f2|cut -d'.' -f1`
echo $tab:$sys:$CheckSum
rm -f /tmp/$tab$DATE.*


Re: Introducing TDM into a production Dual System environment

Can TDM be integrated with TD Unity in Dual Active system and still achieve Data accuracy between servers?If this is possible, what all major checkpoints one should ensure?
Teradata Employee

Re: Introducing TDM into a production Dual System environment

Yes, TDM can be integrated with TD Unity in a Dual Active environment, but care must be taken to ensure TDM is working with and not against Unity. You can use TDM to copy full or partial tables to TD instances via Unity. TDM can also be used directly between two TD instances to synchronize tables that have been marked as Unrecoverable due to an outage lasting longer than the Unity recovery log could maintain history for. Never target an Active TD instance directly with TDM as this will result in unsychronized tables.

Re: Introducing TDM into a production Dual System environment

hi Paul,

We recently opened production flood gates for TDM jobs. I can say bcz of the too many TDM jobs system is getting hampered in terms of the performance and resources. I am thinking to apply a throttle to delay and restrict the TDM jobs. But as this is first tiime on our box, i would like to know your thoughts on imposing the throttle rules on TDM jobs.

Please share your thoughts.



Teradata Employee

Re: Introducing TDM into a production Dual System environment

If the production system is suffering due to the load imposed by the Data Mover jobs, then I suggest you implement a Workload Manager rule to reduce the priority of the Data Mover jobs.  This has to be balanced with the increased delay in getting the tables copied and the larger window of risk for the customer for critical tables.  There is no quick solution - you need to carefully establish where the resource shortage is by doing some sort of capacity management engageent so that you apply controls to the appropriate resource (Data Mover system, network, loader slots, database server cpu, memory or I/O.



Re: Introducing TDM into a production Dual System environment

Hi Paul, I would be careful using the checksum script above. Due to the high number of hash synonyms it can happen that different tables will create the same checksum.


database your_db;

create table checksum_test1
as (select * from sys_calendar.calendar ) with data
unique primary index(day_of_calendar)
on commit preserve rows;

create table checksum_test2
as (select * from sys_calendar.calendar ) with data
unique primary index(day_of_calendar)
on commit preserve rows;

update checksum_test2
set calendar_Date = '1990-11-29'
where calendar_date = '1990-11-30';

update checksum_test2
set calendar_Date = '2032-10-06'
where calendar_date = '2032-10-07';

select *
from checksum_test1
select *
from checksum_test2;

select *
from checksum_test2
select *
from checksum_test1;

The problem bothered me also some time and I found a solution which might overcome the issue. See