Executing Partial Table Copies with Teradata Data Mover

Tools
Tools covers the tools and utilities you use to work with Teradata and its supporting ecosystem. You'll find information on everything from the Teradata Eclipse plug-in to load/extract tools.
Teradata Employee

Executing Partial Table Copies with Teradata Data Mover

In the last Teradata Data Mover (TDM) article (Introduction to Teradata Data Mover: Create your first job), we discussed creating and executing a TDM job to copy a full table between Teradata systems. This use case is very common in the field when customers want to initially populate the target Teradata system with the same table that exists on the source Teradata system. Customers will not want to copy the entire table to the target system every time changes are made to the source system though. Tables on production systems can get quite large and it doesn't make sense to copy the entire table when only a subset of rows have been changed since the last copy took place. This is why TDM supports executing partial table copies as well as full table copies.

This article will discuss how customers typically use TDM to copy recent source table changes to the target system by executing partial table copies.

Determine What Changes Need To Be Copied

TDM only copies objects when told to do so by the user (either through the TDM command-line interface or the Viewpoint portlet). TDM does NOT automatically detect when changes have been made to the source system and then copy those changes to the target system. As a result of this, users must tell TDM what part of the table needs to be copied if they don't want the entire table copied in the job. Determining what changes have been made to a table can be complicated, but most customers accomplish this by maintaining a date/timestamp or batch id column in their source tables to track when changes have occurred.

Create a Partial Copy TDM Job

Let's use the same jg185041.items1 table we created in the Introduction to Teradata Data Mover: Create your first job article to create a partial copy TDM job. In this example, let's assume we want to copy all rows that correspond to when sales were made after 2009. This of course assumes that the date values in the table reflect when the row was updated and the sale was made. Here is the XML file we will use to create a TDM job that will copy all rows corresponding to sales made after 2009:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dmCreate xmlns="http://schemas.teradata.com/dataMover/v2009"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.teradata.com/dataMover/v2009/DataMover.xsd">
<job_name>jg_partialcopy_devX</job_name>
<source_tdpid>dmdev</source_tdpid>
<source_user>jg185041</source_user>
<source_password>jg185041</source_password>
<target_tdpid>dmsmp</target_tdpid>
<target_user>jg185041</target_user>
<target_password>jg185041</target_password>
<log_level>1</log_level>
<database selection="unselected">
<name>jg185041</name>
<table selection="included">
<name>items1</name>
<sql_where_clause><![CDATA[ WHERE salesdate > '2009-12-31']]></sql_where_clause>
<key_columns>
<key_column>deptId</key_column>
<key_column>prodId</key_column>
<key_column>salesdate</key_column>
</key_columns>
</table>
</database>
</dmCreate>

As you can see in the XML file above, the <sql_where_clause> and <key_column> tags are used to tell TDM that this is a partial table copy job. TDM will pass the where clause statement as-is to the underlying utility used to extract the data and then it will transfer only those extracted rows to the target table. TDM will also create the target table if it doesn't already exist prior to copying the data. If the target table already exists and has rows in it then TDM will copy the data to a staging table first before using merge or insert-select to get the data from the staging table to the target table. The key columns specified in the job will be used in the merge or insert-select statement to uniquely identify the rows that need to be updated in the target table. Specifying the proper key columns is required so that TDM only updates the appropriate rows in the target table.

Here is the status output of executing this job:

dm-agent4:/home/jg185041/datamover/testing # datamove status -job_name jg_partialcopy_devX -output_level 3
Data Mover Command Line 13.10.00.03
Status mode
Command parameters:
- job_name: jg_partialcopy_devX
- output_level: 3
Permission authorized
Requesting status...
Connected to Daemon version 13.10.00.03

Job Name: jg_partialcopy_devX
Job Execution Name: jg_partialcopy_devX-20111228233504EST

TYPE ID STATUS CURRENT STEP START TIME DURATION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------
Job: 25 COMPLETED_SUCCESSFULLY 4 12/28/11 11:35 PM 0:0:26

TYPE ID STATUS STEP TYPE START TIME DURATION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------
Step: 3 COMPLETED_SUCCESSFULLY MOVE_TABLE_DATA 12/28/11 11:35 PM 0:0:24
Step: 4 COMPLETED_SUCCESSFULLY RESOLVE_TABLE_AFTER_LOAD 12/28/11 11:35 PM 0:0:1

TYPE ID NAME MOVE PHASE STATUS TYPE ROWS BYTES AGENT UTILITY TIMESTAMP
-----------------------------------------------------------------------------------------------------------------------------------------------------
Task: 221 "jg185041"."items1" MOVING_DATA ARCHIVE table 19 2624 Agent1 ARC 12/28/11 11:35 PM
Task: 221 "jg185041"."items1" MOVING_DATA COMPLETE table 19 2624 Agent1 ARC 12/28/11 11:35 PM
Task: 221 "jg185041"."items1_b28cc4_t" MOVING_DATA COPY table 19 2624 Agent1 ARC 12/28/11 11:35 PM
Task: 222 POST_DATA_MOVE COMPLETE Agent1 SQL 12/28/11 11:35 PM

You can see in the status output that only 19 rows were copied in this job instead of the full 200 in the source table.  You can also see that ARC was used to execute this partial table copy. This might seem strange, but ARC can actually be used to execute partial table copies when PPI tables are specified in the TDM job. TDM will use TPTAPI or JDBC to execute partial table copies in all other cases though.

The next TDM article will discuss how to execute dynamic partial table copies with TDM.

7 REPLIES
Enthusiast

Re: Executing Partial Table Copies with Teradata Data Mover

Hi jgordon,

I am having 2 issues with partial copy can you help me out

Is it possible to copy a table partially using ARC when it's having a Identity Column.

And my second question would be my partial job is failing the Row Count Validation. As it was comparing the row count on target side with where caluse on target sides control date can't it check the source control date.

Any help would be appreciated. Thanks in Advance 

Teradata Employee

Re: Executing Partial Table Copies with Teradata Data Mover

Hi gannu,

Yes, you should be able to do a partial table copy with ARC as long as the table being copied is a PPI table.

The partial row count validation will compare the results of executing a "select count(*)" on the source and target tables using the where clause provided in the job definition.  If the where clause contains a nested table that isn't the same on the source and target (assuming the control date is in a different table than the one being copied) then the validation could fail.  Copying the nested table specified in the where clause to the target system might resolve this issue.

Please refer to the Data Mover User Guide for more information on both of these topics if necessary.

Re: Executing Partial Table Copies with Teradata Data Mover

Hi JGordon,

Is it possible to do a partial table copy and use a where clause that compare a field of the source system with result of a subquery using a operator like '>', '<' or '=' ?

I mean:

<sql_where_clause><![CDATA[where load_timestamp > (select last_run_ts from dm_control where job_name = 'xxx')]]></sql_where_clause>

I am getting 0 rows copied when I do this comparison. When actually it's supposed to copy around 68 million rows.

When I change this operator to NOT IN, works fine.

This could be a behavior of datamover version 13.10? It was changed on the newer versions?

Thank you in advance and best regards,

Teradata Employee

Re: Executing Partial Table Copies with Teradata Data Mover

It seems like you're hitting the DBS bug covered in DR 156160.  This bug happens in the DBS when using export without spool with certain types of nested select statements.  Using export without spool is the default for Data Mover TPTAPI jobs because it increases performance.  You should upgrade the DBS to a version that has the fix for DR 156160 to resolve this issue.  If upgrading the DBS isn't an option then you can set <export_without_spool> to false in the Data Mover job for all tables that encounter this problem as a workaround.

Re: Executing Partial Table Copies with Teradata Data Mover

Hello, Is it possible to move partial table with a where clause but without keycolumns?

I tried it and here is the error I get,

Error: Invalid content was found from user input XML file. Details: javax.xml.bind.UnmarshalException

 - with linked exception:

[org.xml.sax.SAXParseException; lineNumber: 28; columnNumber: 17; cvc-complex-type.2.4.b: The content of element 'key_columns' is not complete. One of '{"http://schemas.teradata.com/dataMover/v2009":key_column}' is expected.]

Any comments would be appreciated.

Teradata Employee

Re: Executing Partial Table Copies with Teradata Data Mover

No, you must specify at least one key column in a partial copy job.

Re: Executing Partial Table Copies with Teradata Data Mover

Thank you Jason!