Phased Rollouts in Unity Director & Loader

Getting from here to there – A phased rollout of Unity Director & Loader

One of the main appeals of Unity Director and Loader is the simplicity it brings to the overall architecture. It allows ETL workloads to stream to two or more systems in parallel, while automatically managing failures on each system. However, while this end state is simple and easy to understand, moving a large data warehouse environment from a single system to an active-active configuration can be a challenging task. Without prior experience, it can be daunting to decide where best to start. Let’s look at the best way to approach this transition.

Before you begin

Before attempting to move to production, it’s essential that best practices and proper governance are in place. Moving to an active-active environment is a significant conceptual change for an organization: it requires operations and support people to be aware of the presence and role of a second active system.

Disciplined and methodical testing in a lower environment is a key element of success that should not be skipped for any reason. A test environment allows DBAs and operations staff to practice the actions they will need to perform during a production migration and to build familiarity. A production push is not the right time to learn new things. Since the primary reason for deploying UDL is typically to provide high availability, disciplined and methodical testing should be a natural part of the process.

Finally, a major asset to a production migration is having mature DataMover jobs and practice using them. During a migration to dual systems, DataMover is used to perform the initial sync of tables between production systems so they can be activated in UDL. DataMover is also useful for synchronizing a test environment prior to moving to production.

Making tables “Managed” in UDL

When a table is part of UDL’s configuration, it is said to be “managed”.  In the long term, tables and views are normally managed on all available Teradata systems. In special situations, they might be managed on a single system to achieve specific goals, but this is generally not recommended for new users, for reasons that are explained below.

The initial deployment of UDL into an existing production environment is an exception to this general rule. Initially, tables will be managed on the existing production system, and only later replicated and synced to the second system. To do this, a carefully planned rollout strategy is necessary.

Part 1 - Developing a Phased-Rollout Plan

It’s normal in a new production rollout of Unity Director and Loader (UDL) to want to roll out the environment in discrete phases. This best practice allows applications and workloads to transition in smaller, manageable parts. It also gives people in operations a chance to adapt to the new architecture before the entire environment is committed to it. This is preferable to attempting a “big-bang” approach. Transitioning the entire environment all at once poses added risk because of the time required to sync and validate the entire data warehouse. The increased scope also makes it more likely that an issue will surface, while making each issue more costly to isolate.

The challenge with a phased rollout is to find an order in which the parts of the environment can be divided so they can transition from living on a single active Teradata system to multiple active Teradata systems. Tables, views, workloads and applications don’t exist on their own – they are all interdependent. If the tables or views needed by an application aren’t all transitioned to both systems, the application will fail when it tries to execute on both systems. UDL returns a special error in this case:

“4511 – A mismatch has occurred in this request between where the object(s) exist and where they need to exist for the transaction or session. Check the object(s) used and the systems they are managed on.”

This error means that the client application doesn’t have all the tables it needs on all the systems it is talking to. Picking the right order to transition parts of the environment will allow you to avoid this error. Even with careful preparation, though, interdependencies might be overlooked. Prior to beginning a production migration, operations and database teams should be familiar with this error, understand its implications, and be ready to react to it by adjusting the placement of tables in UDL’s data dictionary.

Careful attention should be paid to any changes in the applications or database schema since the plan was made. If possible, the migration should be scheduled to complete before any other changes that might impact the rollout plan are promoted to the production environment.

Beware of “Knots”

Often there will be points in an environment that tie together otherwise unrelated parts. These “knots” can make it difficult or impossible to divide the rollout of each phase into smaller parts. Knots can take a few different forms:

    • Shared connection strings on application or ETL servers.
    • Shared user names that are used by multiple processes.
    • Shared metadata tables that are used to control, audit or log applications and load processes.

Extra caution must be taken any time these items are present, since they join together different parts of the environment in ways that might not be immediately obvious.

Widely shared tables that are written to by a variety of processes pose a special challenge. This is because a table must be managed either on one system or on both. If a table is written to by several processes, some of which are restricted to a single system while others run on two systems, there will be no way to keep the table synchronized. The processes restricted to a single system will start to fail if they cannot write to all copies of the table. Conversely, the processes working on two systems will potentially fail if the table is only managed on a single system. These situations are more likely to occur in OLTP-style applications, with their more complex tactical workloads, than in traditional data warehouse ETL and reporting applications. The only workaround is to treat the entire collection of processes that depend on the shared tables as a single group, and migrate them to two systems at the same time.

Shared tables and other objects that are only read from are not an issue, as long as they are managed on all available systems. This applies to system objects like the user-defined functions in TD_SYSFNLIB and other shared read-only system functions. In general, any object that is “read-only” should be managed on both systems in the first phase of deployment.

Start from the bottom, go to the top

Most data warehouse environments consist of distinct layers. Staging tables are loaded, their data flows into base tables, and the base tables are then accessed through one or more layers of views in the semantic layer. This segmentation provides a natural dividing line for the transition.

    1. Staging: The first part of the environment to move should be the staging tables and the load jobs that populate them. This ensures that the tables populated from the staging tables will have the data they need available. Since most staging tables are empty when not being loaded, there is usually minimal need to synchronize any data between the two systems before loads are shifted to load via UDL. Typically, only the staging table definitions need to be synced beforehand. Once load jobs have begun loading via UDL, they can be checked and validated to ensure they are functioning properly before moving on to the next step. Groups of load jobs and staging tables might be further divided into distinct groups to reduce the size of the phase. This assumes, of course, that the push-down operation (typically a merge statement) that populates the base tables is separate from the load job that populates the staging table (see the sketch after this list). If the merge step is part of the same load job that populates the staging table, either the load job must be rewritten to break out that step so it can be executed directly on the first Teradata system only, or the base tables might need to be transitioned at the same time as the staging tables.
    2. Base Tables: The next part of the environment to move should be the base tables that are populated from the staging tables. As mentioned, this usually means executing the merge statement via UDL. Unlike staging tables, base tables usually hold permanent data that must be synchronized using DataMover before they can be used. Once populated, the base tables can be validated to confirm they are being correctly populated and the data is in sync on both systems, before proceeding to the next phase of the rollout. Of course, it might not always be possible to separate groups of base tables from groups of staging tables, because they are often populated as the last step of a load job. In those cases, both the base tables and staging tables should be treated as one group and moved together. Also, remember that if there are any clients that do online updates to the base tables (e.g., TPT Stream), these must also be migrated in the same group to ensure they are now writing to both copies of the tables. However, views that depend on the base tables, and applications that read from them (but do not write to them), can remain on the first system for the moment since, like tables, views can be managed on a specific system.
    3. Semantic Layer: The semantic layer, which contains views built on top of the base tables, should be the last part of the database to transition. Since this layer typically contains only views, it does not require any data synchronization.
    4. Reporting Applications: Once all ETL jobs have been transitioned and the tables and views are available on both systems, reporting and other applications can start to transition into the UDL environment. Reporting client applications are typically easy to divide into phases, since they are primarily “read-only” and don’t typically have interdependencies between themselves. Other, more complex OLTP applications might need to be transitioned at the same time, but if the base tables they rely on are in place, this is typically not an issue.
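
As a concrete illustration of the push-down step mentioned in item 1: once both copies of a base table are managed, the push-down is typically a single MERGE statement executed through UDL. The sketch below is illustrative only; the table and column names (stage_db.sales_stg, base_db.sales, sale_id, sale_amt, sale_dt) are hypothetical:

MERGE INTO base_db.sales tgt
USING stage_db.sales_stg src
   ON (tgt.sale_id = src.sale_id)   /* equality on the target's primary index */
WHEN MATCHED THEN UPDATE
   SET sale_amt = src.sale_amt, sale_dt = src.sale_dt
WHEN NOT MATCHED THEN INSERT (sale_id, sale_amt, sale_dt)
   VALUES (src.sale_id, src.sale_amt, src.sale_dt);

Because UDL runs the statement on every system where base_db.sales is managed, both copies of the base table stay in step without any extra synchronization logic in the load job.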

Further dividing deployment groups

Even after a large EDW is divided into staging, base table, semantic and reporting layers, these groups might still be very large. Dividing the staging layer further into smaller groups that can be deployed independently can make the rollout more manageable.

One might be tempted to divide the entire data warehouse into collections of tables that can be migrated as groups. However, that level of granularity can make the practical steps to migrate each group cumbersome, and can easily overwhelm even experienced users with too much detail. Working at the database level provides a more manageable level of information and control. If the EDW already follows normal best practices and has its tables divided into databases in a logical way, this approach should fit naturally.

Here’s one approach using DBQL data to group users and databases into independent groups:

1) First, a list identifying which users are writing to which databases is extracted from DBC.QryLog and DBC.DBQLObjTbl. This query excludes common system users like DBC that might write to all databases. It also filters out SELECT statements: we are not concerned about users that only read from the databases, since reads can always be directed to a single system, and are automatically routed to one when the table they read from exists on only one system.

SELECT COUNT(*), TRIM(b.userName), TRIM(a.objectDatabaseName)
FROM dbc.dbqlobjtbl a
INNER JOIN dbc.qrylog b
   ON a.queryId = b.queryId
  AND a.procId = b.procId
WHERE a.objectType = 'Tab'
  AND b.statementType <> 'Select'
  AND a.objectTableName <> 'TXNTABLE'
  AND a.objectDatabaseName NOT IN ('dbc','dbcmngr','sqlj','sys_calendar','sysadmin','syslib','syspartial','sysudtlib','viewpoint','tmsmviews','tmsmreaduser','tmsmji','tmsm','tmsmem','tdwm','td_sysfnlib','pdcrinfo','pdcrstg','pdcrdata','pdcradmin','sysspatial','systemfe','unitymgmt','tdstats')
GROUP BY b.userName, a.objectDatabaseName
ORDER BY b.userName, a.objectDatabaseName;

This produces a list of all the users that write to each database. If the list contains unnecessary dependencies from users that don't actually need write access to a database, these should be eliminated before the next step. This will help create more deployment groups, making each smaller and more manageable. The count indicates how frequently each user wrote to the database; users that write infrequently could be candidates for elimination.

432431,ETLUSER,ETLUSER
28132,ETLUSER2,ETLUSER
10253,ETLUSER3,ETLUSER
4321,ETLUSER4,ETLUSER4
3141,ETLUSER5,ETLUSER4
3012,ETLUSER6,ETLUSER4
34533,BASE1,BASE1
23421,BASE2,BASE3
4234,BASE2,BASE1
9341,BASE4,ETLUSER
12313,BASE5,BASE4
1032,BASE3,BASE3
4213,BASE3,BASE1
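
To highlight the infrequent writers mentioned above, the same query can be narrowed with a HAVING clause (the system-database filter is omitted here for brevity, and the threshold is arbitrary; tune it to your workload):

SELECT COUNT(*), TRIM(b.userName), TRIM(a.objectDatabaseName)
FROM dbc.dbqlobjtbl a
INNER JOIN dbc.qrylog b
   ON a.queryId = b.queryId
  AND a.procId = b.procId
WHERE a.objectType = 'Tab'
  AND b.statementType <> 'Select'
GROUP BY b.userName, a.objectDatabaseName
HAVING COUNT(*) < 100          /* arbitrary cutoff for "infrequent" */
ORDER BY 1 ASC;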

2) Next, a perl script (attached) is used to aggregate the users and databases into related groups:

:~/work # ./combine.pl example

Group 1: 5 databases: (ETLUSER, BASE5, BASE4, ETLUSER3, ETLUSER2)
Group 3: 3 databases: (BASE3, BASE1, BASE2)
Group 2: 3 databases: (ETLUSER4, ETLUSER6, ETLUSER5)

Each of these groups is independent of the others, so each can be deployed in a separate phase. Conceptually, the script is just computing connected components over the user-to-database links; the query sketched below shows the direct links that the script transitively closes.
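
If the writer list from step 1 has been saved to a work table, the direct links are simply pairs of databases that share a writing user. This is a sketch only; mywork.db_writers and its columns (userName, dbName) are hypothetical:

SELECT a.dbName AS database_1, b.dbName AS database_2, a.userName AS shared_writer
FROM mywork.db_writers a
INNER JOIN mywork.db_writers b
   ON a.userName = b.userName
  AND a.dbName < b.dbName      /* report each linked pair once */
ORDER BY 1, 2;

Any two databases that appear in the same row, or that are linked through a chain of such rows, must land in the same deployment group.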

Rather than attempt to plan out a phased rollout on your own, it’s a good idea to engage the experts. Teradata professional services can assist with this process and help validate the results.

Part 2 - A walk-through of deploying one group into UDL

The first step in adding UDL to an environment with an existing Teradata system is to configure Unity to allow access to the objects on the existing system, while preventing any work from being done on the new second system. This requires both the correct routing rules and the correct initial table placement. Initially, both systems should be active, with no tables managed in the UDL data dictionary.

Step 2.1 – Manage common read-only shared system objects on both systems

If there are read-only shared system objects that might be needed by applications, such as UDFs in TD_SYSFNLIB, it is a good idea to scan and include those objects now so they are managed on both systems and available on the second system when users need them.

Step 2.2 – Initial table placement

In this example, we’ll start by scanning the first group of staging tables identified in our rollout plan into a dictionary. Since databases that exist on both Teradata systems are normally selected on both automatically, you may wish to turn off the dictionary scanner option that controls the automatic selection of databases, located on the General Configuration tab of the UDL configuration portlet.

Next, start the database scan on the databases in the first group.

When selecting the databases for the initial dictionaries, it is critical that they are selected only on system 1, and not system 2, since we are not ready to access these objects on system 2 yet.

If the database has been properly selected only on system 1, the object selection screen should only offer the objects in that dictionary on system 1 (and not system 2).

Step 2.3 - Controlling the creation of further tables

The dictionary scanner can only be used to select where existing tables are managed. It’s also necessary to control where new tables are placed before the transition to two systems. This is accomplished using routing rules. The ‘Create’ option on a routing rule can be used to select which system any new tables created by a connection are managed on. This is especially important for load jobs, which create error and log tables during their normal processing.

This routing configuration will force all users to system 1, while also ensuring any new tables are created only on system 1:

unityadmin> routing list;
DefaultRouting READ WRITE
System 1 Only READ db1 WRITE db1 CREATE PREFERRED

unityadmin> user update * * * * region1 'System 1 Only' TOP;
Successfully updated user map.

unityadmin> user update * * * * region2 'System 1 Only' TOP;
Successfully updated user map.

unityadmin> user list;

ID User Account Role Profile Region Routing
========================================================================
1 * * * * region2 System 1 Only
2 * * * * region1 System 1 Only
99999 * * * * * DefaultRouting

Step 2.4 – Validation on a single system

Now that the first group has been deployed and clients are connecting through UDL, it’s advisable to pause and confirm that clients are functioning normally before beginning the migration to two systems. As mentioned in the first section, operations should be vigilant for 4511 errors, which indicate that the routing rules or the placement of tables are not correct.

Part 3 – Transitioning the group onto two systems

Next, we’ll change the management of the tables belonging to the user ETLUSER so they are managed on both systems.

Step 3.1 – Block access to the target group during the transition period

While the tables for ETLUSER are being migrated to the second system, the user’s access should be blocked to ensure an orderly transition.
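
Following the user update syntax used elsewhere in this article, a blocking entry can be placed at the top of the user map (this sketch assumes REJECT_LOGIN is available as a routing target in your configuration, as shown in the resulting map below):

unityadmin> user update ETLUSER * * * * REJECT_LOGIN TOP;
Successfully updated user map.

The resulting user map looks like this: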

unityadmin> user list;

ID User Account Role Profile Region Routing
=====================================================================
1 ETLUSER * * * * REJECT_LOGIN
3 * * * * region2 System 1 Only
4 * * * * region1 System 1 Only
99999 * * * * * DefaultRouting

Step 3.2 – Beginning the transition of a specific load user and its staging tables

We are now ready to change the placement of the ETLUSER tables from one system to both systems. The easiest way to accomplish this is to update the UDL repository directly. This query provides a summary of how many objects are managed on each system or on both, grouped by database name:

SELECT t2.dbname, t1.managedOn, COUNT(*)
FROM unity.dictionary_managed_objects t1,
     unity.dictionary_managed_databases t2
WHERE t1.moDBId = t2.moDBId
GROUP BY t2.dbname, t1.managedOn;

*** Query completed. 4 rows found. 3 columns returned.
*** Total elapsed time was 1 second.

dbname managedOn Count(*)
-------------------------------------------------------- ----------- -----------
etluser 2 9
etluser2 2 1
base 2 10
viewuser 2 112

In this query, the managedOn field contains a bitmask value. The values mean:

2 – The object is managed only on the first system.
4 – The object is managed only on the second system.
6 – The object is managed on both the first and second systems.
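
For a more readable summary, the bitmask can be decoded inline. This is just a variant of the summary query above, using only the repository columns already shown:

SELECT t2.dbname,
       CASE t1.managedOn
            WHEN 2 THEN 'system 1 only'
            WHEN 4 THEN 'system 2 only'
            WHEN 6 THEN 'both systems'
       END AS placement,
       COUNT(*) AS object_count
FROM unity.dictionary_managed_objects t1,
     unity.dictionary_managed_databases t2
WHERE t1.moDBId = t2.moDBId
GROUP BY 1, 2;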

To change all the objects in the ETLUSER database to be managed on both systems, we need to update the managedOn column to ‘6’ (managed on both systems). This update statement will do that for the ETLUSER database:

UPDATE unity.dictionary_managed_objects
SET managedOn = 6
WHERE managedOn = 2
  AND moDBId IN (SELECT moDBId FROM unity.dictionary_managed_databases WHERE dbName IN ('ETLUSER'));

*** Update completed. 9 rows changed.
*** Total elapsed time was 1 second.

Note! You must execute this statement on both the active and standby repositories so the two remain in sync.

Next, the dictionary reload command is issued from the unityadmin command line. This causes UDL to reload its dictionary from the repository tables, making the change take effect:

unityadmin> dictionary reload;
Operation Number : 11
Operation Name : Dictionary Reload
User : admin
User Name : Main Administration User
Start Time : 08/25 11:57:36
Updates:
08/25 11:57:36 [-] Info: Starting dictionary reload
08/25 11:57:37 [-] Info: 132 objects will be loaded
08/25 11:57:37 [-] Info: Loading udts
08/25 11:57:37 [-] Info: Loaded 0 managed udts
08/25 11:57:37 [-] Info: Loading tables
08/25 11:57:37 [-] Info: Loaded 87 managed tables
08/25 11:57:41 [-] Info: Loading functions
08/25 11:57:42 [-] Info: Loaded 2 managed functions
08/25 11:57:42 [-] Info: Loaded dictionary at version 2, major 4, minor 2
08/25 11:57:42 [-] Info: Load time: 1440529062
08/25 11:57:42 [-] Info: max_item_id: 1132, item count 394
08/25 11:57:42 [-] Info: Dictionary successfully reloaded
08/25 11:57:42 [-] Info: Reload dictionary succeeded
08/25 11:57:42 [-] Info: Operation finished
Status : Finished (1)
Finish Time : 08/25 11:57:42

unityadmin>

The ETLUSER tables will now appear as managed on system 2, in the unrecoverable state (because they have not yet been synced).

Step 3.2 – Update for Unity v15.00.01

This step is much easier in Unity 15.00.01. The update to the Unity repository shown above is no longer necessary. To see where an object is currently being managed, just use the OBJECT SHOW command:

unityadmin> OBJECT SHOW dbtest.mytable MANAGED ON;
Object "dbtest.mytable" is being managed on :
1 (db1)
2 (db2)

To change where an object is managed, just use the OBJECT MANAGE command:

unityadmin> object manage dbtest.mytable on db1;
Table "dbtest.mytable" de-activated on :
2 (db2)

SUCCESS : Object is now being managed on given systems.
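
To manage the object on both systems again, both system names can be listed, separated by commas (this form is also demonstrated in the replies below):

unityadmin> object manage dbtest.mytable on db1, db2;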

These new commands make it much easier to switch where a table is managed during a phased rollout and to correct table placement issues that cause 4511 errors. Additionally, no dictionary redeploy is necessary.

Step 3.3 – Sync the tables and then activate

You are now ready to sync the tables using DataMover or another method. Once synced, the tables can be activated. The easiest way to do this is to deactivate the database on the first system:

unityadmin> database deactivate etluser;
Operation Number : 31
Operation Name : Deactivate database
User : admin
User Name : Main Administration User
Start Time : 08/25 12:14:05
Updates:
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.et_mloadc0' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.jdbc_fastload_test' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.mloadc0' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.mloadc0_ilog' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.mloadc0_ulog' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.problemload' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.tptloadc0_log' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.uv_mloadc0' on 1 systems.
08/25 12:14:05 [-] Info: Successfully deactivated table 'etluser.wt_mloadc0' on 1 systems.
08/25 12:14:05 [-] Info: Operation finished
Status : Finished (1)
Finish Time : 08/25 12:14:05

Then re-activate the database on both systems:

unityadmin> database activate etluser;
Operation Number : 33
Operation Name : Activate database
User : admin
User Name : Main Administration User
Start Time : 08/25 12:14:20
Updates:
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.et_mloadc0' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.jdbc_fastload_test' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.mloadc0' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.mloadc0_ilog' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.mloadc0_ulog' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.problemload' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.tptloadc0_log' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.uv_mloadc0' on 2 systems.
08/25 12:14:20 [-] Info: Successfully activated table 'etluser.wt_mloadc0' on 2 systems.
08/25 12:14:20 [-] Info: Operation finished
Status : Finished (1)
Finish Time : 08/25 12:14:20

unityadmin>

Step 3.4 – Allow the user access to both systems

ETLUSER now has the tables it requires to operate in parallel on both systems. Access to both systems can be granted by updating the user mapping for ETLUSER to use the default routing rule:

unityadmin> user update  ETLUSER * * * * DefaultRouting TOP;
Successfully updated user map.
unityadmin> user list;

ID User Account Role Profile Region Routing
==========================================================================================
2 ETLUSER * * * * DefaultRouting
3 * * * * region2 System 1 Only
4 * * * * region1 System 1 Only
99999 * * * * * DefaultRouting

This completes the migration of the first group, containing the user ETLUSER and its associated tables. To migrate more ETL users and staging tables, the steps in Part 3 are repeated. Once all of the staging groups are complete, the same process is followed for the groups of base tables (if separate) and any users that write to them, and finally for the views in the semantic layer that support reporting and application users.

With careful planning and preparation, moving from a single production system to active-active production systems is an achievable goal.

2 REPLIES

Re: Phased Rollouts in Unity Director & Loader

Note that there are new commands in Unity 15.00.01 that make step 3.2 much easier! You can now see and change where an object is managed with the unityadmin commands:

OBJECT SHOW [object name] MANAGED ON;

and

OBJECT MANAGE [object name] ON system1 [, system2, ...];

These will make life much easier if you are having problems with 4511 errors.

Re: Phased Rollouts in Unity Director & Loader

Hi Paul,

Thanks for that very helpful article. As you know, we are just in the activation phase of Unity at REWE in Germany. The 15.00.01 enhancement that lets us change managedOn with one command is especially useful.

I tried this in our test environment and it worked fine.

I would cautiously propose changing the wording in your article from

'Additionally, no dictionary redeploy is necessary' to '... no dictionary reload is necessary'.

I've tested in our Testenvironment a table which is already active-active, to change it back to active-passive.

It worked as expected.

Afterwards I tried to change it back to active-active. That worked fine, too. But I would cautiously propose changing the output of the command, which is a bit confusing:

TDTEST1UDL:~ # ud 'OBJECT MANAGE STAG.D_RQMS_CL_FRAGE ON tdtest1,tdtest2'
Login successful, authenticated connection established.

Welcome to the Unity interactive administration shell (version 15.00.01.00)

Commands end with ;
Type 'help' for additional shell commands.

All operations will be run in synchronous mode.

Table "stag.d_rqms_cl_frage" de-activated on :
2 (tdtest2)

SUCCESS : Object is now being managed on given systems.

ud is an alias for the unityadmin call in our environment.

The result was correct, as can be seen with the ud 'OBJECT SHOW STAG.D_RQMS_CL_FRAGE MANAGED ON' command or in the Viewpoint UDSetup portlet filter on the deployed dictionary (I also verified directly in the Unity dictionary that the managedOn flag was correctly set to 6),

but the message 'de-activated on tdtest2' is not quite right, because the table was being taken from active-passive to active-active.

To summarize: a great enhancement and a very helpful article, thanks.

Kind regards,

Jörg Weidenfeld

Unity-Project REWE Systems, Cologne, Germany