Working with Identity Columns and Unity Director and Loader

Unity
Unity is Teradata’s data synchronization and workload routing tool providing Active – Active database availability delivering near real-time RTO/RPO
Teradata Employee

Working with Identity Columns and Unity Director and Loader

One of Unity Director and Loader’s core benefits is the ability to keep multiple Teradata systems synchronized with online, transactional-consistent, changes. Unlike post-transactional replication, Unity Director’s SQL-multicast sends requests in parallel to all connected systems at the same time. It also performs consistent checking on the responses of every write statement to ensure changes both systems succeed and produce the same results.

While this works great for normal data, Identity columns pose a special problem. Identity columns in a teradata database are auto-generated numbers inside the Teradata database that are typically used as surrogate keys.

Consider the following employee table, which uses a default IDENTITY column for an id:

CREATE SET TABLE tbl_emp ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
id INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 1
INCREMENT BY 1
MINVALUE 0
MAXVALUE 1000000
NO CYCLE),
Name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
Phone VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( id );

Since Unity Director/Loader has no was to synchronize the current value of the IDENTITY column across multiple systems and it can’t see the value provided inside the database for the ID column, it can’t guarantee the consistency of the data. For this reason Unity Director and Loader disallow writes to table using IDENTITY columns:

BTEQ -- Enter your SQL request or BTEQ command:
insert into tbl_emp (Name,Phone) values ('Paul','1234567890');

insert into tbl_emp (Name,Phone) values ('Paul','1234567890');
*** Failure 4514 Query is disallowed because it writes to a table containing an identity column.
*** Total elapsed time was 1 second.

This can pose a challenge for anyone that wants to adopt Unity Director and Loader to do data synchronization but also needs to write to tables that require IDENTITY columns. Let’s look at some of the ways to overcome this challenge.

Do you need to write to this table via Unity Director or Loader?

Just because you have a table with an identity column in it, does not automatically mean you have a problem. If the table is relatively static and unchanging or changes at scheduled times, it might be easier to use a Load-Copy approach by loading the table directly on one system, and then replicating it to other systems using DataMover. Unity Ecosystem manager can be used to coordinate this approach.

Consider removing the Identity column

The first option considered should always to be eliminate the IDENTITY column altogether. Whether this is possible depends on why a surrogate key was chosen in the first place.

  • In many situations, IDENTITY columns have been used to implement multi-system replication methods using change data capture – methods that may now be irrelevant in a Unity Director/Loader environment, since Director and Loader don’t need to rely on change data capture.
  • Sometimes surrogate keys are chosen in order to provide better distribution than the natural key of the table would provide. There are, however, other alternative techniques, like adding a timestamp column to the PI to address issues of skew that can be used in place of using a surrogate key.
  • Occasionally, surrogate keys are used simply to achieve a clean snow-flake design. In these cases, it’s becomes an engineering question of balancing the new trade-offs of using IDENTITY columns (not writing to them via Unity Director/Loader) versus the benefits (conformance to a particular design ideal).

Consider moving the surrogate key function to the client

Note that while IDENTITY columns do generate sequential numbers, this is actually not a requirement for surrogate keys. Surrogate keys require unique numbers, not necessarily sequential numbers. For a variety of reasons, sequential numbers can be more work for databases to deal with. For a vast majority of cases, substituting a client generated value for a surrogate key is an ideal solution. Any time the client application is in-house or custom build, this should be considered. Almost all programming languages have UUID (universally unique id) functions built in. Most ETL applications (E.g. Abinitio, DataStage, Informatica) also have built-in functions to generate surrogate keys for tables.

Using a client side function eliminates any compatibility issue with Unity Director/Loader, and in many cases will allow Director/Loader to make better locking choices since it can now see the value of the surrogate key that is used in the Primary Index of a table, which will improve concurrency in some situations.

Here’s an example in java of a prepared statement using Java’s UUID function to generate a surrogate key. In this example, the UUID is stored as a CHAR(36) in the database, instead of an INTEGER:

drop table tbl_emp_client_uuid;
CREATE TABLE tbl_emp_client_uuid
(id char(36),
Name VARCHAR(20),
Phone VARCHAR(10))
PRIMARY INDEX (NAME,PHONE);

// INSERT statement
String insertTable = "INSERT INTO " + tableName + " (id,NAME,Phone) VALUES (?,?,?)";
PreparedStatement pstmt=null;
try {
pstmt = con.prepareStatement(insertTable);
pstmt.setString(1, UUID.randomUUID().toString());
pstmt.setString(2, name);
pstmt.setString(3, phone);
pstmt.execute();
}
catch (SQLException ex) { handleError(ex); }

The result is a good surrogate key that will distribute well, and is globally unique:

select * from  tbl_emp_client_uuid;

*** Query completed. 1 rows found. 3 columns returned.
*** Total elapsed time was 1 second.

id Name Phone
------------------------------------ -------------------- ----------
00694654-ba20-4bf6-b5df-cecda9b815ad Paul 1234567890

Implement your own sequential number key in the database

If you do require sequential numbers for surrogate keys, it is possible to implement a solution that uses the database to provide sequential numbers. There are several possible ways to implement this, but the most common is to use a key table to provide values for several different sequences:

CREATE SET TABLE SEQUENCES
(
SEQ_Name VARCHAR(30),
Next_Value DECIMAL(18,0) TITLE 'Next sequence value'
) UNIQUE PRIMARY INDEX (SEQ_Name);

New sequences are defined by insert rows into this table. The Next_Value column determines the next value taken by a client application.

insert into sequences values ('emp_id',1);

*** Insert completed. One row added.
*** Total elapsed time was 1 second.

To use these sequences, client applications must allocate blocks of sequences values. A simple stored procedure might be used encapsulates the request to do this:

REPLACE PROCEDURE getSequenceValues (
IN Name VARCHAR(30),
IN neededValues DECIMAL(18,0),
OUT nextValue DECIMAL(18,0)

)
BEGIN
BEGIN TRANSACTION;

LOCKING ROW FOR WRITE SELECT Next_Value into nextValue FROM sequences WHERE SEQ_NAME = Name;
UPDATE sequences SET Next_Value = Next_Value + neededValues WHERE SEQ_NAME = Name;

END TRANSACTION;

END;

…or this could be left as a client-side function. In this example, a client could allocate a set of 10 values using the stored procedure:

call getSequenceValues('emp_id',10,NextVal);
*** Procedure has been executed.
*** Total elapsed time was 1 second.
nextValue
--------------------
31.

… and then use them by setting the value of the id column (in the same way it set the ID column to the UUID value above, only with an integer) and increment the id as it consumes the 10 values it allocated. Note that this is still, in a slightly different form, using a client-side solution to generate surrogate keys. The only difference is it relies on the value in a table inside the database to ensure uniqueness.

It’s also important to note that allocating a single value at a time when requiring many values could lead to scalability problems since Unity Director and Loader will acquire a table-level lock on the sequences table when the getSequenceValue stored procedure is called. This stored procedure call should be very fast, since it’s only doing a single-row update, but with high concurrency, frequent calls could cause serialization across all the processes that use it for surrogate keys.

Build your own internal database UUID

All of the above techniques require some application changes to accommodate them. As a last resort, there is one technique that doesn’t require any client changes, but does come with a performance cost. It is possible to implement your own UUID function inside the database that is deterministic across Teradata systems, and to define a trigger to set this value as a surrogate key.

This function uses the RANDOM and CURRENT_TIMESTAMP to create a unique id for a surrogate key, that will also guarantee deterministic values when executed on multiple Teradata systems (above version 14.00.02) when used with Unity Director and Loader. This function relies on the fact that for connections that go through Unity Director and Loader, Director supplies the current time and random seed to the internal database functions.

REPLACE FUNCTION syslib.UUID()
RETURNS BIGINT
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
CALLED ON NULL INPUT
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN CAST(
TRIM(CAST(Random(-92233703,922337203) AS CHAR(9) ))||
(CURRENT_TIMESTAMP(FORMAT 'MMDDHHMISS') (CHAR(10)))
AS BIGINT);
grant execute function on syslib.uuid to public;

Since this function returns a BIGINT instead of an INTEGER, a minor change is required to the employee table:

CREATE TABLE tbl_emp_new
(id BIGINT,
Name VARCHAR(20),
Phone VARCHAR(10))
PRIMARY INDEX (NAME,PHONE);

To set the value of the id, a trigger is used that emulates a default value:

REPLACE TRIGGER etltest.tbl_emp_uuid BEFORE INSERT ON etltest.tbl_emp_new
REFERENCING NEW AS NewRow
FOR EACH ROW when (NewRow.id is null)
( set NewRow.id=syslib.UUID() ; );

The advantage of this approach is that no further client changes are required. The table now functions much like it did when it used the IDENTITY column:

 *** Query completed. One row found. 3 columns returned.
*** Total elapsed time was 1 second.

id Name Phone
-------------------- -------------------- ----------
1899301991121182541 Paul 1234567890

There is however a high performance cost to this convenience. The following table shows a comparison of the time used to insert a varying number of rows into the tbl_emp table using several different methods.

  • The first two scenarios show the baseline time direct to a single Teradata system (bypassing Unity Director) and to the table without any value for the ID column. Any client side function used to provide a surrogate key would provide similar performance.
  • The second scenario shows the time to insert the same rows using the sequence method to allocate a set of sequencer values for the surrogate key. In this test, values were allocated in batches of 100. Since the overhead in this method comes from the extra stored procedure call, increasing the batch size would improve the efficiency of this method, while decreasing it would worsen the efficency.
  • The last scenario shows the method using the custom deterministic UUID function and trigger to set the ID value in the table. Note that this method is 6-7 times as slow as the baseline for one session and ~40 times as slow for 10 sessions.

Reps:

10

   Baseline (TD direct with Identity col)   

   Baseline (via Director, no ID column)  

   Using Sequence Method  

   Using UUID-trigger method  

Rows:

5000

Min

Avg

Max

Min

Avg

Max

Min

Avg

Max

Min

Avg

Max

Sessions

 Total Rows

Sec

Sec

Sec

Sec

Sec

Sec

Sec

Sec

Sec

Sec

Sec

Sec

1

50000

9

9.7

10

10

10.5

12

12

12.6

14

66

66.5

67

5

250000

11

15.26

19

11

12.78

16

56

70.74

91

326

340.45

357

10

500000

15

18.34

27

15

15.95

23

124

143.25

168

725

743.1

768

Selecting the best technique for you

An ounce of prevention is worth a pound of cure; if you are considering Unity Director or Loader for your multisystem environment at some point in the further, a best practice is to avoid implementing IDENTITY columns now. IDENTITY columns can pose problems for any active-active multisystem environment, regardless of the synchronzation techique used.

If the situation arises that you need to find an alternative to replace IDENTITY columns in order to implement Unity Loader, you should assess each of these methods based on the specific requirements for each of your tables. You might use a single technique for all your tables or several different approaches for different sets of tables. With careful consideration, the challenge of IDENTITY columns in a Unity Director/Loader environment can be easily overcome.

10 REPLIES
Supporter

Re: Working with Identity Columns and Unity Director and Loader

Paul,

the syslib.UUID() is clearly not usable as a UUID generator. The number of synonyms is fare to high.

With

select id, count(0)
from
(
select calendar_date,
syslib.UUID() as id
from sys_calendar.calendar
) as t
group by 1
having count(*) > 1
order by 1;

I got already 2 synonyms - for only <80.000 values.

<800.000 rows I got already 279 synonyms...

select id, count(0)
from
(
select c.calendar_date,
syslib.UUID() as id
from sys_calendar.calendar c
cross join
sys_calendar.calendar j
where j.day_of_calendar between 1 and 10
) as t
group by 1
having count(*) > 1
order by 1;

Your result might vary but running this some times should show the issue.

Reason is that the number range of Random(-92233703,922337203) is fare to small! There exists some good formulas to calculate the probability of collisions.

SHA256 might be usable for this approach.

But as UUIDs are getting more and more common TD might consider implementing this as a core function - UDFs will always consume too much CPU...

Regards Ulrich

Teradata Employee

Re: Working with Identity Columns and Unity Director and Loader

Ulrich, I think you may have taken the math slightly out of context - the entire key consists of the timestamp and the random number, not just the random number and is intended to provide a surrogate key for a single table.

The probability to worry about is the chance that two current sessions will draw the same random number for the same table at exactly the same second. In this context, Random (-92233703, 922337203) is adequate for the purpose (80K sessions writing to the same table at the same second is high) - but even if it wasn't - the way the random function is seeded within the database, separate concurrent sessions are already guaranteed to take distinct values, even if they execute at exactly the same time.

I agree though - I'd prefer to see a proper system function implemented, rather than use a UDF as a workaround. Any UDF will obviously slow things down.

Supporter

Re: Working with Identity Columns and Unity Director and Loader

Hi Paul,

I have to admit that I missed the possible impact of the trigger and the current_timestamp when you load the data with TPUMP or TPT Stream.

But if you use the szenario for insert/selects - which I saw in some ELT implementations - you will run into this issue as the current_timestamp is the same for the whole transaction -> only the random give different value and this is too small range of values.

Check 

create table yourDB.ident_test
( id bigint,
dt date,
yr smallint
) primary index (id);

REPLACE TRIGGER yourDB.tbl_ident_test_uuid BEFORE INSERT ON yourDB.ident_test
REFERENCING NEW AS NewRow
FOR EACH ROW when (NewRow.id is null)
( set NewRow.id=syslib.UUID() ; );

insert into yourDB.ident_test (dt,yr)
select c.calendar_date,
c.year_of_calendar
from sys_calendar.calendar c
cross join
sys_calendar.calendar j
where j.day_of_calendar between 1 and 10
;

select id, count(*)
from yourDB.ident_test
group by id
having count(*) > 1
;

I got >200 duplicate ids.

Ulrich

Teradata Employee

Re: Working with Identity Columns and Unity Director and Loader

Hi Ulrich,

Yes, you are exactly correct - the UUID UDF isn't suitable if you are issuing a single request like an insert-select. Unity Director/Loader will sets the current time for the request (which is also part of the random function seed) once as it passes through the Unity server. All rows will take the same value, and the chance of collisions becomes high. You shouldn't use this approach in that scenerio.

But to take a step back, the overall point though is that there are several approaches you could take. None of them will be suitable for all situations, but you should be able to use one of them.  As I mention above, attempting to use a UDF would be the solution I'd suggest as a last resort.

N/A

Re: Working with Identity Columns and Unity Director and Loader

Hi Paul,

I am trying to execute below code but getting error in second code i.e. at the time of creation index.

We need unique sequence number in the table and created table will be accessed by around 40 users (they will insert data more than 100 times in day using one application) as well as Web Service will update some columns based on the created index. 

I am not the teradata expert need experts guidance/Help for resolving below issue. 

Code I :

CREATE MULTISET TABLE DD_CI.SMS_MESSAGE

(SMS_ID  INTEGER GENERATED ALWAYS AS IDENTITY

(START WITH 1

INCREMENT BY 1

MINVALUE 1

MAXVALUE 1000000000

NO CYCLE) not null,

CMG_ID VARCHAR(50),

DESTINATION_ADDRESS VARCHAR(50),

SMS_BODY VARCHAR(1000),

SMS_STATUS VARCHAR(30),

SCHEDULE_DLVRY_DATE TIMESTAMP (0),

SMS_LANGUAGE VARCHAR(2),

COMMUNICATION_ID VARCHAR(50),

SMSC_MESSAGE_ID VARCHAR(100),

DLVRY_RECEIPT_AT TIMESTAMP (0),

INSERTION_DATE TIMESTAMP(0),

CONSTRAINT SMS_MESSAGE_PK PRIMARY KEY (SMS_ID) ) ;

Code II :

CREATE INDEX UPD_BTCH_MSG_IND (SMS_ID  ,CMG_ID,SMS_STATUS,COMMUNICATION_ID) ON DD_CI.SMS_MESSAGE;

ERROR: CREATE INDEX FAILED 5784. Illegal use of Identity Column SMS_ID.

Also please suggest if any other better way is available. 

Thanks in Advance.

Regards

KS

Teradata Employee

Re: Working with Identity Columns and Unity Director and Loader

Hi KS,

Your index create is failing because you are trying to use an identity column in a composite index, which isn't allowed.

You might try the seconday index without Identity column in it.

Here's a link that you might find useful (TD 14.00 version): http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1184_111A/Crea...

Enthusiast

Re: Working with Identity Columns and Unity Director and Loader

To summarise, there may be legacy code, for example that which depends on identity columns, that it will not be viable to use with Unity.  As such it will be extremely helpful if generic code, such as that executed in stored procedures, works the same whether or not it is executed on a table that is managed by Unity.  See related comments in

http://developer.teradata.com/general/articles/working-with-stored-procedures-in-unity-director#comm...

I am surprised by how little comment these excellent articles are generating.  Is this because Unity is being used less then is being claimed or are sites successfully managing independent processing of Unity and non-Unity enabled data.

Teradata Employee

Re: Working with Identity Columns and Unity Director and Loader

Thanks butchec.

There's a wide range of use-cases for Director/Loader and there's no need to use it for everything all at once. Some sites will simply use Director for reporting - they don't need to worry about things like stored procedures or identity columns. Others that use Loader for dual-loading will use it for most of their data warehouse, but might skip particular databases with legacy issues, or they might slowly remove the legacy limitations. The path of least resistence is usually the right path.

Enthusiast

Re: Working with Identity Columns and Unity Director and Loader

Paul,

I was searching for info on generating a UUID and found this.  While this may not meet my needs, I was confused why the SQL UDF was defined as DETERMINISTIC when it is not.  Per the TD DDL SQL UDF manual,

DETERMINISTIC -- that the function returns identical results for identical inputs.

NOT DETERMINISTIC -- that the function might not return identical results for identical inputs.

For example, if the SQL function calls a random number generator as part of its processing, then the results of an SQL function call cannot be known in advance of making the call and the function is NOT DETERMINISTIC.