Laddered Concurrent Connect (LCC): Client Performance Improvements

Teradata Employee

Have you ever experienced a Node Panic or Down Node?  Do you have Hot Standby Nodes (HSNs) in your configuration?

If you answered 'YES' to either of these questions, read on to learn how Teradata has introduced a new feature that may improve performance in these types of situations.

Let's start with some background info:

Fast connect times are of paramount importance to most applications that interface with the Teradata Database.  Further, the sooner this connection takes place, the sooner transactions can be processed on the Teradata Database.

Normally this connection is very quick, measurable only in milliseconds.  However, when one or more nodes are down in a clique, this normally fast connection can be extremely slow.  Why? Well, the problem occurs in the connectivity APIs (CLIv2, ODBC, JDBC, .Net Data Provider) and their normal behavior of waiting for a period of time for a down node to respond.

Understand that the APIs do have some intelligence, as they will mark a node as down and NOT retry it for a given interval.  However, this down node information is specific to a particular application instance and must be re-discovered each time a new instance is created. 
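
The per-instance nature of this down-node information can be pictured with a small sketch. This is not the drivers' actual code: the class name `DownNodeCache`, the `retry_interval_s` parameter, and its 60-second value are all hypothetical, chosen only to show why each new application instance starts with no knowledge of which nodes are down.

```python
import time


class DownNodeCache:
    """Per-process record of COPs recently found unreachable (hypothetical helper)."""

    def __init__(self, retry_interval_s=60.0, clock=time.monotonic):
        # retry_interval_s is an illustrative value, not a documented driver default.
        self._retry_interval_s = retry_interval_s
        self._clock = clock
        self._marked_at = {}  # COP name -> time at which it was marked down

    def mark_down(self, cop):
        """Remember that a connect attempt to this COP just failed."""
        self._marked_at[cop] = self._clock()

    def is_down(self, cop):
        """Return True while the COP should be skipped rather than retried."""
        marked = self._marked_at.get(cop)
        if marked is None:
            return False
        if self._clock() - marked >= self._retry_interval_s:
            # The interval has expired, so the COP becomes eligible for a retry.
            del self._marked_at[cop]
            return False
        return True
```

Because the dictionary lives inside one process, a second application instance creates its own empty `DownNodeCache` and has to pay the full wait on the down node again before it learns the same thing.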

Imagine the performance implications on a system with 100,000 or even 1 million concurrent instances. Therefore, in order to mitigate the performance bottleneck associated with a down node, Teradata has introduced a new, smarter approach called Laddered Concurrent Connect, or LCC.

How does Pre-LCC work?

Situation:  Running normal workloads during peak business hours with Pre-LCC functionality

Client Info:

A.  Operating System: Windows

B.  Application: Teradata SQL Assistant

C.  Connection Method: Teradata ODBC Driver

D.  ODBC: Wait time interval = Default (20 seconds)

E.  DNS Listing configured (Each COP is assigned an individual IP address)

Database Info:

A.  Teradata DBS Name:  Production

B.  System configuration:  4 Node MPP (Enterprise Data Warehouse - EDW)

C.  1 node (COP1) panics (goes down) and drops out of the configuration

D.  3 nodes ONLINE, 1 node DOWN (not responding)

Critical Concepts:

There are 2 phases that readers should be aware of when viewing the diagram below:

Phase 1:  ODBC attempts to obtain an IP address from the DNS server

Phase 2:  ODBC attempts to establish a connection to the DBS (Teradata)

In Phase 1, DNS success or failure is completely INDEPENDENT of the state of the DBS.  In other words, DNS doesn't care if the DBS is Up or Down.

Next, ODBC attempts to establish a connection to the DBS (Phase 2) if, and only if, Phase 1 is successful.
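
The two phases can be sketched with plain sockets. This is a minimal illustration, not how the ODBC driver is implemented: the function name `connect_two_phase`, the injectable `resolve` parameter, and the default port of 1025 are assumptions made for the example.

```python
import socket


def connect_two_phase(hostname, port=1025, timeout_s=20.0, resolve=socket.getaddrinfo):
    """Phase 1: resolve the name. Phase 2: open a TCP connection to the result."""
    # Phase 1: name resolution only. The answer says nothing about whether
    # the DBS behind the address is up or down.
    try:
        infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns_failed", str(exc))

    # Phase 2: attempted only because Phase 1 returned at least one address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout_s)  # a down node can consume this entire interval
    try:
        sock.connect(sockaddr)
    except OSError as exc:
        sock.close()
        return ("connect_failed", str(exc))
    return ("connected", sock)
```

A `dns_failed` result means Phase 2 never ran, while a `connect_failed` result means DNS answered normally and the delay was spent waiting on the database side.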

Now, let's look at the Pre-LCC functionality:

Observations:

1.  The initial IP address request FAILS because DNS listing is configured, which means the nodes are defined on a per COP basis.

     (e.g. ProductionCop1 = xxx.xx.xx.xxx, ProductionCop2 = xxx.xx.xx.xxx, etc)

2.  ODBC will always attempt to connect to COP1 FIRST.

3.  ODBC attempts to connect to only a SINGLE COP (node) at a time.

4.  ODBC must wait the ENTIRE time interval before it receives a response back.
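
The cost of these observations can be worked through with a small model of the scenario above: four COPs tried one at a time, COP1 down, and the default 20-second wait interval. The function name and the 5 ms response time for healthy nodes are assumptions made for illustration.

```python
def sequential_connect_time(cop_response_s, wait_interval_s=20.0):
    """Return (elapsed seconds, COP number or None) when COPs are tried one at a time.

    cop_response_s[i] is the connect response time of COP i+1 in seconds,
    or None if that node is down and never answers.
    """
    elapsed = 0.0
    for cop_number, response in enumerate(cop_response_s, start=1):
        if response is not None and response <= wait_interval_s:
            return elapsed + response, cop_number
        # Down (or too slow) node: the full wait interval is spent before moving on.
        elapsed += wait_interval_s
    return elapsed, None


# COP1 is down; COP2 through COP4 are assumed to answer in about 5 ms.
elapsed, cop = sequential_connect_time([None, 0.005, 0.005, 0.005])
print(f"connected to COP{cop} after {elapsed:.3f} s")
```

In this model a single down node at the front of the list turns a 5 ms connect into one of roughly 20 seconds, and every new application instance pays that cost again.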

How does LCC work?

LCC can significantly reduce connect times because it allows the interface to concurrently target multiple nodes for connection.  This has the effect of bypassing COPs on down or extremely slow nodes and improving overall elapsed times for session establishment.  Once the first successful connect is recognized, any remaining occupied sockets are released (closed). A down node will no longer force the client interface to pause until the wait time interval has expired before it attempts to connect to another node.

To keep excessive resource consumption (e.g., extraneous sockets) from becoming a concern, LCC does not fire off connect requests all at once.  Rather, it applies a proprietary dynamic delay interval between connect requests that adjusts according to the previous connect response time.  In other words, LCC tries to optimize the delay interval based on current network and server performance characteristics.  This means the delay interval will gradually increase if connect responses are relatively slow and decrease if they are relatively fast.  This allows LCC to efficiently handle a variety of different network topologies and server workloads.
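
The actual delay algorithm is proprietary and is not described here. Purely to illustrate the idea of a delay that drifts toward recent connect response times, the sketch below blends the current delay with the last observed response time. The function name, the blending weight, and the minimum and maximum bounds are all invented for this example.

```python
def next_delay(current_delay_s, last_response_s, weight=0.5, min_s=0.001, max_s=1.0):
    """Move the delay toward the last connect response time (illustrative rule only).

    This is NOT Teradata's algorithm; it is a simple weighted average with clamping.
    """
    blended = (1.0 - weight) * current_delay_s + weight * last_response_s
    # Clamp so the delay never collapses to zero or grows without bound.
    return max(min_s, min(max_s, blended))


delay = 0.100
delay = next_delay(delay, last_response_s=0.300)  # slow response: delay grows
delay = next_delay(delay, last_response_s=0.010)  # fast response: delay shrinks
```

With a rule of this shape, a fast network launches additional connect attempts sooner, while a slow one spaces them out and opens fewer extra sockets.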

Let's take a look at the LCC feature functionality:

Observations:

1.  ALL cop entries are returned to the API/Driver (e.g. ODBC)

2.  ODBC will RANDOMLY determine which Node/COP to connect to first.  (i.e. Will NOT always be Node 1/COP1)

3.  ODBC connects to MULTIPLE nodes based on a specified "Delay" interval

4.  Response time is drastically reduced, from full timeout intervals (e.g. 60 seconds, 30 seconds) to milliseconds
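
The observations above can be approximated with a short asyncio sketch: shuffle the COP list, start one new connect attempt per delay interval, keep the first success, and release everything else. It is an illustration under stated assumptions rather than the drivers' implementation; the function name `laddered_connect` is hypothetical, and the delay is fixed here rather than self-adjusting.

```python
import asyncio
import random


async def laddered_connect(addresses, delay_s=0.05, timeout_s=20.0):
    """Try (host, port) pairs concurrently, starting one new attempt every delay_s."""
    order = list(addresses)
    random.shuffle(order)  # the first COP tried is not always COP1

    async def attempt(host, port, start_after_s):
        await asyncio.sleep(start_after_s)  # wait for this attempt's rung on the ladder
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout_s
        )
        return (host, port), reader, writer

    pending = {
        asyncio.ensure_future(attempt(host, port, rung * delay_s))
        for rung, (host, port) in enumerate(order)
    }
    winner = None
    while pending and winner is None:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is not None:
                continue  # refused or timed out; other rungs are still in flight
            if winner is None:
                winner = task.result()
            else:
                task.result()[2].close()  # a second success is not needed

    # Release the remaining attempts once a connection has been established.
    for task in pending:
        task.cancel()
    for leftover in await asyncio.gather(*pending, return_exceptions=True):
        if isinstance(leftover, tuple):
            leftover[2].close()

    if winner is None:
        raise ConnectionError("no COP accepted the connection")
    return winner
```

Calling `asyncio.run(laddered_connect([...]))` with a list of COP addresses returns the address that answered first along with its stream pair. A down node only delays the attempt aimed at it, so the overall connect time is governed by the fastest healthy node plus its position on the ladder.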

How do users implement LCC?

This is the best facet of the feature.  No changes are required to existing apps, DNS, or DBS settings.  LCC is e-fixed from TTU 13.10 back to TTU 12.0 (with no co-requisite DBS e-fixes).  There is no tuning needed and no parameter changes required because LCC is completely self-adjusting.  In fact, most customers will never even be aware of LCC – unless a node goes down or a dormant DNS-registered HSN is present in the configuration.   In such cases, connect times should be considerably faster.

*NOTE*:   Prior to LCC, customers were routinely advised to omit HSNs from DNS because of unacceptably long connect delays that would result.  With LCC, customers are encouraged to include HSNs in DNS.  The main reason for doing so would be to preserve network bandwidth that would otherwise be lost when a node goes down.  However, unless such a loss would represent a significant performance hit for a particular configuration, the net benefit would be negligible.

Frequently Asked Questions (FAQ)

1.  My organization has deployed IP load balancing devices (e.g. F5's BIG-IP).  What changes do I need to make?

1a.  Configure the IP load balancing devices to direct traffic to TPA READY nodes and HSNs.  IP load balancing devices can be programmed via their native scripting language to reroute “connection refused” failures returned by a dormant HSN to other COPs in the clique.  This rerouting occurs without the client interface being aware of it.

2.  My organization is using round-robin DNS (or "smart" DNS device) for load balancing and failover.  What changes do I need to make?

2a.  Include the IP address of each TPA READY node and each HSN.

3.  My organization is using a Classic COP naming scheme.  What changes do I need to make?

3a.  One COP name is defined for each TPA READY node and for each HSN.

4.  Are there any restrictions or limitations I should be aware of for LCC?

4a.  LCC was NOT developed for the OLEDB Provider.  Therefore, take care NOT to add an HSN to DNS or the hosts file for OLEDB clients.

5.  How much of a performance gain can I expect from LCC?

5a.  As always, it depends.  However, based on internal testing, the differences between pre-LCC and post-LCC connect times were most appreciable and consistent for multi-threaded applications.  Further, in down-node situations, users will avoid potentially long delays.

6.  In what versions is LCC available?

6a.    CLIv2 implemented in 12.0.0.10, 13.0.0.2 & 13.10.0.2

         JDBC implemented in 12.0.0.111, 13.0.0.27 & 13.10.0.2

         .NET Data Provider implemented in 12.0.1.1, 13.1.0.3 & 13.10.0.0

         ODBC implemented in 12.0.0.14, 13.0.0.8 & 13.10.0.4

         QRYDIR implemented in 12.0.0.7, 13.02.0.1 & 13.10.0.0

Questions/Concerns

If you have any questions or concerns regarding LCC, please leave a comment.  For community support, please post a topic in the Connectivity Forum.