Teradata Virtual Storage: The new way to manage Multi-Temperature data

Teradata Employee


On August 24, 2009, Teradata Corporation announced the availability of innovative software that virtualizes data storage: Teradata Virtual Storage (TVS). TVS enables Teradata customers to add storage capacity at very low cost while maximizing performance to meet the enterprise intelligence demands of business users. Teradata has used virtualization since the mid-1990s, when the AMP-based architecture began virtualizing the resources of each node. AMPs were once hardware devices, back in the days of the Teradata DBC/1012 database computers. Since the mid-1990s, AMPs have been implemented and used as a type of virtual processor (vproc), specifically an AMP vproc.


Note: The AMP (Access Module Processor) is a type of vproc whose software manages data.

Since then, Teradata systems can be configured to exceed 90% CPU utilization, and systems are easily expanded by adding more AMP vprocs in new nodes.

TVS is an optional storage subsystem software product that operates between the Teradata Database and the storage arrays. Its goal is to bring the flexibility and benefits of virtualization to Teradata’s storage capabilities. TVS enables mixing storage technologies in a clique, with intelligent placement of data and “temperature-based” migration of data.

Note: Each node is an SMP server that runs a copy of the operating system and database software and contains CPUs, a system disk, memory, and adapters. A clique is a group of nodes that share access to the same disk arrays.

Requirements

To utilize the current version of TVS, your system must meet the following requirements:

  • It must be one of the following Teradata Active EDW Platforms:
    • 5400, 5450, 5500, 5550, 5555
  • It must be running the Linux operating system.
  • It must use one of the following storage types:
    • Teradata Enterprise Storage
    • EMC Symmetrix Disk Array
  • It must be running Teradata Release 13 software or later.
  • It must use RAID 1. RAID 5 is not supported by TVS.

An additional feature of TVS is its enhancement of Teradata’s Multi-Temperature warehouse capability. TVS keeps detailed statistics on the usage patterns of all cylinders and associates a temperature (Hot, Warm, or Cold) with each cylinder. As the temperature of the data is established, TVS continuously migrates the data within or between drives. The TVS software performs this migration automatically, and it is transparent to users and DBAs. Its impact on system resources is configurable and is normally set to be very slight.
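The article does not say how TVS turns cylinder usage statistics into a temperature. The sketch below is a minimal illustration, assuming a simple rank-based rule. Cylinders are sorted by access count. The hottest fraction is labeled Hot, the coldest fraction Cold, and the rest Warm. The `classify_cylinders` function, the cylinder ids, and the 20%/50% cut-offs are all hypothetical and are not TVS internals.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "Hot"
    WARM = "Warm"
    COLD = "Cold"


def classify_cylinders(access_counts, hot_fraction=0.2, cold_fraction=0.5):
    """Hypothetical rank-based temperature assignment (not the TVS algorithm).

    access_counts maps a cylinder id to the number of accesses observed in
    the current statistics window. The hottest hot_fraction of cylinders
    are Hot, the coldest cold_fraction are Cold, and the remainder are Warm.
    """
    if hot_fraction + cold_fraction > 1:
        raise ValueError("hot_fraction + cold_fraction must not exceed 1")
    # Most frequently accessed cylinders first.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    n = len(ranked)
    n_hot = round(n * hot_fraction)
    n_cold = round(n * cold_fraction)
    temps = {}
    for rank, cylinder in enumerate(ranked):
        if rank < n_hot:
            temps[cylinder] = Temperature.HOT
        elif rank >= n - n_cold:
            temps[cylinder] = Temperature.COLD
        else:
            temps[cylinder] = Temperature.WARM
    return temps


# Usage with made-up access counts for ten cylinders c0..c9.
counts = {f"c{i}": v for i, v in enumerate([900, 15, 300, 2, 40, 0, 5, 120, 1, 60])}
temps = classify_cylinders(counts)
# c0 and c2 are Hot; c7, c9 and c4 are Warm; the other five are Cold.
```

Ranking is relative: a cylinder becomes Cold because others are used more, not because of any absolute threshold. That fits the idea of data "cooling" as newer data takes over.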

TVS Multi-Temperature enhancements allow larger and smaller capacity drives to be mixed within a clique, lowering the cost per unit of storage for Cold data. They also place data on storage resources according to its temperature.

When TVS is enabled on a Teradata system, each AMP still owns an identical amount of data, but the storage devices themselves are logically owned by TVS. The clique is still connected to the pool of storage within the attached arrays, but TVS allocates that storage pool to the AMPs in cylinders. Because TVS has virtualized the storage from the AMP’s perspective, there is no longer any need to restrict mixing drive sizes or array types within the clique. This introduces enormous flexibility in possible storage configurations.
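To make the idea of TVS allocating a shared pool to AMPs in cylinders concrete, here is a toy model. It is a sketch under stated assumptions, not the TVS allocator. Drives of different sizes feed one pool. Each AMP asks for cylinders by logical number, and the pool records which physical drive and cylinder back each one. The classes and the "drive with most free space" policy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    capacity_cyl: int      # physical cylinders on this drive (sizes may differ)
    next_free: int = 0     # next unallocated physical cylinder

    @property
    def free(self) -> int:
        return self.capacity_cyl - self.next_free


@dataclass
class CylinderPool:
    """Toy model: drives of any size feed one pool that AMPs draw from."""
    drives: list
    # (amp, logical cylinder) -> (drive name, physical cylinder)
    mapping: dict = field(default_factory=dict)
    # amp -> number of logical cylinders allocated so far
    amp_cylinders: dict = field(default_factory=dict)

    def total_free(self) -> int:
        return sum(d.free for d in self.drives)

    def allocate(self, amp: int) -> tuple:
        # Hypothetical policy: take the next cylinder from the drive with the
        # most free space, which spreads each AMP's data across all drives.
        drive = max(self.drives, key=lambda d: d.free)
        if drive.free == 0:
            raise RuntimeError("storage pool exhausted")
        physical = (drive.name, drive.next_free)
        drive.next_free += 1
        logical = self.amp_cylinders.get(amp, 0)
        self.amp_cylinders[amp] = logical + 1
        self.mapping[(amp, logical)] = physical
        return physical


# Usage: mixed drive sizes, four AMPs, each AMP receives 100 cylinders.
pool = CylinderPool([Drive("d0", 300), Drive("d1", 300), Drive("d2", 600)])
for i in range(400):
    pool.allocate(amp=i % 4)
```

Every AMP ends up with the same logical capacity even though the drives differ in size. This is the property that removes the need to match drives to AMPs.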

Virtualization

In computing, virtualization refers to the abstraction of computer resources: a logical view of hardware in which the details of the physical components are abstracted or generalized. It is the creation of a virtual (rather than actual) version of something, such as an operating system, a server, memory, a storage device, or network resources.

Storage virtualization is the pooling of physical storage from multiple storage devices into what appears to be a single storage device. Storage virtualization is commonly used in storage area networks (SANs).

Virtualization can be viewed as part of an overall trend in information technology that includes autonomic computing. The goal of autonomic computing is to develop computer systems capable of self-management. Such systems overcome the rapidly growing complexity of managing computing systems and reduce the barriers that this complexity poses to further growth. Autonomic computing refers to the self-managing characteristics of distributed computing resources and their ability to adapt to unpredictable changes while hiding intrinsic complexity from operators and users. An autonomic system makes decisions on its own, using its own strategies and procedures. It constantly monitors and optimizes its status and automatically adapts to changing conditions.

Benefits of TVS

Teradata storage can be expanded at a much lower cost than is possible without TVS. Storage can now be added to a clique without also adding nodes, as was required before TVS.

The flexibility to mix drive sizes within each clique enables configurations in which high-volume history data or other Cold data can be integrated within the EDW. This brings deep history data analysis within reach of the EDW, enhancing its return on investment.

The configuration flexibility of TVS allows storage in a clique to be expanded in a wide range of size increments, because the restrictions on drives per AMP are eliminated. Expansion is accomplished by adding the desired number of drives and restarting the system. The TVS-based approach does not usually require adding AMPs, so AMP assignments typically do not change and the system Reconfig process is not needed. Only a system restart, which takes just a few minutes, is required after new storage is added.
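The difference in expansion increments can be shown with a little arithmetic. The sketch below rests on simplifying assumptions. Without TVS, new drives must be spread evenly so that every AMP gets its own dedicated drives. With TVS, any number of drives that fits in the free array slots works, and pooled usable (post-RAID 1) capacity is shared equally by the AMPs. The function names, slot counts, and drive sizes are hypothetical.

```python
def traditional_increment_ok(drives_added: int, amp_count: int) -> bool:
    # Without TVS each AMP owns dedicated drives, so an upgrade must give
    # every AMP the same number of new drives.
    return drives_added > 0 and drives_added % amp_count == 0


def tvs_increment_ok(drives_added: int, free_slots: int) -> bool:
    # With TVS, any number of drives that fits in the empty array slots works.
    return 0 < drives_added <= free_slots


def per_amp_capacity_gb(drive_sizes_gb: list, amp_count: int) -> float:
    # Simplification: TVS pools usable capacity and each AMP gets an equal share.
    return sum(drive_sizes_gb) / amp_count


existing = [600] * 32     # hypothetical usable GB per existing drive
added = [2000] * 6        # six larger drives placed in previously empty slots
print(traditional_increment_ok(6, 16), tvs_increment_ok(6, free_slots=10))
# False True
print(per_amp_capacity_gb(existing, 16), per_amp_capacity_gb(existing + added, 16))
# 1200.0 1950.0
```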

TVS makes better use of existing storage infrastructure and power when expanding storage. All unused drive slots in an array can now be easily used, and larger capacity drives can be intermixed within an array to grow total storage capacity. This improves energy usage per TB of storage and per square foot of data center floor space. TVS also lowers the cost of Fallback data protection in systems with higher availability needs, because storage in a clique can be grown more cheaply.

Multi-Temperature Data

TVS continuously collects statistics on how frequently each cylinder is accessed; the Metric Collection process measures this at the cylinder level. Based on these statistics, TVS continuously migrates the most frequently used (Hot) data to the fastest areas of storage. This displaces the Cold data in those areas, which migrates to the slowest areas of storage. The outer portions of a spinning disk drive provide faster I/O than the inner portions of the same drive. TVS recognizes the relative performance of the various portions of each disk drive and matches it to the relative temperature of the data being migrated.
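The migration described above can be pictured as a series of swaps. In each swap a hotter cylinder moves into a faster zone (toward the outer tracks) that currently holds a colder one. The sketch below is a minimal, hypothetical model of that idea, not the TVS migrator. Zone 0 is assumed to be the fastest. A `max_swaps` budget stands in for the configurable limit on migration's resource impact.

```python
def migrate(placement, heat, max_swaps):
    """Hypothetical swap-based migration toward temperature-ordered placement.

    placement: list where index = zone (0 = fastest, outermost tracks) and
               value = id of the cylinder currently stored there.
    heat:      cylinder id -> access count.
    max_swaps: budget of swaps for this pass (limits resource impact).
    Returns the new placement and the number of swaps performed.
    """
    zones = list(placement)
    swaps = 0
    for fast in range(len(zones)):
        if swaps >= max_swaps:
            break
        # Hottest cylinder currently in this zone or in any slower zone.
        hottest = max(range(fast, len(zones)), key=lambda z: heat[zones[z]])
        if heat[zones[hottest]] > heat[zones[fast]]:
            # Hot data moves out; the displaced colder data moves to the slower zone.
            zones[fast], zones[hottest] = zones[hottest], zones[fast]
            swaps += 1
    return zones, swaps


heat = {"cold_a": 1, "warm_c": 50, "hot_b": 500, "hot_d": 800}
start = ["cold_a", "warm_c", "hot_b", "hot_d"]
print(migrate(start, heat, max_swaps=1))
# (['hot_d', 'warm_c', 'hot_b', 'cold_a'], 1)
print(migrate(start, heat, max_swaps=10))
# (['hot_d', 'hot_b', 'warm_c', 'cold_a'], 2)
```

Limiting swaps per pass means placement converges gradually over several passes, which mirrors the article's point that migration is continuous but kept to a slight resource impact.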

TVS can be used to add Cold storage to a system in order to build the elements of a Multi-Temperature warehouse cost-effectively. However, a full Multi-Temperature warehouse still requires Teradata workload management features such as Teradata Active System Manager (TASM). TVS only places data in appropriate areas of the available storage; it does not control how system resources are allocated to data by temperature class. TASM can be configured to allocate system resources according to the temperature of the data. Hot data tables, partitions, users, and so on would thus be allocated the majority of the resources to ensure full query support for this data. Likewise, Cold data would be allocated a much smaller portion of the resources so that Cold data usage does not impact time-critical Hot data queries.
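As a rough illustration of allocating resources by temperature class, the sketch below gives each class a relative weight and splits that weight evenly among the workloads in the class. This is a hypothetical toy model, not how TASM is configured. TASM has its own workload definitions and priority scheme, and the weights and workload names here are made up.

```python
def resource_shares(workloads: dict, class_weights: dict) -> dict:
    """Hypothetical share model: NOT the TASM configuration model.

    workloads:     workload name -> temperature class ("Hot", "Warm", "Cold").
    class_weights: relative weight per temperature class.
    Each active class's weight is split evenly among its workloads; the
    returned fractions of system resources sum to 1.
    """
    active = set(workloads.values())
    total = sum(class_weights[c] for c in active)
    members = {c: sum(1 for v in workloads.values() if v == c) for c in active}
    return {w: class_weights[c] / total / members[c] for w, c in workloads.items()}


workloads = {"daily_sales": "Hot", "inventory": "Hot", "history_mining": "Cold"}
shares = resource_shares(workloads, {"Hot": 80, "Warm": 15, "Cold": 5})
# Each Hot workload gets 40/85 of the system; the Cold workload gets 5/85.
```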

When should TVS be used?

When a Teradata system already has both Hot and Cold data and needs more storage, TVS can be used to add disks for Hot and Cold use as required. The balance of Hot to Cold data usage determines what the storage configuration should be and which disk capacity points are appropriate. The system’s current data temperature characteristics (its data thermal demographics) can be determined with the aid of an Assessment Tool that collects statistics on the frequency of data usage across selected tables. This tool is available through your Teradata GSS representative and runs transparently on the system for a week or two.
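The Assessment Tool's actual method is not described in the article. As a hypothetical sketch of what "thermal demographics" means, the code below takes per-table size and access counts from an assessment window. It classifies each table by access density (accesses per GB) and totals the data volume in each temperature class. The thresholds and table names are assumptions for illustration only.

```python
def thermal_demographics(tables, hot_min_per_gb, cold_max_per_gb):
    """Hypothetical: total GB of data per temperature class.

    tables: table name -> (size_gb, accesses during the assessment window).
    A table is Hot if its access density (accesses per GB) is at least
    hot_min_per_gb, Cold if at most cold_max_per_gb, otherwise Warm.
    """
    totals = {"Hot": 0.0, "Warm": 0.0, "Cold": 0.0}
    for size_gb, accesses in tables.values():
        density = accesses / size_gb
        if density >= hot_min_per_gb:
            totals["Hot"] += size_gb
        elif density <= cold_max_per_gb:
            totals["Cold"] += size_gb
        else:
            totals["Warm"] += size_gb
    return totals


tables = {
    "orders_current": (500, 2_000_000),
    "customers": (200, 300_000),
    "returns": (300, 30_000),
    "orders_2001_2005": (4000, 1_000),
}
demo = thermal_demographics(tables, hot_min_per_gb=1000, cold_max_per_gb=10)
# {'Hot': 700.0, 'Warm': 300.0, 'Cold': 4000.0}
```

Weighting by size rather than counting tables matters because the storage decision is about capacity. A few large, rarely used history tables can dominate the Cold total.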

TVS can be used when it becomes necessary to add Cold data to the EDW, for example history or compliance data for deep analytics within the EDW environment. Such data is easily identified as Cold because it has been archived or simply not used before this new need arose, so determining the space needed for Cold data is straightforward.

TVS should be considered when planning for the future needs of a system. New systems, or upgrades to existing systems, can be configured to take best advantage of TVS's flexibility if planning includes predictions of growth, data types, Business Intelligence data requirements, and so on. This includes provisions for storage growth within cliques. It also includes the use of larger capacity drives whose initially excess space can later be used efficiently for Cold data with TVS.

With TVS it is possible to mix disks of different sizes within a clique. One of the primary use cases is to configure large, inexpensive disks for Cold data while the smaller, higher performance disks are used for Hot and Warm data.

When designing a storage configuration with multiple device sizes and technologies, the data space and performance provided by each type of storage device must be considered. The data temperature demographics of the system must be examined to determine whether there are distinct Hot and Cold portions of data and to estimate the size of each portion. It is strongly recommended that you allow Teradata personnel to assess your system’s readiness for TVS with their Assessment Tool before making any decisions regarding TVS implementation.
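The two considerations named above, data space and performance per device type, can be expressed as a simple sanity check on a proposed two-tier design. The sketch makes several assumptions. The `Tier` type, the 20% headroom, and all capacities and I/O rates are hypothetical placeholders, not Teradata sizing rules. It does not replace the Teradata assessment the article recommends.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    usable_gb: float   # usable capacity after RAID 1 mirroring
    iops: float        # sustainable random I/Os per second for the whole tier


def check_mixed_config(fast, large, hot_warm_gb, cold_gb, hot_warm_iops, headroom=0.2):
    """Hypothetical sanity check; returns a list of problems (empty if none found)."""
    problems = []
    # Hot and Warm data should fit on the fast tier, with some headroom.
    if fast.usable_gb < hot_warm_gb * (1 + headroom):
        problems.append(f"{fast.name}: too little space for Hot/Warm data")
    # The fast tier must also deliver the I/O rate the Hot/Warm data demands.
    if fast.iops < hot_warm_iops:
        problems.append(f"{fast.name}: cannot sustain Hot/Warm I/O rate")
    # All data, including Cold data, must fit in the combined pool.
    if fast.usable_gb + large.usable_gb < (hot_warm_gb + cold_gb) * (1 + headroom):
        problems.append("total capacity too small for all data")
    return problems


fast = Tier("small fast drives", usable_gb=9_600, iops=40_000)
large = Tier("large capacity drives", usable_gb=16_000, iops=6_000)
ok = check_mixed_config(fast, large, hot_warm_gb=1_000, cold_gb=4_000, hot_warm_iops=25_000)
busy = check_mixed_config(fast, large, hot_warm_gb=1_000, cold_gb=4_000, hot_warm_iops=50_000)
# ok == []; busy flags only the I/O rate of the fast tier.
```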

When is TVS not appropriate?

This release of TVS does not fit every Teradata solution and is not appropriate for some systems. Many EDW systems focus primarily or solely on operational or Active Data Warehouse workloads and therefore contain mostly Hot data. As data ages and cools, it is moved to archive because it no longer offers apparent business value. If the Cold data is simply no longer used, TVS offers no benefit.

Finally, using Cold data storage as a backup-to-disk solution for history data is not a suitable use of Virtual Storage. The Teradata BAR (Backup, Archive, and Restore) solutions should be considered instead: either the virtual tape disk system or tape libraries. The Teradata BAR options are more appropriate for actual backups that cannot be lost, need to be stored offsite, or require multiple copies. However, as an online archive for data that may be accessed at some point, TVS may offer an alternative to restoring the required data from a backup copy.

The future possibilities of TVS

Although it is not yet possible to say when or how TVS will be enhanced beyond this initial release, there are many possibilities for the future. One is greater sharing of data across more disks (referred to below as affinity).

In traditional Teradata systems, storage units (disks) can only be added in multiples of the number of AMPs. If a system has 16 AMPs, the number of disks added at any one time must be a multiple of 16 (16, 32, 48, etc.). No disks are shared between AMPs: each disk is dedicated to a single AMP. An affinity parameter for a storage unit could designate what percentage of that unit is used exclusively by a single AMP. 100% affinity would mean the unit is used entirely by a single AMP (that is, it is not shared). 0% would mean the unit is completely shared among multiple AMPs. 50% would mean that half of the unit is used by a single AMP while the other half is shared.
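The proposed affinity parameter amounts to a simple split of each storage unit. The sketch below is hypothetical, since affinity is described only as a future possibility. It computes the capacity dedicated to the owning AMP and the capacity each sharing AMP would get. It assumes the shared part is divided evenly among `sharing_amps` AMPs.

```python
def split_by_affinity(unit_gb: float, affinity_pct: float, sharing_amps: int):
    """Hypothetical split of one storage unit under an affinity setting.

    Returns (GB dedicated to the owning AMP, GB available to each sharing AMP),
    assuming the shared portion is divided evenly among sharing_amps AMPs.
    """
    if not 0 <= affinity_pct <= 100:
        raise ValueError("affinity_pct must be between 0 and 100")
    if sharing_amps < 1:
        raise ValueError("sharing_amps must be at least 1")
    dedicated = unit_gb * affinity_pct / 100
    shared = unit_gb - dedicated
    return dedicated, shared / sharing_amps


print(split_by_affinity(600, 100, 4))  # (600.0, 0.0): traditional, unshared disk
print(split_by_affinity(600, 50, 4))   # (300.0, 75.0)
print(split_by_affinity(600, 0, 4))    # (0.0, 150.0): fully shared
```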

Another future possibility is for thermal migration and placement to occur across different drive types or classes. In this potential scenario, Teradata would be able to leverage both the speed of very fast storage devices and the capacity of very large disks. This possible enhancement would provide automatic migration and placement of data by temperature across widely different storage elements. Ultra-fast Solid State Disks (SSDs) could be used alongside today’s high-speed enterprise drives and large-capacity, low-cost, slower drives. TVS would place or migrate data to the appropriate drive type depending on its temperature. The most frequently used data would be placed on SSDs, and less frequently used data on the mid-range enterprise drives. Very infrequently used Cold history data would end up on the less expensive, large-capacity drives.
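The cross-tier scenario can be sketched as filling tiers in speed order. The hottest cylinders go to the fastest tier until it is full, and the rest overflow to the next tier. This is a hypothetical illustration of the described behavior, not a TVS design. The tier names, capacities (in cylinders), and access counts are made up.

```python
def place_by_tier(heat, tiers):
    """Hypothetical tiered placement: hottest cylinders on the fastest tier.

    heat:  cylinder id -> access count.
    tiers: list of (tier name, capacity in cylinders), fastest tier first.
    Returns cylinder id -> tier name.
    """
    ranked = sorted(heat, key=heat.get, reverse=True)
    if len(ranked) > sum(capacity for _, capacity in tiers):
        raise ValueError("not enough capacity across all tiers")
    placement = {}
    start = 0
    for name, capacity in tiers:
        for cylinder in ranked[start:start + capacity]:
            placement[cylinder] = name
        start += capacity
    return placement


heat = {"a": 900, "b": 10, "c": 400, "d": 1, "e": 50, "f": 0}
tiers = [("SSD", 1), ("enterprise", 2), ("large", 10)]
placement = place_by_tier(heat, tiers)
# a -> SSD; c, e -> enterprise; b, d, f -> large
```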

An SSD storage medium is neither magnetic (like a hard disk) nor optical (like a CD) but solid-state semiconductor memory, such as battery-backed RAM or an electrically erasable chip such as flash. This provides faster access than a hard disk because any location on an SSD can be accessed in roughly the same time. SSD access time does not depend on a read/write head synchronizing with a data sector on a rotating disk. SSDs are also more resilient to physical vibration, shock, and extreme temperature fluctuations, and they are immune to strong magnetic fields that could erase a hard drive.
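One component of the head-synchronization delay is easy to quantify. On average, the target sector is half a revolution away from the head, so average rotational latency is half the time of one revolution. The sketch below computes it for a few common spindle speeds. It ignores seek time, which adds further delay on a hard disk. An SSD has no rotational component at all.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    # One revolution takes 60,000 / rpm milliseconds; on average the
    # target sector is half a revolution away from the head.
    return 0.5 * 60_000 / rpm


for rpm in (7_200, 10_000, 15_000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 7200 4.17
# 10000 3.0
# 15000 2.0
```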

Summary

TVS enables flexible configurations of mixed drive capacities within one system and clique. It also enables cost-effective, simple expansion of storage without adding Teradata nodes. Specifically, disks of different sizes and types can be mixed in an array, and different array models can be mixed in a clique. This allows a system to retain old disks in a new configuration, or to mix larger, lower performance disks with smaller, faster storage.

TVS resides between the Teradata Database File System and the physical storage media. When it is implemented on a Teradata system, it becomes the primary layer of Teradata Database storage management and control. Although TVS is a separate storage subsystem feature, it only works with Teradata Release 13 (or later) software. TVS is responsible for allocating storage, keeping track of where data is stored on the physical media, maintaining statistics on data access, optimizing data placement, and allowing different types of storage units to be used on the same system.

TVS automatically and transparently migrates and places data on storage by considering its thermal characteristics: Hot, Warm, Cold. It provides effective use of large capacity drives for Cold data storage. Data placement is automatically and transparently optimized by moving most often accessed data (‘hot data’) to faster storage, while moving rarely accessed data (‘cold data’) to slower storage units or shared disks.

Today’s TVS product lays the foundation for the future optimization of storage technology for data warehousing.

TVS is new in Teradata 13.0. It enables mixing storage technologies in a clique, with intelligent data placement and automatic data migration based on access frequency. By allowing mixed and shared storage, it lets system storage be upgraded more flexibly and cost-effectively. The intelligent data placement algorithms of TVS minimize the performance impact of mixed and shared storage configurations. TVS does introduce new performance considerations, however: a mixed storage design must be matched to the system's data temperature demographics.

Further reading

http://www.teradata.com/tdmo/v07n02/Tech2Tech/AppliedSolutions/WasteNotWantNot.aspx

http://www.teradata.com/tdmo/v08n04/Tech2Tech/AskTheExperts/Teradata13.aspx

http://www.teradata.com/t/WorkArea/DownloadAsset.aspx?id=1446

http://www.teradata.com/t/newsrelease.aspx?id=11806


Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

Do you have, or can you find, 'best practices' for capacity planning for a Teradata data architecture?

Are there published Teradata capacity planning guidelines that you can share with me?

Any help will be greatly appreciated
Thanks,
Pranab
pranab_mukherjee@hotmail.com
205-276-2553(Cell)
Enthusiast

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

Is there a way for the dba to see what data is on "cold storage" at any point in time, so it can be archived using BAR?
Enthusiast

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

It also sounds like TVS can be implemented on just our existing homogeneous disks, with TVS migrating the data to faster and slower parts of those disks. Is there a reasonable benefit to doing this?
Teradata Employee

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

As stated in the article, "It is strongly recommended that you allow Teradata personnel to assess your system’s readiness for Teradata Virtual Storage with their Assessment Tool before making any decisions regarding Teradata Virtual Storage implementation." Let them help you determine whether your system would benefit from TVS.

Viewpoint has a TVS Monitor portlet that allows you to view statistics on the data temperature and storage grade of cylinders allocated in Teradata Virtual Storage (TVS). The statistics reflect current relationships between data temperature and storage grade, as well as historical trends in the management of storage grades based on data temperature.

Based on those statistics, you could potentially identify data that could be archived, although data being "cold" doesn't mean it is not used at all.
Enthusiast

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

Hi Mark,

I've worked with different versions of Viewpoint (13, 13.10, 13.11), but we didn't enable the TVS portlet in any of them. After reading your article, I would like to take advantage of this portlet in the new version (VP 13.12) that we are going to implement soon. Before that, I would like to know whether there are any downsides to enabling it.
Since it collects statistics on data temperature, will it cause any overhead issues?

Please confirm,

Thanks,
Geeta.
Teradata Employee

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

The overhead of this collector is hard to quantify, since it uses a user-defined function (SYSLIB.GETTVSUDFVIEWPOINTSUMMARY) to pull TVS data. The default collection frequency of this data collector is once every hour, so I wouldn't expect it to have a high impact.
Enthusiast

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

Thank you Mark.
Enthusiast

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

hello Mark,

We have EDWs for PROD systems and Appliances for NON-PROD systems (Dev/Test, ...). I was comparing the TVS options (from ctl--->tvs switch=on). The PROD systems have "ALLOCATION METHOD=ONE_DIMENTIONAL" and "Migration=ON" (under Migrator options), while the NON-PROD systems have "ALLOCATION METHOD=TRADITIONAL_TERADATA" and "Migration=OFF" (under Migrator options).

My questions:

1) Are Allocation Method=One_Dimentional and Migrator=ON the defaults for Enterprise class systems (6650, 6700, ...), while appliance systems default to Migrator=OFF with Traditional_Teradata as the allocation method?

2) If Migrator=ON on an enterprise class system, does that mean TVS is migrating cylinders to the corresponding arrays (SSDs/HDDs) based on temperature (hot/cold)? And is it true that, before turning on the migrator, the system must be completely reviewed by Teradata personnel to check whether TVS is necessary?

Please share your inputs.

Teradata Employee

Re: Teradata Virtual Storage: The new way to manage Multi-Temperature data

Geeta - I think your questions regarding the settings for the PROD vs. NON-PROD systems are best directed to your Teradata Customer Support representative.  The CSR should be able to tell you what the defaults are.  You can also contact the Teradata Global Support center if you are having difficulties getting answers to your questions.

Regarding the check for TVS necessity, it would seem to be a good idea, if not an absolute requirement.  I believe there is an evaluation tool which can be utilized to determine the need for TVS on a system.  Again, your CSR is the one to consult on this.  This type of evaluation would be very system specific, so unless it is no longer available, or would upset the PROD system performance and schedule too much, it might be a good idea to try it and verify that the TVS is actually needed on the system.