Temperature of Data - need help in understanding the concept
The Teradata Database channel includes discussions around advanced Teradata features such as high-performance parallel database technology, the optimizer, mixed workload management solutions, and other related technologies.
Re: Temperature of Data- need help in understanding the concept
TVS (Teradata Virtual Storage) is the component that manages temperature. TVS has two major functions.
Monitoring: When TVS is present on the system, it monitors the access frequency of every storage allocation unit and maintains a list of allocation units ordered by access frequency.
Migration: If the physical platform on which Teradata is running has the necessary hardware components, i.e. multiple storage types (e.g. SSD and spinning disk, and/or large memory), and TVS is enabled, then the migrator will move data between storage tiers according to the sorted list from the monitor. The most frequently accessed (hottest) data is placed on the fastest persistent storage (e.g. SSD), and the least frequently accessed data is placed on the slowest persistent storage (the inner tracks of the slowest disks). This migration process is continuous and automatic: it responds to changes in the access pattern of the data, moving hotter data up and cooler data down the tiers of storage.
There are no absolute temperatures; temperature is determined only by an allocation unit's relative position in the sorted list.
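As a rough mental model (this is not Teradata's actual implementation; the class name, tier names, and the even three-way split below are invented for illustration), the monitor/migrator pair can be sketched like this:

```python
from collections import Counter

class AllocationMonitor:
    """Tracks access frequency per storage allocation unit (illustrative)."""

    def __init__(self):
        self.accesses = Counter()

    def record_access(self, unit_id):
        self.accesses[unit_id] += 1

    def ranked_units(self):
        # Hottest (most frequently accessed) first.
        return [u for u, _ in self.accesses.most_common()]


def assign_tiers(ranked_units, tiers=("ssd", "fast_disk", "slow_disk")):
    """Map units to tiers purely by relative rank: there is no absolute
    temperature, only position in the sorted list."""
    n = len(ranked_units)
    placement = {}
    for rank, unit in enumerate(ranked_units):
        tier_index = min(rank * len(tiers) // n, len(tiers) - 1)
        placement[unit] = tiers[tier_index]
    return placement


monitor = AllocationMonitor()
for unit, count in [("a", 50), ("b", 30), ("c", 5), ("d", 1), ("e", 20), ("f", 2)]:
    for _ in range(count):
        monitor.record_access(unit)

print(assign_tiers(monitor.ranked_units()))
```

Note that if the access counts shift, rerunning the ranking and assignment moves data between tiers, which is the "continuous and automatic" behavior described above.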
If sufficient memory is available and Teradata Intelligent Memory (TIM) is turned on, then the migrator will also pin the very top of the access list in memory, again responding to changes in the access patterns. Memory is used as a cache: the data in TIM also has a home location on persistent storage devices.
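The cache-not-a-tier point can be sketched in the same illustrative style (the function name and `memory_slots` parameter are hypothetical): the hottest entries are flagged as cached in memory, but every unit keeps its home location on persistent storage.

```python
def pin_hottest(ranked_units, home_tier, memory_slots):
    """Return {unit: (cached_in_memory, home_location)} for a
    hottest-first list. Pinning does not change the home location."""
    return {
        unit: (i < memory_slots, home_tier[unit])
        for i, unit in enumerate(ranked_units)
    }

ranked = ["a", "b", "c", "d"]
home = {"a": "ssd", "b": "ssd", "c": "fast_disk", "d": "slow_disk"}
print(pin_hottest(ranked, home, memory_slots=2))
```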
On a new system where all data has just been loaded, all data effectively has the same temperature; it can be thought of as all data starting out warm. As workloads are deployed on the system, the monitor will learn the access patterns of the data and the migrator will begin to move data appropriately.
Statistics are completely independent of temperature. Stats collection is neither necessary for nor used in determining temperature, and stats do not need to be recollected as temperature changes. Stats should be collected as needed to get good plans from the optimizer, without any consideration of data temperature.
TASM interacts indirectly with temperature in a couple of ways. The higher priority workloads on a system often run higher frequency/concurrency queries; this in turn produces higher access frequencies for their data, which places that data higher on the access list and therefore makes it hotter. In the latest releases, the highest priority workloads also cause an adjustment to the access frequency computations: accesses that come from the highest priority work are given more weight than accesses from lower priority work. This raises the probability that data accessed by a high priority workload will be hot and will therefore reside on the fastest part of the storage.
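The priority-weighting idea can be sketched as follows (the weight values and priority names are made up for illustration; Teradata's actual weighting is internal to TVS/TASM):

```python
from collections import Counter

# Hypothetical weights: accesses from higher-priority workloads
# count more toward an allocation unit's temperature ranking.
PRIORITY_WEIGHT = {"tactical": 4, "high": 2, "medium": 1, "low": 1}

def weighted_ranking(access_log):
    """access_log: iterable of (unit_id, workload_priority) tuples.
    Returns unit ids ordered hottest-first by weighted access score."""
    scores = Counter()
    for unit, priority in access_log:
        scores[unit] += PRIORITY_WEIGHT[priority]
    return [u for u, _ in scores.most_common()]

# Three tactical accesses (3 * 4 = 12) outrank five low-priority
# accesses (5 * 1 = 5), even though the raw count is lower.
log = [("orders", "tactical")] * 3 + [("archive", "low")] * 5
print(weighted_ranking(log))
```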
When a new object is loaded, the default temperature will be warm. This can be overridden with a query band that specifies the desired initial temperature; e.g. if the new data will be infrequently accessed, a query band can be used to load it with a default temperature of cold.
The list from the monitor is also used to drive Temperature Based Block Level Compression (TBBLC). If TBBLC is enabled, the least frequently accessed allocation units are automatically compressed.
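Continuing the illustrative model above, selecting TBBLC candidates amounts to taking the coldest tail of the ranked list (the 25% threshold here is an invented parameter, not a Teradata default):

```python
def compression_candidates(ranked_units, cold_fraction=0.25):
    """ranked_units: hottest first. Returns the coldest tail of the
    list, i.e. the units a TBBLC-style policy would compress."""
    n = len(ranked_units)
    cutoff = n - int(n * cold_fraction)
    return ranked_units[cutoff:]

print(compression_candidates(["a", "b", "c", "d", "e", "f", "g", "h"]))
```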