Physical I/O utilization is not as straightforward to measure as CPU utilization. For one thing, the I/O utilization metrics that exist, mainly in the ResUsage tables, are not always easy to interpret. And I/O demand is not usually even throughout a logging interval. It's common for demand for I/O to swing from high to low rapidly and unpredictably, even within a single second.
Many factors influence I/O utilization, such as the effectiveness of caching on the nodes and in the storage arrays, and the degree to which I/O requests can be grouped at the disk level.
If I/O metrics are being calculated over intervals of ten minutes, or even over one minute, the actual I/O utilization at any point in time could be misrepresented. In addition, there are different levels of granularity to consider when looking at ResUsage I/O metrics: Do you consider system-level usage, node-level usage, or device-level usage?
This posting describes two approaches available today for better understanding platform I/O utilization: OutReqTime and IOTAs. Both are available in ResUsage tables.
OutReqTime is a simple reporting of the total time within a ResUsage logging interval that a disk device had an outstanding I/O request. OutReqTime is reported in centiseconds. By dividing the OutReqTime value in each logging interval by the number of centiseconds in the interval, you can derive a percent I/O busy number for the logging period.
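As a sketch of that derivation (the OutReqTime value and the 10-minute interval length below are assumed for illustration):

```python
# Illustrative only: derive percent I/O busy from OutReqTime for one device.
# OutReqTime is logged in centiseconds; the 600-second interval here is an
# assumed logging interval, not a fixed value.

def pct_io_busy(out_req_time_csec: float, interval_seconds: float) -> float:
    """Percent of the logging interval the device had an outstanding I/O."""
    interval_csec = interval_seconds * 100  # centiseconds in the interval
    return 100.0 * out_req_time_csec / interval_csec

# Example: 45,000 centiseconds busy in a 10-minute (600-second) interval
print(pct_io_busy(45_000, 600))  # 75.0
```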
While OutReqTime data is considered accurate, keep in mind that OutReqTime data is only produced at the device level. OutReqTime data appears in the ResUsageSLDV table (which carries logical device information that comes from the storage devices). Once you have accessed device-specific OutReqTime metrics, group those numbers by device type (SSD vs. HDD, for example) before aggregating up to the node level, or even the system level. This is important because I/O demand often differs between SSD and HDD device types.
Each logged row in the ResUsageSLDV table represents metrics for one logical disk device. Each row includes an LDV type field that identifies whether the device type is SSD or HDD. If the device is an SSD type, this type field may further identify the device as 'RSSD' for read-intensive solid state or 'WSSD' for write-intensive solid state. The row also includes a device ID field to indicate which drive on the node is being reported. An SLDV row also carries the Node Type and Node ID, to make aggregating to the node level easy.
Below is a subset of rows and columns taken from the ResUsageSLDV table that illustrates the calculation of I/O utilization based on the OutReqTime field. On this small test system there were only nine devices on a single node, all HDDs. A low level of activity was going on at the time this data was logged.
Using the OutReqTime field, you can assess average I/O utilization per node by device type, and derive an average I/O utilization number per node. The right-most column performs that calculation.
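That aggregation can be sketched as follows. The rows below are invented stand-ins for ResUsageSLDV data (node ID, LDV type, OutReqTime), and the 10-minute interval length is assumed:

```python
# Illustrative only: aggregate device-level OutReqTime into per-node,
# per-device-type average utilization. The row layout is a simplified
# stand-in for ResUsageSLDV columns, not the actual table schema.
from collections import defaultdict

INTERVAL_CSEC = 600 * 100  # assumed 10-minute logging interval, in centiseconds

rows = [  # (node_id, ldv_type, out_req_time_csec) -- invented values
    (1, "HDD", 12_000),
    (1, "HDD", 18_000),
    (1, "SSD", 54_000),
    (1, "SSD", 48_000),
]

totals = defaultdict(list)
for node_id, ldv_type, out_req in rows:
    totals[(node_id, ldv_type)].append(100.0 * out_req / INTERVAL_CSEC)

for (node_id, ldv_type), pcts in sorted(totals.items()):
    avg = sum(pcts) / len(pcts)
    print(f"node {node_id} {ldv_type}: {avg:.1f}% busy")
```

Averaging within each device type first, then rolling up, keeps a handful of busy SSDs from being masked by a larger number of idle HDDs.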
There is one caveat when using OutReq numbers to assess how close you are to maximum I/O bandwidth. The OutReq metric does not consider how saturated an individual device is, or how much work the drive can do in a unit of time. OutReq reports the device as busy if it has just a single I/O outstanding. But having one I/O outstanding does not mean that the device is necessarily 100% busy.
For that reason, when OutReq calculations indicate 100% I/O utilization, there still may be some capacity remaining if more I/Os were to be issued. More information about the device and what it is doing, beyond just OutReq numbers, is needed to truly understand saturation.
I/O Token Allocations (IOTAs) have been used since Teradata Database 14.10, beginning with SLES 11 systems. IOTAs were originally intended to enforce the I/O side of Workload Management Capacity on Demand (WM COD) and other SLES 11 hard limits.
IOTAs are used in calculations that take place inside the database, and they were not designed for ease of interpretation by a system administrator. Not only are IOTA statistics sometimes difficult to interpret, they are not always externalized in the ResUsage tables, depending on the hardware platform. There have been several cases where 6800 platforms did not externalize IOTAs. That issue has been resolved on the newer platforms.
To give you a sense of what IOTAs do internally, this section describes what IOTAs are and what role they play in enforcing I/O limits with WM COD.
The I/O side of WM COD is enforced using IOTAs. Incoming I/O requests are prioritized, and their cost is calculated before they are issued. Costing takes place in a special module that resides on each disk drive. I/O costing is based on the physical bandwidth that is expected to be transferred when an I/O is executed. Different types of I/O (read vs. write) and differently-sized data blocks have different I/O costs assigned to them.
In order to enforce COD at the prescribed level, internal units called "tokens" are dispersed. Each token represents a unit of I/O bandwidth. A set number of these tokens are made available to the disk drive at intervals of a few milliseconds. Only when enough tokens have been accumulated to match the cost of a particular I/O request will that request be executed.
Tokens (or groups of tokens referred to as IOTAs) are accumulated on a periodic time basis into a WM COD bucket that exists on each drive. Once there are enough tokens in the bucket for an I/O, the required tokens are removed from the bucket as payment for the I/O, and the I/O is released. The number of tokens issued per interval is determined based on the WM COD setting.
If WM COD has been activated, tokens are only issued to a disk drive if there is an I/O that is ready to run. Consequently, the number of tokens does not build up over time in the absence of I/O demand. You use them at the time they are issued, or you lose them.
When there is an I/O ready to run, it will wait until enough tokens have arrived in the bucket to allow that I/O to be released. When that happens, the number of tokens that reflect the expected bandwidth of the I/O are taken out of the bucket, and any remaining tokens are discarded, unless there are additional I/Os waiting to run on that device.
If the number of tokens distributed to the bucket on a given device has been exhausted, and there are still I/Os waiting to run, those I/O requests will have to wait until the next COD enforcement interval for more tokens to be provided. Tokens are issued in such a way that total I/O bandwidth from that disk drive will conform to the COD settings that have been established.
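The enforcement described above resembles a classic token-bucket scheme. The sketch below is a simplified model of that behavior; the token counts, I/O costs, and release policy are invented for illustration and differ in detail from the database's actual internal mechanism:

```python
# Simplified token-bucket model of WM COD I/O enforcement. All numbers and
# the release policy are illustrative, not the database's implementation.
from collections import deque

def run_intervals(tokens_per_interval, io_costs, n_intervals):
    """Release queued I/Os as tokens arrive; discard leftovers when idle."""
    pending = deque(io_costs)  # I/O requests waiting, each with a token cost
    bucket = 0
    released = []
    for _ in range(n_intervals):
        if pending:                      # tokens issued only when an I/O waits
            bucket += tokens_per_interval
        while pending and bucket >= pending[0]:
            cost = pending.popleft()
            bucket -= cost               # tokens removed as payment for the I/O
            released.append(cost)
        if not pending:
            bucket = 0                   # leftover tokens are discarded
    return released

# Three I/Os costing 4, 6, and 3 tokens; 5 tokens issued per enforcement
# interval. The 6-token I/O must wait an extra interval to accumulate tokens.
print(run_intervals(5, [4, 6, 3], n_intervals=4))  # [4, 6, 3]
```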
IOTAs and ResUsageSPS
In systems that do not have WM COD defined, tokens are used solely for statistical purposes, not for releasing an I/O.
Whether or not you have WM COD enabled, tokens are available for viewing in some of the ResUsage tables. While OutReqTime may currently be your preferred method of assessing I/O utilization, the IOTA columns may be useful in some circumstances, if your platform supports IOTA value externalization.
Consider the ResUsageSPS table, which carries usage metrics by Priority Scheduler workload.
The two most useful fields related to tokens that appear in ResUsageSPS are "FullPotentialIota" and "UsedIota". When you manipulate them appropriately you can derive the percent of the total system I/O bandwidth each workload on a node consumed during that logging interval.
The FullPotentialIota field reports the maximum IOTAs available across all disks on the node. Since it is a node-level metric, be mindful when you look at ResUsageSPS output that the FullPotentialIota column will be reported as the same number for each workload on that node, just as the NodeID field is carried on each SPS table row. FullPotentialIota is actually a single value but is repeated in each SPS table row to make it easy to perform calculations.
This single-node excerpt of output from ResUsageSPS was captured during an intentionally very I/O intensive test where four workloads were active, one in each of the four Timeshare access levels. Notice that FullPotentialIota is the same value for all the workloads on the node. That is because there was no WM COD active on this system.
UsedIota, the other field shown above, reports the number of IOTAs consumed by the four different workloads in a single logging interval. To translate these two IOTA fields into a percent of I/O utilization for each workload, divide Used by Potential: UsedIota / FullPotentialIota.
The next graphic illustrates the result of that calculation.
In this example, four workloads were forced to perform very I/O-intensive work, with a single workload active in each of the four SLES 11 Timeshare access levels. The four access levels have a priority difference of 1:2:4:8, which correlates to the differences in I/O utilization among the four workloads reported in the output above. This level of even I/O demand is not likely to exist when you examine IOTA columns in your own ResUsage tables.
The ResSpsView does this calculation of Used / FullPotential for you, in a field named UsedFullPotentialByWD.
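As an illustration of the Used-to-Potential calculation, the workload names and IOTA values below are invented, chosen to mirror the 1:2:4:8 priority ratio:

```python
# Illustrative only: per-workload I/O utilization from ResUsageSPS-style
# IOTA fields. FullPotentialIota is repeated on every row for the node,
# so the same denominator applies to each workload. Values are invented.
workloads = {  # workload name -> UsedIota
    "Timeshare-Top":    800_000,
    "Timeshare-High":   400_000,
    "Timeshare-Medium": 200_000,
    "Timeshare-Low":    100_000,
}
FULL_POTENTIAL_IOTA = 2_000_000  # node-level value, carried on each SPS row

for wd, used in workloads.items():
    pct = 100.0 * used / FULL_POTENTIAL_IOTA
    print(f"{wd}: {pct:.1f}% of node I/O bandwidth")
```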
IOTAs and ResUsageSPMA
ResUsageSPMA offers a view on IOTAs similar to that of ResUsageSPS, with or without WM COD. In contrast to the SPS table, which reports a row for each workload on a node, the SPMA table produces a single row for the entire node per logging interval.
You can derive system I/O utilization numbers from the SPMA table in a similar way as was shown for the SPS table, using either the ResUsageSPMA table or its views. For example, you can use this calculation: SPMA_UsedIota / SPMA_FullPotentialIota. Below is some output from ResSpmaView that shows this calculation.
In a multi-node system, when you divide UsedIota by FullPotentialIota you will be unlikely to see 100% as a result, even if the node's devices are completely saturated. This is due to a characteristic of how FullPotential is calculated at the node level in current releases.
FullPotential tends to over-report the number of tokens at the node level. This over-reporting comes about because a node will report the FullPotential from all the devices that it sees within the clique that it belongs to. Since nodes have pathways to all devices within the clique for availability purposes, beyond the devices used by the AMPs on the node, this can inflate the node's reported FullPotential number. When FullPotential at the node level is inflated, this translates to a lower-than-accurate UsedIota-as-a-%-of-FullPotentialIota metric.
ResUsageSLDV reports more accurate FullPotential numbers. In SLDV, IOTAs are reported at the device level, and the FullPotential for an individual device will be accurate. The UsedIota for devices that are not being used for the database will be effectively zero.
Comparing IOTAs and OutReq Metrics
In this section we will look at two examples of how OutReq and IOTA metrics look side by side, taken from the ResUsageSLDV table.
In this first example, data is taken from a very small test system that had very low I/O demand at the time of the sample. Only a single workload was active and all device types were HDDs.
The two right-most columns in the table above are calculations based on the other columns in the table. For this particular test you can see a reasonably close correlation between the I/O usage percent drawn from the OutReqTime columns and the usage percent derived from IOTA columns. This may or may not be the case if you make similar comparisons.
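A side-by-side comparison like this one can be sketched as follows. The device values are invented, and the 10-minute interval length is an assumption:

```python
# Illustrative only: compare OutReqTime-based and IOTA-based utilization
# for one logical device in one logging interval. All values are invented;
# the field names follow the ResUsageSLDV columns discussed earlier.
INTERVAL_CSEC = 600 * 100  # assumed 10-minute interval, in centiseconds

device = {
    "OutReqTime": 30_000,          # centiseconds with an outstanding I/O
    "UsedIota": 220_000,
    "FullPotentialIota": 500_000,  # accurate at the device level
}

outreq_pct = 100.0 * device["OutReqTime"] / INTERVAL_CSEC
iota_pct = 100.0 * device["UsedIota"] / device["FullPotentialIota"]
print(f"OutReq: {outreq_pct:.1f}%  IOTA: {iota_pct:.1f}%")
```

A sizable gap between the two numbers for the same device is a hint that the device was marked busy (one I/O outstanding) without being anywhere near its bandwidth limit.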
This second example shares an excerpt of ResUsageSLDV output from a real-world production system that is performing a mix of work. These selected rows come from the same logging interval and are taken from disk devices that are on the same node. Only rows from SSD devices are illustrated.
In this output you can see that the SSD devices show consistency between the OutReq and the IOTA metrics.
While OutReqTime may be easier to understand today and is recommended for use by Teradata Engineering on current platforms, IOTAs are the future direction for understanding I/O utilization. OutReqTime is susceptible to being influenced by the type of work running on the platform and other variables, and as a result can be more easily distorted. Once you become familiar with using them, IOTAs will provide a more consistent view of I/O usage across both on-premises and cloud implementations.
IOTAs were designed as an internal mechanism to enforce the I/O side of capacity on demand and other SLES 11 hard limits within the disk subsystem. For that reason, they may be difficult to interpret, and for some hardware platforms they are not even included in ResUsage output. That said, some of the IOTA fields that appear in ResUsageSPS and ResUsageSPMA may be helpful for assessing workload I/O utilization, or I/O usage at the node level.
One other advantage of using IOTAs has to do with an IOTA-related field called IotaCodPotential. When COD is enabled on the platform, the tokens that get reported in this column are used for internal purposes in order to control the I/O requests in a way that honors the WM COD setting. This metric does not lend itself to general-purpose reporting, nor is it easy for someone without special internal knowledge to gain insights from it. But IotaCodPotential does accurately capture the influence of WM COD and other hard limits on I/O utilization, which the OutReq metric cannot accomplish.
Before deciding which of the two I/O utilization options is for you, make your own comparison of how I/O utilization appears from the IOTA perspective compared to the OutReq perspective.
In future releases, IOTAs will be playing a more critical role as a basis for COD-like behavior on IntelliFlex platforms, and as a means to compare I/O capabilities across systems. Even if you currently rely on OutReqTime for your day-to-day analysis, now's a good time to get familiar with IOTAs as well.