Database users are becoming increasingly comfortable storing their data in database systems located in public or private clouds. For example, Amazon RDS (Relational Database Service) is used by approximately half of all customers of Amazon’s AWS cloud. Given that AWS has over a million active customers, this implies that over half a million users are willing to store their data in Amazon’s public cloud database services.
A key feature of cloud computing, one that enables efficient resource utilization and reduced overall costs, is multi-tenancy. Many different users, potentially from different organizations, share the same physical resources in the cloud. Since many database-backed applications are not able to fully utilize the CPU, memory, and I/O resources of a single server machine 24 hours a day, database users can leverage the multi-tenant nature of the cloud in order to reduce costs.
The most straightforward way to implement database multi-tenancy in the cloud is to acquire a virtual machine in the cloud (e.g. via Amazon EC2), install the database system on the virtual machine, and load data into it and access it as one would any other database system. As an optimization, many cloud providers offer specialized virtual machines with the database preinstalled and preconfigured in order to accelerate the process of setting up the database and making it ready to use. Amazon RDS is one example of a specialized virtual machine of this kind.
The “database system on a virtual machine” approach is a clean, elegant, and general way to implement multi-tenancy. Multiple databases running on different virtual machines can be mapped to the same physical machine, with negligible concern for any security problems arising from the resulting multi-tenancy. This is because the hypervisor effectively shields each virtual machine from being able to access data from other virtual machines located on the same physical machine.
For general cloud environments such as Amazon AWS, this general approach of achieving database multi-tenancy is a good solution, since it dovetails with the EC2 philosophy of giving each user their own virtual machine. However, for specific database-as-a-service / database-in-the-cloud solutions, supporting multi-tenancy via installing multiple database instances in separate virtual machines is inefficient for several reasons. First, storage, memory, and cache space must be consumed for each virtual machine. Second, the same database software must be installed on each virtual machine, thereby duplicating the storage, memory, and cache space needed for the redundant instances of the same database software. These redundant copies also reduce instruction cache locality: because each instance has its own separate copy of the code, an instance on one virtual machine cannot benefit when the same part of the database codebase has already been brought into the instruction cache by an instance on another virtual machine.
To summarize the above points:
(1) Allowing multiple database users to share the same physical hardware (“multi-tenancy”) helps optimize resource utilization in the cloud, and therefore reduce costs.
(2) Secure multi-tenancy can be easily achieved by giving each user a separate virtual machine and mapping multiple virtual machines to the same physical machine.
(3) When the different virtual machines are all running the same OS and database software, the virtual machine approach results in inefficient redundancy.
If all tenants of a multi-tenant system are using the same software, it is far more efficient to install a single instance of that software on the system, and allow all tenants to share the same software instance. However, a major concern with this approach is security: for example, in a database system, it is totally unacceptable for different tenants to have access to each other’s data. Even metadata should not be visible across tenants --- they should not be aware of each other’s table names and data profiles. In other words, each tenant should have a view of the database as if they are using an instance of that database installed and prepared specifically for that tenant, and any data and metadata associated with other tenants should be totally invisible.
Furthermore, the performance of one tenant’s queries, transactions, and other requests should not be harmed by the requests of other tenants. For example, if tenant A is running a long and resource-intensive query, tenant B should not observe slowdowns in the requests it is concurrently making to the database system.
Teradata’s recent announcement of its secure zones feature is thus a major step towards a secure, multi-tenant version of Teradata. Each tenant exists within its own “Secure Zone”, and each zone has its own separate set of users that can only access database objects within that zone. The view that a user has of the database is completely local to the zone in which that user is defined. Even the database metadata (“data dictionary tables” in Teradata lingo) is local to the zone, such that user queries of this metadata only return results for the metadata associated with the zone in which the user is defined. Users are not even able to explicitly grant permissions to view database objects of their zone to users of a different zone: each zone is 100% isolated from the other secure zones.
Figure 1: Secure zones contain database users, tables, profiles, and views
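To make the isolation rules concrete, here is a minimal toy model in Python. It is not Teradata code and does not use any Teradata API; the `Catalog` class, its method names, and the zone, user, and table names are all hypothetical, chosen only to illustrate the two rules described above: dictionary queries return only objects from the caller's own zone, and grants that would cross a zone boundary are rejected. For brevity, the sketch assumes table names are unique across the whole system.

```python
class ZoneError(Exception):
    """Raised when an operation would cross a zone boundary."""


class Catalog:
    """Toy catalog in which every user and table belongs to exactly one zone."""

    def __init__(self):
        self.user_zone = {}    # user name -> zone name
        self.table_zone = {}   # table name -> zone name
        self.grants = set()    # (user name, table name) pairs

    def add_user(self, user, zone):
        self.user_zone[user] = zone

    def add_table(self, table, zone):
        self.table_zone[table] = zone

    def list_tables(self, user):
        """Dictionary query: return only tables in the caller's own zone."""
        zone = self.user_zone[user]
        return sorted(t for t, z in self.table_zone.items() if z == zone)

    def grant_select(self, grantor, grantee, table):
        """Allow a grant only when grantor, grantee, and table share a zone."""
        zones = {
            self.user_zone[grantor],
            self.user_zone[grantee],
            self.table_zone[table],
        }
        if len(zones) != 1:
            raise ZoneError("grants cannot cross zone boundaries")
        self.grants.add((grantee, table))


# Hypothetical tenants: alice lives in zone_a, bob in zone_b.
catalog = Catalog()
catalog.add_user("alice", "zone_a")
catalog.add_user("bob", "zone_b")
catalog.add_table("orders", "zone_a")
catalog.add_table("payroll", "zone_b")
```

In this sketch, `catalog.list_tables("alice")` sees only the `orders` table, and `catalog.grant_select("alice", "bob", "orders")` raises `ZoneError` because the grantee is defined in a different zone.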
A key design theme in Teradata’s Secure Zones feature is the separation of administrative duties from access privileges. For example, in order to create a new tenant, there needs to be a way to create a new secure zone for that tenant. Theoretically, the most straightforward mechanism for accomplishing this would be via a “super user” analogous to the Linux superuser / root that has access to the entire system and can create new users and data on the system at will. This Teradata superuser could then add and remove secure zones, create users for those zones, and access data within those zones.
Unfortunately, this straightforward “superuser” solution is fundamentally antithetical to the general “Secure Zones” goal of isolating zones from each other, since the zone boundaries have no effect on the superuser. In fact, the presence of a superuser would violate regulatory compliance requirements in certain multi-tenant application scenarios.
Therefore, Teradata’s Secure Zones feature includes the concept of a “Zone Administrator” --- a special type of user that can perform high level zone administration duties, but has no discretionary access rights on any objects or data within a zone. For example, the Zone Administrator has the power to create and drop zones, and to grant limited access to the zone for specific types of users. Furthermore, the Zone Administrator determines the root object of a zone. However, the Zone Administrator cannot read or write that root object, nor any of its descendants.
Analogous to a Zone Administrator is a special kind of user called a “DBA User”. Just as a Zone Administrator can perform administrative zone management tasks without discretionary access rights in the zones that it manages, a DBA User can perform administrative tasks for a particular zone without superuser discretionary access rights in that zone. In particular, DBA Users only receive automatic DDL and DCL rights within a zone, along with the power to create and drop users and objects. However, they must be directly assigned DML rights for any objects within a zone that they do not own in order to be able to access them. Thus, if every zone in a Teradata system is managed by a DBA User, then the resulting configuration has complete separation of administrative duties from access privileges --- the Zone Administrator and DBA Users perform the administration without any automatic discretionary access rights on the objects in the system.
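The separation of administrative duties from access privileges can be sketched with another small toy model in Python. Again, this is not Teradata code: the `ToyWarehouse` class, its methods, and all user and zone names are hypothetical. The sketch also makes one simplifying assumption that the description above does not spell out, namely that only the owner of a table hands out DML rights on it.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks the right to perform an operation."""


class ToyWarehouse:
    def __init__(self, zone_admin):
        self.zone_admin = zone_admin
        self.zone_dba = {}   # zone name -> DBA user for that zone
        self.tables = {}     # (zone, table) -> {"owner": user, "rows": list}
        self.dml = set()     # (user, zone, table) explicit DML grants

    def create_zone(self, user, zone, dba):
        """Zone administration: reserved for the zone administrator."""
        if user != self.zone_admin:
            raise AccessDenied("only the zone administrator can create zones")
        self.zone_dba[zone] = dba

    def create_table(self, user, zone, table, owner):
        """DDL: automatic for the zone's DBA user."""
        if self.zone_dba.get(zone) != user:
            raise AccessDenied("DDL requires the zone's DBA user")
        self.tables[(zone, table)] = {"owner": owner, "rows": []}

    def grant_dml(self, user, zone, table, grantee):
        """Assumption of this sketch: only the table owner grants DML."""
        if self.tables[(zone, table)]["owner"] != user:
            raise AccessDenied("only the owner can grant DML in this sketch")
        self.dml.add((grantee, zone, table))

    def read(self, user, zone, table):
        """DML: allowed for the owner or an explicit grantee, nobody else."""
        entry = self.tables[(zone, table)]
        if user != entry["owner"] and (user, zone, table) not in self.dml:
            raise AccessDenied(f"{user} has no DML right on {zone}.{table}")
        return list(entry["rows"])


def can_read(warehouse, user, zone, table):
    """Return True if the read succeeds, False if access is denied."""
    try:
        warehouse.read(user, zone, table)
        return True
    except AccessDenied:
        return False


# Hypothetical principals: zadmin administers zones, dba_a manages zone_a,
# and carol owns the orders table.
wh = ToyWarehouse(zone_admin="zadmin")
wh.create_zone("zadmin", "zone_a", dba="dba_a")
wh.create_table("dba_a", "zone_a", "orders", owner="carol")
```

Here neither `zadmin` nor `dba_a` can read `orders`, even though both performed administrative work on the zone. The DBA user gains read access only after `carol` calls `wh.grant_dml("carol", "zone_a", "orders", "dba_a")`.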
The immediate use case for secure zones is Teradata’s new “Software Defined Warehouse” which is basically a Teradata private cloud within an organization. It consists of a single Teradata system that is able to serve multiple different Teradata database instances from the same system. If the organization develops a new application that can be served from a Teradata database, instead of acquiring the hardware and software package that composes a new Teradata system, the organization can instead serve this application from the software defined warehouse. Multiple existing Teradata database instances can also be consolidated into the software defined warehouse.
Figure 2: Software-Defined Warehouse Workloads
The software defined warehouse is currently intended for use cases where all applications / database instances that it is managing belong to the same organization. Nonetheless, in many cases, different parts of an organization are not allowed access to data for other parts of that organization. This is especially true for multinational or conglomerate companies with multiple subsidiaries where access to subsidiary data must be tightly controlled and restricted to users of the subsidiary or citizens of a specific country. Therefore, each database instance that the software defined warehouse is managing exists within a secure zone.
In addition to secure zones, the other major Teradata feature that makes efficient multi-tenancy possible is Teradata Workload Management. Without workload management, it is possible for system resources to get hogged by a single database instance that is running a particularly resource-intensive task, while users of the other instances see significantly increased latencies and overall degraded performance. For the multiple virtual-machine implementation of the cloud mentioned above, the hypervisor implements workload management, ensuring that each virtual machine gets a guaranteed amount of important system resources such as CPU and memory. Teradata’s “virtual partitions” work the same way: the system resources are divided up so that each partition is guaranteed a fixed amount of system resources. By placing each Teradata instance inside its own virtual partition, the Teradata workload manager can thus ensure that the database utilization of one instance does not affect the observed performance of other instances.
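The idea of a guaranteed, fixed slice per partition can be illustrated with a few lines of Python. This is a simplified sketch, not a description of how the Teradata workload manager is implemented: the `allocate` function, the partition names, and the capacity and share numbers are all made up for illustration, and the sketch deliberately does not redistribute unused capacity.

```python
def allocate(capacity, shares, demand):
    """Give each partition min(its demand, its fixed slice of capacity).

    capacity: total units of a resource (for example, CPU units).
    shares:   partition name -> relative weight of its guaranteed slice.
    demand:   partition name -> units the partition is asking for.
    """
    total = sum(shares.values())
    return {
        name: min(demand.get(name, 0), capacity * share / total)
        for name, share in shares.items()
    }


# Hypothetical setup: three partitions splitting 100 CPU units 50/30/20.
shares = {"instance_a": 50, "instance_b": 30, "instance_c": 20}

# instance_a runs a very heavy query and asks for far more than its slice.
heavy = allocate(100, shares, {"instance_a": 500, "instance_b": 10, "instance_c": 20})

# instance_a is nearly idle.
light = allocate(100, shares, {"instance_a": 5, "instance_b": 10, "instance_c": 20})
```

In both runs `instance_b` and `instance_c` receive exactly what they asked for, because their demand fits within their guaranteed slices. The heavy query in `instance_a` is capped at its own slice of 50 units and cannot take resources away from the other partitions.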
When you combine Teradata Secure Zones and Teradata Workload Management, you end up with a cloud-like environment, where multiple different Teradata databases can be served from a single system. Additional database instances can be created “on demand”, backed by this same system, without having to wait for procurement of an additional Teradata system. However, this mechanism of “cloudifying” Teradata is much more efficient than installing the Teradata database software in multiple different virtual machines, since all instances are served from a single version of the Teradata codebase, without redundant operating system and database system installations.
Since I am not a full-time employee of Teradata and have not been briefed on future plans for Teradata in the cloud, I can only speculate about the next steps for Teradata’s plans for cloud. Obviously, Teradata’s main focus for secure zones and virtual partitions has been the software-defined warehouse, so that organizations can implement a private cloud or consolidate multiple Teradata instances onto a single system. However, I do not see any fundamental limitations to prevent Teradata from leveraging these technologies in order to build a public Teradata cloud, where Teradata instances from different organizations share the same physical hardware, just like VMs from different organizations share the same hardware in Amazon’s cloud. Whether or not Teradata chooses to go in this direction is likely a business decision that they will have to make, but it’s interesting to see that with secure zones and workload management, they already have the major technological components to proceed in this direction and build a highly-efficient database-as-a-service offering.
There is a concept of a special type of user called a “Zone Guest”, which is not associated with any zone, and can have guest access to objects in multiple zones, but the details of this special type of user are outside the scope of this post.
Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and an M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.