SEE ATTACHMENT FOR FULL PAPER
Abstract – The volume, velocity, and variety of Big Data necessitate the development of new and innovative data processing software. A multitude of SQL implementations on distributed systems have emerged in recent years to enable large-scale data analysis. User-defined table operators, written in procedural languages and embedded in these SQL implementations, are a powerful mechanism for succinctly expressing and performing the analytic operations typical of Big Data discovery workloads. Table operators can be easily customized to implement different processing models such as map, reduce, and graph execution. Despite an inherently parallel execution model, the performance and scalability of these table operators are greatly restricted because they appear as a black box to a typical SQL query optimizer. The optimizer cannot infer even the basic properties of a table operator, which prevents it from applying optimization rules and strategies. In this paper, we introduce the concept of “Collaborative Planning”, which removes redundant operations and produces a more efficient arrangement of query plan operators. Query optimization proceeds through a collaborative exchange between the planner and the table operator, in which plan properties and context information about the surrounding query plan operations are passed between the two. Knowing these properties also allows the author of the table operator to optimize its embedded logic. Our main contribution in this paper is the design and implementation of Collaborative Planning in the Teradata Aster 6 system. Using real-world workloads, we show that Collaborative Planning reduces query execution times by as much as 90.0% in common use cases, resulting in a 24x speedup.