Oracle BI Enterprise Edition (OBIEE) Configuration and Integration with Presto

Applications
Applications covers the world of Teradata apps, including apps offered by Teradata (such as TRM or DCM), as well as best practices and methodologies for building your own Teradata-connected apps.
Teradata Employee

Oracle BI Enterprise Edition (OBIEE) Configuration and Integration with Presto

The purpose of this article is to describe to the implementer(s) how to configure and integrate Oracle BI Enterprise Edition (OBIEE) with Presto. This article assumes you have a rudimentary understanding of OBIEE, Presto, Hadoop and database terminology.

Contained within this article is information on version compatibility and the steps to integrate OBIEE with Presto to implement an integrated analytic solution. This includes configuring Presto ODBC connectivity from OBIEE and relevant OBIEE feature table settings. 

Presto - A Powerful SQL Engine for Hadoop and More!

Presto is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. In Italian, “Presto” means fast. In the tech world, it means an open-source distributed SQL query engine for Apache™ Hadoop® that runs interactive analytic queries against data sources of all sizes. Through a single query, data is accessed where it resides. Typically, this means data in a Hadoop Distributed File System (HDFS). However, unlike other SQL on Hadoop engines, Presto can query data sources such as Apache Cassandra™, relational databases or even proprietary data stores.

Created by Facebook to meet the analytic needs of extremely large data-driven organizations, Presto’s pure memory-based architecture is built for speed. It performs jobs in memory where all stages of a task are pipelined to enable high-speed query capabilities. All data transfers in the worker nodes occur memory-to-memory, with no disk I/O.

Although this provides extremely high-speed processing, the caveat is that everything must fit in memory, which must be taken into account when looking at the type of workloads to be performed. For example, long-running ETL-type workloads on Hadoop might perform poorly. However, more interactive-type workloads are well suited for Presto.

Presto is used by a community of well known, respected technology companies such as Facebook, NetFlix, Twitter, Groupon and others.

Teradata is the only major vendor currently supporting and contributing to Presto. Teradata certified Hadoop distributions include Cloudera Distribution for Hadoop, Hortonworks Data Platform, IBM Big Insights and Teradata Hadoop Appliance, with plans for others.

Presto complements Teradata® QueryGrid™ within the Teradata Unified Data Architecture™. It serves as a key engine to enable interactive querying against Hadoop, while Teradata QueryGrid allows queries to be initiated from the Teradata Database and the Teradata Aster Database, all through a common SQL protocol.

Scope/Limitations

OBIEE can access Presto via QueryGrid™ or by directly connecting with ODBC. This document describes how to set up OBIEE for a direct connection to Presto.  There are 2 versions of ODBC drivers available.  One version is open source and is available on GitHub and the Presto website at:

https://prestodb.io/resources.html#odbc

The other version is Teradata certified and there will be specific feature content that Teradata will release that hasn’t been incorporated into the open source base. These features are driven by Teradata customer requirements.  It is rigorously tested and only contains stable functionality.  It can be downloaded at:

http://teradata.com/presto

OBIEE was tested with the Teradata certified drivers and the Teradata Presto for Hortonworks HDP Sandbox.

Conceptually there are three levels of OBIEE and Presto Integration:

 

Indirect   

OBIEE connected to Teradata which is connected to Presto via query grid

Custom

OBIEE connected to Presto directly with ODBC drivers.  Customized OBIEE feature table to support presto capabilities. To be used as-is in the field with no formal support from Oracle.

Native

Oracle develops and incorporates Presto as a certified, supported query engine/data source

 

The initial integration was focused on delivering the basic “custom” level of integration, namely providing OBIEE report, dashboard and ad-hoc generated SQL access to Presto for retrieving data (information) stored in Hadoop HDFS.

 

As with all OBIEE implementations, OBIEE feature settings for Presto can be modified to meet customer specific needs.

Version Compatibility

Integration testing of OBIEE and Presto was completed using the Teradata certified distribution of Presto version .148t. Older versions of presto were tested for connectivity, but not for feature/functionality. The corresponding version of the Presto ODBC driver was used.

OBIEE 12.2.1 installed on a Windows server was used in the integration testing. Other versions of OBIEE should work with Presto but they have not been rigorously tested.

OBIEE and Presto Configuration Presto

This integrated solution requires Presto and Presto ODBC. Presto can be installed on any normally configured Hadoop cluster.  The details of hardware and software requirements for Presto can be found in the Presto documentation:

http://teradata.github.io/presto/docs/148t/overview.html

The Presto ODBC driver associated with the Presto release needs to be installed on the OBIEE server.   It is important to make sure that the OBIEE Server platform (OS and chipset) is supported by the Presto ODBC driver.

OBIEE

This integrated solution requires OBIEE and the associated products that accompany its implementation and use as a general front-end BI tool.

There are no unique OBIEE add-ons required for accessing Presto.

Install ODBC on OBIEE Server

Presto ODBC needs to be installed on the OBIEE server. The packages are located here: 

http://teradata.com/presto

The driver download includes documentation that describes the install procedure. The presto data source name should be configured. For the integration testing the Presto system was added to the /etc/hosts file so that the system can be accessed by name.  The IP address could also be used in the “host” field.

 

p1.png

Connectivity can be checked with the “test” button.

 

p2.png

Configure the Presto data source in OBIEE

As of the writing of this document OBIEE does not yet have native integration with Presto. If direct integration was in place, there would be a “Presto” option listed on the data source definition drop down.  If that is not yet available, “ODBC Advanced” should be selected for the data source type.

 

p3.png

Choosing “ODBC Advanced” determines which feature table OBIEE will use for determining the default function shipping to the Presto engine.

 

p4.png

Feature Table Settings

During Presto certification, it was determined that the following features need to be disabled:

  1. Null values sort first
  2. Ascii
  3. Insert
  4. Repeat
  5. Ifnull
  6. Float
  7. Convert
  8. Space
  9. truncate

The following features need to be enabled:

  1. Right Outer Join
  2. Full Outer Join
  3. Nested Outer Join
  4. Union
  5. Union All
  6. Rank
  7. Date literal
  8. Time literal
  9. Date time literal
  10. Interval literal
  11. Now
  12. Position
  13. Mod
  14. Rand
  15. Cosh
  16. Tanh
  17. Concat
  18. Like escape clause
  19. Ntile
  20. Stddev samp
  21. Stddev pop
  22. Nullif
  23. Nulls sort direction in orderby
  24. Dense rank
  25. Coalesce
  26. Derived tables
  27. Predicate_scalar
  28. Comparison subquery
  29. Create as select
  30. In subquery
  31. concat null returns null
  32. Intersect

Connection Pool

Care must be taken when setting the maximum number of connections for the Presto connection pool. The Presto engine stopped running when the number of connections was set high and OBIEE was sending many simultaneous queries. For the small Hadoop Sandbox that was used for testing 8 simultaneous queries was the maximum.

Limitations

Once the basic configuration is complete OBIEE and Presto should operate effectively together. There are no limitations when importing metadata into OBIEE.

Learning’s

Various points of interest came to light during testing. These are recapped here:

  • Previous versions of Presto supported very limited data types. Version .148t improved it dramatically.
  • Presto does not yet support referential integrity, primary keys, not null, and defaults.
  • Insert statement doesn’t allow column names: insert into table (col1, col2, col3) values (v1,v2,v3);
  • Loading data with Presto is challenging. It is meant as a query engine, not a loader. Using Hive or another method was a lot easier.
  • While “grouping sets” are implemented in Presto they don’t support functions on the columns used in the set.
  • While Presto supports the “with” syntax, it has a lot of limitations. For example the following nested with fails:

            WITH

            SAWITH0 AS (select sum(T207."Revenue") as c1,

                 T267."ProductName" as c2,

                 T267."ProductID" as c3

            from

                 dbo."Products" T267,

                 dbo."sales_prod_emp" T207

            where ( T207."ProductID" = T267."ProductID" and T267."ProductName" like 'C%' and upper(T267."ProductName") like '%AN%' )

            group by T267."ProductName", T267."ProductID"),

           SAWITH1 AS (select D1.c2 as c1,

                  D1.c2 as c2,

                  D1.c1 as c3,

                  D1.c3 as c4

           from

                  SAWITH0 D1)

           select D1.c1 as c1,

                  D1.c2 as c2,

                  D1.c3 as c3

           from

                  SAWITH1 D1

           order by 1, 2

 

  • While Presto supports the {fn} syntax, it does not support it in all cases. So while this is the native way to get upper-case:

             SELECT upper(Name) FROM Customers

 

        There is also a way to get some ODBC drivers to translate a generic uppercase function to the database specific syntax:

             SELECT {fn UCASE(Name)} FROM Customers

 

         Presto does not support the {fn} syntax in union statements, for example:

              select distinct T262.CategoryName as c1

              from

                  Categories T262

              where ( T262.CategoryName like 'C%' )

              union all

              select distinct T267.ProductName as c1

              from

              Products T267

              where  T267.ProductName like 'C%' and {fn ucase(T267.ProductName)} like '%AN%'

 

       Presto also does not support the {fn} syntax in “with” statements:

              WITH

             SAWITH0 AS (select T60.FirstName as c1,

                   T60.EmployeeID as c2,

                   sum(T208.Revenue) as c3

             from

                   dbo.Employees T60,

                   dbo.sales_prod_emp T208

             where ( T60.EmployeeID = T208.EmployeeID )

             group by T60.FirstName, T60.EmployeeID)

             select D1.c2 as c1,

                {fn left(D1.c1 , 4)} as c2,

                D1.c3 as c3,

                'constant' as c4

            from

            SAWITH0 D1

            order by 3 desc

 

1 REPLY
New Member

Re: Oracle BI Enterprise Edition (OBIEE) Configuration and Integration with Presto

Hi,

 

I could successfully downloaded the ODBC driver and installed in Linux machine. I could connect from OBIEE 12.2.1.1 version. But very frequently I am getting the below error when ever I try to refresh the report in OBIEE.

I am using presto in AWS EMR cluster.

 

Error Codes: OPR4ONWY:U9IM8TAC:U9IM8TAC:U9IM8TAC:U9IM8TAC:OI2DL65P
State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred. (HY000)
State: HY000. Code: 43113. [nQSError: 43113] Message returned from OBIS. (HY000)
State: HY000. Code: 16001. [nQSError: 16001] ODBC error state: S1000 code: 0 message: [DataDirect][ODBCCUR lib] General error: Unable to read from file buffer. (HY000)
Please let me know the work around for the same.
 
Thank You,
Ambika