Unable to impersonate the logged user through Presto


Re: Unable to impersonate the logged user through Presto

See the following URL:

https://prestodb.io/docs/current/connector/hive-security.html

 

You will find the following statement:

Impersonation Accessing the Hive Metastore

Presto does not currently support impersonating the end user when accessing the Hive metastore.

 

My understanding is as follows (please correct me if I am wrong).

Presto queries against Hive go through three steps:
1. Get the metadata of the table.
2. Verify the external file location (the external location of the data, if the Hive table was created with one).
3. Fetch the data.

 

For steps 1 and 2, Presto uses the metastore-related configuration entries:

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=presto@EXAMPLE.COM
hive.metastore.client.keytab=/etc/presto/hive.keytab

For step 3, it uses the following configuration:

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=presto@EXAMPLE.COM
hive.hdfs.presto.keytab=/etc/presto/hdfs.keytab
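As a quick sanity check, the two groups of properties above can be read back from a catalog file to see which principal each stage runs as. This is only a minimal sketch: the helper names parse_properties and summarize_identities are made up for illustration, the property values are the placeholder ones quoted above, and a missing hive.hdfs.impersonation.enabled is treated as off.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def summarize_identities(props):
    """Report the principal configured for each stage and the impersonation flag."""
    flag = props.get("hive.hdfs.impersonation.enabled", "false")
    return {
        "metastore_principal": props.get("hive.metastore.client.principal"),
        "hdfs_principal": props.get("hive.hdfs.presto.principal"),
        "hdfs_impersonation": flag.lower() == "true",
    }


# Placeholder values copied from the snippets above, not from a real cluster.
CATALOG = """\
hive.metastore.authentication.type=KERBEROS
hive.metastore.client.principal=presto@EXAMPLE.COM
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=presto@EXAMPLE.COM
"""

for name, value in summarize_identities(parse_properties(CATALOG)).items():
    print(f"{name}: {value}")
```

Note that only the HDFS side has an impersonation switch at all; there is no matching property on the metastore side.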

 

Only step 3 has impersonation available. For steps 1 and 2, Presto runs with the credentials configured for connecting to the metastore. The original message has the following configuration:

hive.metastore.client.principal=presto@BDFPOCHP

During step 2, Presto tries to list the contents of the external location as the user "presto", and it fails because the folder permissions block the presto user, as the error message shows:

Permission denied: user=presto, access=EXECUTE, inode="/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits":a_app_ccbihadoop:r_app_ccbihadoop_writer:drwxrwx---

Note that the folder is owned by a_app_ccbihadoop and its group is r_app_ccbihadoop_writer. It looks like the "presto" user is not part of that group, hence the query failed. As I indicated at the beginning of the message, the main problem is impersonation while querying Hive metadata. If that were possible, the failure wouldn't have happened.
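To make the reasoning about the error message concrete, here is a rough sketch that pulls the user, owner, group, and mode out of the message and checks which permission class applies. The function name explain_denial is hypothetical, and the check only looks at the plain owner/group/other bits; it ignores ACL entries, sticky bits, and the HDFS superuser.

```python
import re

# Matches the fields of the HDFS "Permission denied" message quoted above.
DENIED = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>[A-Z_]+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[d-][rwx-]{9})'
)
# Only READ, WRITE, EXECUTE and their "_" combinations are handled here.
LETTER = {"READ": "r", "WRITE": "w", "EXECUTE": "x"}


def explain_denial(message, user_groups=()):
    """Return (permission class that applies, whether it grants the access)."""
    m = DENIED.search(message)
    if m is None:
        raise ValueError("not a recognised permission-denied message")
    mode = m["mode"]
    bits = {"owner": mode[1:4], "group": mode[4:7], "other": mode[7:10]}
    if m["user"] == m["owner"]:
        applies = "owner"
    elif m["group"] in user_groups:
        applies = "group"
    else:
        applies = "other"
    needed = [LETTER[part] for part in m["access"].split("_")]
    return applies, all(letter in bits[applies] for letter in needed)


MESSAGE = (
    'Permission denied: user=presto, access=EXECUTE, '
    'inode="/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits"'
    ':a_app_ccbihadoop:r_app_ccbihadoop_writer:drwxrwx---'
)

print(explain_denial(MESSAGE))
print(explain_denial(MESSAGE, user_groups=("r_app_ccbihadoop_writer",)))
```

With no group membership, presto falls into the "other" class, whose bits are "---", so EXECUTE is refused. If presto were in r_app_ccbihadoop_writer, the group bits "rwx" would allow it.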

 

Workaround:

You can use the setfacl command to give the "presto" user read and execute permission on this folder.

Details can be found here:

https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
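For illustration, this is roughly what the setfacl call could look like when driven from a script. The wrapper functions setfacl_command and grant are hypothetical; the underlying command is the standard "hdfs dfs -setfacl -m" form from the FileSystem shell docs linked above. The path is the one from the error message, and the sketch defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def setfacl_command(path, user="presto", perms="r-x", recursive=False):
    """Build the `hdfs dfs -setfacl` argument list that adds one user ACL entry."""
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")
    cmd += ["-m", f"user:{user}:{perms}", path]
    return cmd


def grant(path, dry_run=True, **kwargs):
    """Print the command (dry run) or execute it; returns the argument list."""
    cmd = setfacl_command(path, **kwargs)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Needs the hdfs client on PATH and a user allowed to change the ACL.
        subprocess.run(cmd, check=True)
    return cmd


grant("/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits")
```

A couple of caveats: on Hadoop 2.x, ACL support generally has to be switched on at the NameNode (dfs.namenode.acls.enabled) before setfacl is accepted, and the command has to be run by the folder owner or an HDFS superuser.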

 

Thanks,
Saj 
 


Re: Unable to impersonate the logged user through Presto

Hello Saj, Thibault,

 

We are using HDP 2.5.3.

 

The error shows that Presto is trying to access an HDFS path, indicating that it already queried the Hive metastore successfully and retrieved the HDFS path for the specified Hive table.

So, by the time Presto knows that the table data resides in an HDFS path, it has already successfully queried the Hive metastore.

 

We are indeed talking about HDFS impersonation; the error comes from HDFS at "step 3" and relates to an HDFS path, not to the Hive metastore.

In fact, your advice is to give presto read/execute access to an HDFS path using HDFS commands.

The principal shown in the error should refer to hive.hdfs.presto.principal, and not to hive.metastore.client.principal.

 

Is there a successful case of Presto HDFS impersonation we could take as an example?

 

Thanks,

 

Best Regards,

Marco

Re: Unable to impersonate the logged user through Presto

Hi Marco,

I use HDP 2.4.3, and we have Presto impersonation configured to access data in HDP. I use Presto 0.157t1.2 as well as 0.189 (open source).

 

 

Let me try to explain what I was trying to say earlier. I agree with your statement that Presto was able to fetch data from the Hive metastore. The key point is this: in step 2 of my previous explanation, Presto uses the same credentials it used during step 1 (metastore data retrieval) to get the details. In step 2, impersonation is not used.

Maybe the details below will help:

 

Step 1:

          a: Presto retrieves data from the Hive metastore. The user ID used comes from "hive.metastore.client.principal"; in our current example, that is presto@BDPFOCHP.

          b: Presto validates the external path associated with the Hive table. The user ID used for this check is again the value set in "hive.metastore.client.principal", which in the current example is presto@BDPFOCHP. (This is where the failure occurs, as no impersonation is done with this user account.)

Step 2:

          a: Presto is ready to fetch data from the physical location. Impersonation is available, so Presto connects to HDFS using the account configured in "hive.hdfs.presto.principal" and impersonates the user who submitted the query (the user ID specified with --user, or the logged-in user).

 

My point is that the failure is happening in step 1b. Note that no impersonation is available for the metastore-related user account (this is what I mentioned at the beginning of my previous post).

 

To fix the issue, all we need to do is give the user ID that is configured for metastore access read and execute permission on the HDFS folder. There are multiple ways to achieve this.

1. Change the HDFS folder permission to give the user (presto@BDPFOCHP) read and execute access.

2. Use the setfacl command to grant special permission to the user. When you list the HDFS location, you will see that the file permissions are unchanged, but there will be a plus (+) at the end of the permission string to show that an ACL is applied to this folder. If you use Ranger or a similar product, you can probably make it work through that as well.
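If you want to check for that plus sign from a script, something along these lines should do. It is only a sketch: has_acl is a made-up helper, and the two sample listing lines (dates included) are invented to show the "hdfs dfs -ls" layout before and after an ACL is applied.

```python
def has_acl(ls_line):
    """True if the permission string of an `hdfs dfs -ls` line ends with '+'."""
    fields = ls_line.split()
    return bool(fields) and fields[0].endswith("+")


# Invented sample lines; only the first column matters for this check.
BEFORE = "drwxrwx---   - a_app_ccbihadoop r_app_ccbihadoop_writer 0 2017-01-01 00:00 /apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits"
AFTER = "drwxrwx---+  - a_app_ccbihadoop r_app_ccbihadoop_writer 0 2017-01-01 00:00 /apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits"

print(has_acl(BEFORE), has_acl(AFTER))
```

To see the actual entries rather than just the marker, "hdfs dfs -getfacl" on the same path lists them.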

 

Thanks,
Saj


Re: Unable to impersonate the logged user through Presto

Hi Saj,

 

now I understand! In fact, it works as you wrote and I can see Presto impersonating the end user when accessing the table.

 

What I don't understand, though, is why it doesn't use the same mechanism as HDFS to perform that check.

 

This is better, but still partially a security problem, as we would anyway require user "presto" to be able to read from all directories...

 

 

Thanks for the explanation!

Best

Marco


Re: Unable to impersonate the logged user through Presto

We are also running into this issue with metastore impersonation.

Is there a plan to add a fix that allows this validation to be done with HDFS impersonation? Or is there a way to disable this check?

 

thanks,

-hw


Re: Unable to impersonate the logged user through Presto

Is there a solution to this issue?

We are in the same situation and trying to figure this out. Impersonation works at the HDFS storage level but fails at the Hive metastore level. I noticed in the 0.191 release documentation that they provide Hive metastore Thrift service authentication when using Kerberos. Will this resolve the issue? Can someone let me know if you are trying this out?

 

https://prestodb.io/docs/current/connector/hive-security.html

Re: Unable to impersonate the logged user through Presto

I don't see anything in the 0.191 release documents indicating that the problem is fixed. Kerberos-based authentication for the Hive metastore was there all along. What is missing is the impersonation part.

 

Check the below link related to 0.191 release and the statement in there:

https://prestodb.io/docs/current/connector/hive-security.html#hive-security-impersonation

Impersonation Accessing the Hive Metastore

Presto does not currently support impersonating the end user when accessing the Hive metastore.

 

So until the above is fixed, I think we will have this minor issue with Presto. The workaround I mentioned in previous posts works for us.

 

Thanks

Saj