Unable to impersonate the logged user through Presto

Teradata Employee

Hi, 

I'm trying to configure the latest version of Presto (0.167-t.0.2) with the configuration below, and I'm running into a problem with impersonation.

  • The Hadoop cluster is an HDP cluster, v2.3.4.7.
  • The cluster is Kerberized, and we use FreeIPA to manage all the users.
  • The presto user was created through FreeIPA as usual, with a dedicated keytab copied to the keytab folder with the correct permissions.

Our goal is to replace the Hive CLI / Beeline with Presto as our primary CLI, to get better query performance and to access the tables stored in HDFS.

 

Could you please tell me whether you have encountered this issue?

 

connector.name=hive-hadoop2
hive.metastore.uri=thrift://inbdfda01.fqdn:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/inbdfda01.fqdn@BDFPOCHP
hive.metastore.client.principal=presto@BDFPOCHP
hive.metastore.client.keytab=/etc/security/keytabs/presto.headless.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=presto@BDFPOCHP
hive.hdfs.presto.keytab=/etc/security/keytabs/presto.headless.keytab
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
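
As a quick sanity check on a catalog file like the one above, the following sketch parses the properties text and reports missing settings that HDFS impersonation depends on. It only uses property names that appear in this post. The helper names `parse_properties` and `check_impersonation` are hypothetical, and the list of checks is an assumption about what is worth verifying, not an official Presto validation.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_impersonation(props):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if props.get("hive.hdfs.impersonation.enabled", "false").lower() != "true":
        problems.append("hive.hdfs.impersonation.enabled is not true")
    if props.get("hive.hdfs.authentication.type") == "KERBEROS":
        # With Kerberos, the Presto service needs its own principal and keytab.
        for key in ("hive.hdfs.presto.principal", "hive.hdfs.presto.keytab"):
            if not props.get(key):
                problems.append(key + " is missing")
    return problems


# Subset of the catalog file from this post.
CATALOG = """
connector.name=hive-hadoop2
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=presto@BDFPOCHP
hive.hdfs.presto.keytab=/etc/security/keytabs/presto.headless.keytab
"""

props = parse_properties(CATALOG)
print(check_impersonation(props))
```

Running the same check against the catalog file on the coordinator and on each worker can help rule out a node that still has an older copy of the file.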

Error message:

[root@inbdfda01]# /images/presto --catalog hive --schema z_app_ccbihadoop_hive_temp --user u_xyz1234_adm
 
presto:z_app_ccbihadoop_hive_temp> select * from wh_visits limit 10;
Query 20170413_122724_00007_r3a9p failed: org.apache.hadoop.security.AccessControlException: Permission denied: user=presto, access=EXECUTE, inode="/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits":a_app_ccbihadoop:r_app_ccbihadoop_writer:drwxrwx---
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3866)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1076)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
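
The first line of the exception carries everything needed to reason about the denial: the user the NameNode evaluated the check for, the requested access, and the owner, group and mode of the inode. The sketch below pulls those fields out of the message. The regular expression and the helper name `parse_denial` are illustrative assumptions based on the message format seen in this post, not a Hadoop API.

```python
import re

# Pattern for the first line of an HDFS "Permission denied" message,
# modelled on the message shown in this post.
PATTERN = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[-\w]{10})'
)


def parse_denial(message):
    """Return the fields of a 'Permission denied' message, or None."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


MESSAGE = (
    'Permission denied: user=presto, access=EXECUTE, '
    'inode="/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits"'
    ':a_app_ccbihadoop:r_app_ccbihadoop_writer:drwxrwx---'
)

info = parse_denial(MESSAGE)
print(info["user"], info["access"], info["owner"], info["group"], info["mode"])
```

Here the `user` field is `presto`, while the CLI was started with `--user u_xyz1234_adm`, which is the mismatch this thread is about.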

Why is the user trying to list HDFS presto, and not root, here?

In addition, we added the following proxyuser settings through Ambari to allow the presto user to run queries on behalf of other users:

hadoop.proxyuser.presto.hosts=*
hadoop.proxyuser.presto.groups=*
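
To make the intent of these two settings concrete, here is a deliberately simplified model of a proxy-user rule: the real user (`presto`) may act for a target user only if the request comes from an allowed host and the target user belongs to an allowed group, with `*` matching everything. This is a sketch under stated assumptions; Hadoop's actual check handles more cases than this model, and the function names and the group passed for `u_xyz1234_adm` are hypothetical.

```python
def split_csv(value):
    """Split a comma-separated config value into a set of trimmed items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf, real_user, target_groups, client_host):
    """Simplified proxy-user check: both the host and the group rule must match."""
    prefix = "hadoop.proxyuser." + real_user + "."
    hosts = split_csv(conf.get(prefix + "hosts", ""))
    groups = split_csv(conf.get(prefix + "groups", ""))
    host_ok = "*" in hosts or client_host in hosts
    group_ok = "*" in groups or bool(groups & set(target_groups))
    return host_ok and group_ok


# The settings from this post.
CONF = {
    "hadoop.proxyuser.presto.hosts": "*",
    "hadoop.proxyuser.presto.groups": "*",
}

# The group list for u_xyz1234_adm is assumed for illustration.
print(may_impersonate(CONF, "presto", ["u_xyz1234_adm"], "inbdfda01.fqdn"))
```

With both values set to `*`, this model allows every target user from every host, which suggests that the proxyuser rule itself is unlikely to be what blocks the query.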

Thank you in advance.

 

Best regards, 

Stephen

 

7 REPLIES
Teradata Employee

Re: Unable to impersonate the logged user through Presto

Hi Stefun,

 

Why is the user trying to list HDFS presto, and not root, here?
--The user accessing HDFS will be the user specified in `hive.hdfs.presto.principal`. Hence, if you want the root user, update the principal accordingly.

If you want to keep the same principal (presto), another option is to give presto the necessary permissions on /apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits

 

--

Sanjay

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Hello Sanjay,

 

Thanks for your reply. I'm working with Stephen on this problem.

Did you notice that we have set the following property to true?

hive.hdfs.impersonation.enabled=true

Our understanding is that Presto should connect to HDFS as the user specified in "hive.hdfs.presto.principal" (presto) and then impersonate the user who runs the Presto CLI (root). That is why we configured the following proxy user (presto) on the HDFS side:

hadoop.proxyuser.presto.hosts=*
hadoop.proxyuser.presto.groups=*

 Have we missed something?

 

Regards,

 

Thibault

 

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Presto should connect to HDFS as the user specified in "hive.hdfs.presto.principal" (presto) and then pick the user to impersonate in the following order:

1. The user passed with --user, if any. In this case, `u_xyz1234_adm`.

2. If --user is not specified, the user who executes the Presto CLI (root).

 

In this case, since you're passing --user, Presto will try to impersonate `u_xyz1234_adm` and not `root`. If you don't use `--user`, Presto will impersonate `root`.

So, if you give `u_xyz1234_adm` permission to `/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits`, it should work.

Otherwise, try passing `a_app_ccbihadoop` as --user, since that user has access to that path.
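
The selection order described in this reply can be written down as a small function. This is only a sketch of the rule as stated here, not Presto's actual code; `session_user` and its parameters are hypothetical names.

```python
import getpass


def session_user(cli_user=None, os_user=None):
    """Pick the user to impersonate: --user if given, else the OS user."""
    if cli_user:
        return cli_user
    # Fall back to the account running the CLI when no OS user is supplied.
    return os_user if os_user is not None else getpass.getuser()


# With --user u_xyz1234_adm, started from a root shell.
print(session_user("u_xyz1234_adm", "root"))
# Without --user, started from a root shell.
print(session_user(None, "root"))
```

In neither case does the rule produce `presto`, which is why the `user=presto` in the error message is unexpected.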

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Sanjay, note that the error message states that the presto user is executing the HDFS operation.

This is weird.

 

Looking at the config, it looks like you are doing everything right.

Can you please try creating a new table with impersonation enabled, perform an INSERT into it, and then check in HDFS who owns the created file?

 

Regards, Łukasz

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Hi Thibault/Stefun,

I had a discussion with Losipiuk. The error message might be incorrect.
Try the following as mentioned before and let us know your results.

a). Create a new table with impersonation enabled, perform an INSERT into it, and check in HDFS who owns the created file.

b). Give `u_xyz1234_adm` permission to `/apps/hive/warehouse/z_app_ccbihadoop_hive_temp.db/wh_visits`
c). Try passing --user as `a_app_ccbihadoop`, which has access to that file.

Regards,
Sanjay

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Hi Stefun,

 

Were you able to resolve this?

 

Regards,

Sanjay

Teradata Employee

Re: Unable to impersonate the logged user through Presto

Hi @sanjay,

No, we didn't resolve our problem. Yesterday, with @Thibault, we reinstalled version 0.157-t, which is supposed to work with impersonation enabled.

In our test we have one coordinator and one worker, deployed with prestoadmin and installed on different nodes.

Our nodes are correctly synced. 

 

# presto-cli --server $coor_dns:8080
presto> use system.runtime;
presto:runtime> show tables;
    Table
--------------
 nodes
 queries
 tasks
 transactions
(4 rows)

Query 20170505_084730_00002_6bify, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:00 [4 rows, 97B] [17 rows/s, 413B/s]

presto:runtime> select * from nodes;
               node_id                |        http_uri         | node_version  | coordinator | state
--------------------------------------+-------------------------+---------------+-------------+--------
 a06882f8-19b9-49f1-b056-7ddccf64f5dc | http://$coor_ip:8080    | 0.157.1-t.0.2 | true        | active
 8ff20b41-0e01-42e7-8eb0-1567d8d32763 | http://$work_ip:8080    | 0.157.1-t.0.2 | false       | active
(2 rows)

Query 20170505_084736_00003_6bify, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 158B] [80 rows/s, 6.22KB/s]

Our tests are below:

# klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)

Without the '--user' parameter when launching presto-cli:
# presto-cli --server $coo_dns:8080 --catalog hive

presto:presto_test> create table presto_test_scl (c1 varchar(10));
Query 20170505_091129_00028_6bify failed: java.security.AccessControlException: Permission denied: user=presto, access=READ, inode="/apps/hive/warehouse/presto_test.db":u_xyz1234_adm:u_xyz1234_adm:drwx------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1913)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8749)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:2087)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1454)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
presto:presto_test> exit;

The same test with '--user' specified when launching presto-cli:
# presto-cli --server $coo_dns:8080 --catalog hive --user u_xyz1234_adm
presto> use presto_test;
presto:presto_test> create table presto_test_scl (c1 varchar(10));
Query 20170505_091203_00031_6bify failed: java.security.AccessControlException: Permission denied: user=presto, access=READ, inode="/apps/hive/warehouse/presto_test.db":u_xyz1234_adm:u_xyz1234_adm:drwx------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1913)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8749)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:2087)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1454)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:421)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

We get the same error again, so we decided to give the presto user the necessary HDFS permissions to perform the CREATE TABLE:

# hadoop fs -chmod 707 /apps/hive/warehouse/presto_test.db
# hdfs dfs -ls -d /apps/hive/warehouse/presto_test.db
drwx---rwx   - u_xyz1234_adm u_xyz1234_adm          0 2017-05-05 11:41 /apps/hive/warehouse/presto_test.db
# presto-cli --server $coo_dns:8080 --catalog hive --schema presto_test
presto:presto_test> create table presto_test_noauth (c1 varchar(10));
CREATE TABLE
presto:presto_test> insert into presto_test_noauth values ('1');
INSERT: 1 row

Query 20170505_094501_00060_6bify, FINISHED, 2 nodes
Splits: 3 total, 3 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:presto_test> select * from presto_test_noauth;
 c1
----
 1
(1 row)

Query 20170505_094515_00062_6bify, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [1 rows, 154B] [10 rows/s, 1.63KB/s]
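
Why `chmod 707` lets the statements above succeed can be seen by evaluating the mode string the way a POSIX-style check does: the owner triad applies to the owner, the group triad to members of the group, and the last triad to everyone else. The sketch below is a minimal model that ignores ACLs and the HDFS superuser; the function name `allowed` and the group list given for `presto` are assumptions for illustration.

```python
def allowed(mode, owner, group, user, user_groups, access):
    """Evaluate one access ('r', 'w' or 'x') against a mode like 'drwx---rwx'."""
    perms = mode[1:]  # drop the file-type character
    if user == owner:
        triad = perms[0:3]
    elif group in user_groups:
        triad = perms[3:6]
    else:
        triad = perms[6:9]
    return access in triad


OWNER = GROUP = "u_xyz1234_adm"

# Before the chmod: presto is neither owner nor group member, so READ is denied.
print(allowed("drwx------", OWNER, GROUP, "presto", ["presto"], "r"))
# After chmod 707: the "other" triad grants rwx, so presto gets through.
print(allowed("drwx---rwx", OWNER, GROUP, "presto", ["presto"], "r"))
```

In this model the chmod only opens the directory to the presto account through the "other" triad; it does not change which user the NameNode checks.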

On the HDFS side, we can see that the owner of the table is root:

# hdfs dfs -ls -d /apps/hive/warehouse/presto_test.db/presto_test_noauth
drwxrwxrwx   - root u_xyz1234_adm          0 2017-05-05 11:45 /apps/hive/warehouse/presto_test.db/presto_test_noauth

 

If we specify the '--user' parameter, the owner is the specified user rather than the Unix shell user:

 

# presto-cli --server $coo_dns:8080 --catalog hive --schema presto_test --user u_xyz1234_adm
presto:presto_test> create table presto_test_auth (c1 varchar(10));
CREATE TABLE
presto:presto_test> insert into presto_test_auth values ('1');
INSERT: 1 row

Query 20170505_095024_00066_6bify, FINISHED, 2 nodes
Splits: 3 total, 3 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:presto_test> select * from presto_test_auth;
 c1
----
 1
(1 row)

Query 20170505_095030_00067_6bify, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [1 rows, 154B] [11 rows/s, 1.73KB/s]

# hdfs dfs -ls -d /apps/hive/warehouse/presto_test.db/presto_test_auth
drwxrwxrwx   - u_xyz1234_adm u_xyz1234_adm          0 2017-05-05 11:50 /apps/hive/warehouse/presto_test.db/presto_test_auth

So even with impersonation enabled, we can't access the tables stored in HDFS.

The only user able to list HDFS is presto. In a production environment, we can't grant these POSIX permissions to the presto user.
The interesting thing is that a user accessing their data on Hive/HDFS doesn't have to specify '--user', because Presto defaults to the user logged in to the shell.
As our tests show, impersonation still doesn't work, and presto is the user reported each time HDFS is listed.

Thank you in advance for any further ideas or tips.

Best regards,
Stef