When the Hadoop Distributed File System (HDFS) is stored on Isilon and HDFS commands are issued by nonsuperuser accounts, the following error messages are produced:
[gpadmin@HAWQMASTER ~]$ hdfs dfs -ls /
16/08/01 21:48:39 WARN ipc.Client: Unexpected error reading responses on connection Thread[IPC Client (2008879874) connection to isi-sc.lab.com/10.110.110.209:8020 from gpadmin,5,main]
java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1125)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
ls: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: "hdw1.gphd.local/10.120.140.10"; destination host is: "isi-sc.lab.com":8020;
[gpadmin@HAWQMASTER ~]$
However, the same command works correctly when issued by root:
[root@HAWQMASTER ~]# hdfs dfs -ls /
Found 10 items
drwxrwxrwx   - yarn    hadoop    0 2016-07-01 19:34 /app-logs
drwxr-xr-x   - hdfs    hdfs      0 2016-06-21 03:55 /apps
drwxr-xr-x   - yarn    hadoop    0 2016-06-20 13:34 /ats
drwxr-xr-x   - gpadmin gpadmin   0 2016-06-20 14:02 /hawq_default
drwxr-xr-x   - hdfs    hdfs      0 2016-06-20 13:34 /hdp
drwxr-xr-x   - mapred  hdfs      0 2016-06-20 13:34 /mapred
drwxrwxrwx   - mapred  hadoop    0 2016-06-20 13:34 /mr-history
drwxr-xr-x   - gpadmin gpadmin   0 2016-06-20 15:22 /pxf_data
drwxrwxrwx   - hdfs    hdfs      0 2016-07-15 03:02 /tmp
drwxr-xr-x   - hdfs    hdfs      0 2016-07-02 06:11 /user
[root@HAWQMASTER ~]#
TCP dumps captured on the client side show STATUS_ACCESS_DENIED messages in the packet contents:
tcpdump -i any -w /root/gpadmin.trc "tcp port 8020"
tcpdump -XX -n -r /root/gpadmin.trc | less
<...>
15:11:37.471439 IP 10.110.110.209.isi-sc.lab.com > hdw1.gphd.local.46219: Flags [P.], seq 1:165, ack 198, win 2058, options [nop,nop,TS val 3591022876 ecr 3339133289], length 164
        0x0000:  0000 0001 0006 000e 1ea6 2280 0000 8100  ..........".....
        0x0010:  03eb 0800 4500 00d8 8f39 4000 4006 762b  ....E....9@.@.v+
        0x0020:  0ab2 8fcf 0ab2 8f88 1f54 b48b 8d46 4378  .........T...FCx
        0x0030:  8481 dc73 8018 080a 7303 0000 0101 080a  ...s....s.......
        0x0040:  d60a a91c c707 2169 0000 00a0 9e01 08fd  ......!i........
        0x0050:  ffff ff0f 1001 1809 2213 6a61 7661 2e69  ........".java.i
        0x0060:  6f2e 494f 4578 6365 7074 696f 6e2a 6720  o.IOException*g.
        0x0070:  7374 6174 7573 3a20 5354 4154 5553 5f41  status:.STATUS_A
        0x0080:  4343 4553 535f 4445 4e49 4544 203d 2030  CCESS_DENIED.=.0
        0x0090:  7843 3030 3030 3032 3220 5061 7468 3a20  xC0000022.Path:.
        0x00a0:  2f6f 6e65 6673 5f68 6466 732f 6966 732f  /onefs_hdfs/ifs/
        0x00b0:  6461 7461 2f43 6c75 7374 6572 4275 6363  data/ClusterBucc
        0x00c0:  696e 5959 5959 5959 6f6e 652d 4443 412f  xxxxxx/Zone-DCA/
        0x00d0:  6861 646f 6f70 3004 3a10 8283 0dee 1fcd  hadoop0.:.......
        0x00e0:  4a2f 9cd4 862d c588 c0a9 4001            J/...-....@.
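Paging through the hex dump works, but the status string can also be pulled out directly. The following is a minimal sketch, not part of the original procedure: the function name extract_access_denied is hypothetical, and it assumes the status text appears unbroken in tcpdump's ASCII (-A) output, as it does in the payload shown above.

```shell
# Hypothetical helper: read tcpdump ASCII output on stdin and print each
# access-denied status string it contains (for example
# "STATUS_ACCESS_DENIED = 0xC0000022").
extract_access_denied() {
    # -a treats binary-looking input as text; -o prints only the match.
    grep -a -o 'STATUS_ACCESS_DENIED = 0x[0-9A-Fa-f]*'
}

# Usage against the capture file written earlier (assumes the same path):
#   tcpdump -A -n -r /root/gpadmin.trc | extract_access_denied
```

If the function prints nothing, the capture contains no access-denied reply in that form, and the failure may have a different cause.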
Queries in HAWQ may fail with the following:
ERROR: Append-Only Storage Read could not open segment file 'hdfs://isi-sc.lab.com:8020/hawq_data/gpseg16/16385/16596/635983.1' for relation 'loaded_data' (seg16 slice1 hdw1.gphd.local:40000 pid=792583) (Detail HdfsRpcException: RPC channel to "isi-sc.lab.com:8020" got protocol mismatch: RPC channel cannot find pending call: id = -3.;Line 1574;Routine cdbdisp_finishCommand;). [nQSError: 16015] SQL statement execution failed. (HY000)
The Hadoop client receives a nonstandard reply from the Isilon HDFS service, which causes the NullPointerException (NPE) seen by the HDFS client.
As the tcpdump output shows, the nonstandard message indicates that the given user is denied access to HDFS.
This is caused by file system permission and ACL issues on the Isilon side. On Isilon, Hadoop requires at least read permission on every directory in the path, from the root directory down to the Isilon directory where the HDFS files are located.
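For example, the following command, run on Isilon from within the affected directory, adds an ACL entry that allows group 507 to read and traverse that directory:
chmod +a group 507 allow dir_gen_read,dir_gen_execute .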
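Because the permission is needed at every level of the path, it helps to list each directory between the top of the tree and the HDFS root and inspect them one by one. The sketch below is illustrative only: the function name path_ancestors and the example path are hypothetical, and the use of ls -led to display a directory's ACL on OneFS is an assumption to verify on your release.

```shell
# Hypothetical helper: print every directory from the top of an absolute
# path down to the path itself, one per line ("/" itself is not printed).
path_ancestors() {
    rest=${1#/}        # drop the leading slash
    rest=${rest%/}     # drop a trailing slash, if any
    current=""
    while [ -n "$rest" ]; do
        part=${rest%%/*}               # next path component
        current="$current/$part"
        printf '%s\n' "$current"
        case $rest in
            */*) rest=${rest#*/} ;;    # more components remain
            *)   rest="" ;;            # that was the last one
        esac
    done
}

# Usage on the Isilon node (example path is hypothetical):
#   for dir in $(path_ancestors /ifs/data/example/hadoop); do
#       ls -led "$dir"
#   done
```

Any directory in the output that lacks read and traverse access for the Hadoop user or group is a candidate for the chmod +a command shown above.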