A Hive job that needs to process a 40 TB dataset failed with the following error:
    org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    ... 8 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-svcckp/hive_2014-08-12_21-58-23_567_8729876893578629726-1/_task_tmp.-ext-10002/country_code=US/base_div_nbr=1/retail_channel_code=1/visit_date=2011-12-22/load_timestamp=20140810001530/_tmp.002881_0: File does not exist. Holder DFSClient_attempt_1407874976831_0233_m_000178_0_-1560055874_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2932)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2738)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2646)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    ...

Also, the customer reported that the same type of job had worked well before.
To check how many file descriptors are currently open on the affected node, run:

    lsof | wc -l

Compare the result with the open-file limit in effect (ulimit -n). The resolution is to raise that limit.
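If the total is near the limit, it can also help to break the count down by user. A minimal sketch, assuming the gpadmin and yarn accounts from the resolution below (the PID in the last command is hypothetical):

    # Count open file descriptors held by a specific user
    lsof -u gpadmin | wc -l
    lsof -u yarn | wc -l

    # Soft limit currently in effect for this shell
    ulimit -n

    # Limits of an already-running process (PID 12345 is hypothetical)
    grep 'open files' /proc/12345/limits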
Add the following line to /etc/security/limits.conf:

    gpadmin - nofile 3000000

Then log in as gpadmin and run:

    ulimit -n 3000000

To make the change permanent, add the same line to /etc/security/limits.d/gpadmin.conf (create this file if it does not exist):

    gpadmin - nofile 3000000

To confirm the change, run the following command as gpadmin:

    ulimit -a

To confirm the change for nologin users (using the yarn user as an example), run:

    sudo -u yarn sh -c "ulimit -a"
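Putting the permanent change and its verification together, a minimal sketch (run as root; the 3000000 value and the gpadmin/yarn accounts are taken from the steps above):

    # Create a drop-in file read by pam_limits (/etc/security/limits.d/*.conf)
    # Entry format: <domain> <type> <item> <value>
    printf 'gpadmin - nofile 3000000\n' > /etc/security/limits.d/gpadmin.conf

    # Verify in a fresh login session; pam_limits applies limits at login
    su - gpadmin -c 'ulimit -n'

    # Verify for a nologin service account
    sudo -u yarn sh -c 'ulimit -n'

Note that these limits only apply to new sessions; daemons that were already running keep the limit they started with and must be restarted to pick up the change.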