NodeManager fails to start in a secured cluster
Article ID: 294709

Products

Services Suite

Issue/Introduction

Symptoms:

The NodeManager logs indicate a failure similar to the one below: 

2014-02-26 15:31:55,178 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: setsid exited with exit code 0
2014-02-26 15:31:55,182 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container is : 24
2014-02-26 15:31:55,183 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: configuration tokenization failed
2014-02-26 15:31:55,183 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:144)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:321)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:135)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:142)
        ... 2 more
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: Can't get configured value for yarn.nodemanager.linux-container-executor.group.

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
        at org.apache.hadoop.util.Shell.run(Shell.java:129)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:130)
        ... 3 more

2014/02/28 02:37:22 INFO mapreduce.Job: Job job_1393582635312_0006 failed with state FAILED due to: Application application_1393582635312_0006 failed 1 times due to AM Container for appattempt_1393582635312_0006_000001 exited with exitCode: -1000 due to: java.io.IOException: App initialization failed (139) with output:
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:191)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:860)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:202)
 at org.apache.hadoop.util.Shell.run(Shell.java:129)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322)
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:183)
 ... 1 more

.Failing this attempt.. Failing the application.

Testing the container-executor.cfg file with --checksetup either returns nothing or returns the following:

# cd /usr/lib/gphd/hadoop-yarn/bin
# ./container-executor --checksetup
configuration tokenization failed
Can't get configured value for yarn.nodemanager.linux-container-executor.group.

Environment


Cause

If the NodeManager fails to start after configuring a secure cluster, check for the symptoms discussed above in each node's /var/log/gphd/hadoop-yarn/yarn-yarn-nodemanager-*.log file.

This error can occur if container-executor.cfg is missing the banned.users entry entirely, or contains an empty "banned.users=" entry. Both failing configurations are shown below.

Missing banned.users entry:

# cd /etc/gphd/hadoop/conf
# cat container-executor.cfg
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
#Prevent other super-users
min.user.id=400

Empty banned.users entry:

# cd /etc/gphd/hadoop/conf
# cat container-executor.cfg
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=
#Prevent other super-users
min.user.id=400
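The presence check that container-executor effectively performs can be approximated with a quick grep. This is a minimal sketch, assuming a PHD-style key=value config; the temporary sample file and its banned account list are illustrative only, so point the check at your real /etc/gphd/hadoop/conf/container-executor.cfg instead:

```shell
# Illustrative sample config written to a temp path; replace with the
# real container-executor.cfg path on your node.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
min.user.id=400
EOF

# A line "banned.users=" with nothing after the "=" fails this check,
# just like a missing line does.
if grep -Eq '^banned\.users=[^[:space:]]+' "$CFG"; then
  echo "banned.users is set"
else
  echo "banned.users is missing or empty"
fi
rm -f "$CFG"
# prints "banned.users is set"
```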

Resolution

To resolve this issue, review your security policy, add the list of accounts that may not run YARN jobs to the banned.users entry in container-executor.cfg on each node, and then start the NodeManager.
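A corrected container-executor.cfg might look like the fragment below. The specific banned accounts shown are only an example; substitute the accounts required by your site's security policy:

```
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=hdfs,yarn,mapred,bin
#Prevent other super-users
min.user.id=400
```

After saving the file, rerunning ./container-executor --checksetup should no longer report the tokenization error.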