Ambari Agent doesn't start and no error is reported in the logs

Article ID: 294906

Products

Services Suite

Issue/Introduction

Symptoms:

When you attempt to start the Ambari Agent, it does not start. The start command returns no error message, but a subsequent ambari-agent status reports the error message "Agent not running".


After running ambari-agent start, the /var/log/ambari-agent/ambari-agent.log file contains only a few lines for each start attempt:

INFO 2017-03-24 10:37:53,103 main.py:90 - loglevel=logging.INFO
INFO 2017-03-24 10:37:53,103 main.py:90 - loglevel=logging.INFO
INFO 2017-03-24 10:37:53,103 main.py:90 - loglevel=logging.INFO
INFO 2017-03-24 10:37:53,104 DataCleaner.py:39 - Data cleanup thread started
INFO 2017-03-24 10:37:53,106 DataCleaner.py:120 - Data cleanup started
INFO 2017-03-24 10:37:53,108 DataCleaner.py:122 - Data cleanup finished
The log shows no error, even though the startup sequence is incomplete.

Check whether any Network File System (NFS) mount points or other network storage are attached to this machine.
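If you prefer to script this check, the following Python sketch lists mount points whose filesystem type indicates network storage. It reads text in the format of /proc/mounts. The function name network_mounts and the set of filesystem types treated as network storage are assumptions for illustration; they are not part of Ambari.

```python
# Minimal sketch: list network-backed mounts from /proc/mounts-style text.
# ASSUMPTION: the filesystem types below count as "network storage";
# extend the set for your environment.
import os

NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "glusterfs", "ceph"}


def network_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs for network filesystems.

    Each /proc/mounts line has the form:
    device mount_point fs_type options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in NETWORK_FS_TYPES:
            found.append((mount_point, fs_type))
    return found


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/mounts exists.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for mount_point, fs_type in network_mounts(f.read()):
                print(f"{fs_type}\t{mount_point}")
```

Reading /proc/mounts does not access the mounted filesystems themselves, so this check should not block even when an NFS server is unresponsive.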

Search for fuser processes in the uninterruptible sleep state using the command ps -flye | grep fuser. Confirm that the output resembles the output below, with multiple fuser processes in the 'D' state:
D root 513702 513701 0 80 0 2300 25605 rpc_wa 13:09 pts/14 00:00:00 fuser 8670 tcp
S root 521174 1 0 80 0 1264 2825 wait 13:47 pts/14 00:00:00 /bin/sh -c fuser 8670/tcp 2>/dev/null | awk '{print $2}'
D root 521175 521174 0 80 0 2296 1396 rpc_wa 13:47 pts/14 00:00:00 fuser 8670 tcp
S root 521929 521921 0 80 0 1264 2825 wait 13:56 pts/14 00:00:00 /bin/sh -c fuser 8670/tcp 2>/dev/null | awk '{print $2}'
D root 521930 521929 0 80 0 2292 1396 rpc_wa 13:56 pts/14 00:00:00 fuser 8670 tcp
S gpadmin 523226 523026 0 80 0 904 25812 pipe_w 14:11 pts/15 00:00:00 grep fuser
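The same check can be scripted. This Python sketch parses captured ps -flye output and returns the fuser processes in state D together with their wait channel (WCHAN), which is rpc_wa in the sample above. It assumes the column layout of that output (S UID PID PPID C PRI NI RSS SZ WCHAN STIME TTY TIME CMD); the function name stuck_fuser_processes is hypothetical.

```python
# Minimal sketch: find fuser processes in uninterruptible sleep ("D")
# in captured `ps -flye` output.
# ASSUMPTION: column layout S UID PID PPID C PRI NI RSS SZ WCHAN STIME TTY TIME CMD,
# i.e. 13 fields precede the command.
CMD_FIELD = 13  # index of the command field after splitting


def stuck_fuser_processes(ps_output):
    """Return (pid, wchan, command) tuples for fuser processes in state D."""
    stuck = []
    for line in ps_output.splitlines():
        # Split at most 13 times so the command stays one string.
        fields = line.split(None, CMD_FIELD)
        if len(fields) <= CMD_FIELD or not fields[2].isdigit():
            continue  # header line or malformed line
        state, pid, wchan = fields[0], int(fields[2]), fields[9]
        command = fields[CMD_FIELD]
        # Match the fuser binary itself, not shells or grep that mention it.
        if state == "D" and command.split()[0].endswith("fuser"):
            stuck.append((pid, wchan, command))
    return stuck
```

On the affected host you could pass it the stdout of subprocess.run(["ps", "-flye"], capture_output=True, text=True). Matching only the first token of the command excludes the /bin/sh wrappers and the grep process, which are in the 'S' state in the sample anyway.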

This is an operating system issue related to NFS on the affected host.

Environment


Cause

The Ambari Agent startup process relies on the fuser command (fuser 8670/tcp in the output above) to obtain the PID of the Ambari Agent. Because of a bug, this command hangs indefinitely in uninterruptible sleep, so the Ambari Agent startup process never completes.

Refer to this Red Hat article for more information pertaining to this issue.

Resolution

To resolve this issue, reboot the affected server. Rebooting is the only way to clear these stuck processes and recover.

Expect errors about unmounting the NFS or network-attached storage during the reboot. A hard reboot may be required for this host.