Services are showing as "stopped" and fail to start - "java.net.BindException: Address already in use"

Article ID: 332884

Products

VMware Smart Assurance

Issue/Introduction

Symptoms:


  • Some services are listed as "stopped" in the Centralized Management UI, or when checking the service status with the Module Manager:
XXXXX:/opt/APG/bin # /opt/APG/bin/manage-modules.sh service status all
 * Checking 'topology-mapping-service Default'...                    [ running ]
 * Checking 'topology-service Default'...                            [ running ]
 * Checking 'webservice-gateway Default'...                          [ stopped ]
 * Checking 'mysql Default'...                                       [ running ]
 * Checking 'alerting-backend Default'...                            [ running ]
 * Checking 'backend Default'...                                     [ running ]
 * Checking 'collector-manager Load-Balancer'...                     [ running ]
 * Checking 'collector-manager emc-watch4net-health'...              [ running ]
 * Checking 'event-processing-manager Alert-Consolidation'...        [ running ]
 * Checking 'event-processing-manager Maintenance-Manager'...        [ running ]
 * Checking 'event-processing-manager cisco-ucs'...                  [ running ]
 * Checking 'event-processing-manager emc-vnx'...                    [ running ]
 * Checking 'event-processing-manager vmware-vcenter'...             [ running ]
 * Checking 'script-engine Default'...                               [ running ]
 * Checking 'task-scheduler Default'...                              [ running ]
 * Checking 'compliance-backend generic-compliance'...               [ running ]
  • Attempting to start the service succeeds, but the service stops immediately afterwards: 
XXXXX:/opt/APG/bin # /opt/APG/bin/manage-modules.sh service start webservice-gateway Default
 * Starting 'webservice-gateway Default'...                               [ OK ]
XXXXX:/opt/APG/bin # /opt/APG/bin/manage-modules.sh service status webservice-gateway Default
 * Checking 'webservice-gateway Default'...                          [ stopped ]
  • In the service's log directory, there are multiple concurrent instances of the log file (<LOG_FILE>-0-#.log), for example:
XXXXX:/opt/APG/Tools/Webservice-Gateway/Default/logs # ls -l *-0-*.log
-rw-r--r-- 1 apg apg 615133 Dec 21 14:57 gateway-0-0.log
-rw-r--r-- 1 apg apg   3258 Dec 21 14:50 gateway-0-1.log
  • The latest instance of the log file (e.g. gateway-0-1.log from above) contains an error similar to the following:
SEVERE     -- [2016-12-12 16:18:11 NZDT] -- HttpServer::start(): an error occured starting the server
java.net.BindException: Address already in use
  • There is no PID file located in the log directory (e.g. apg-webservice-gateway-default.pid).
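
The last two symptoms can be checked together with a short script. This is a minimal sketch and not part of the product: the function name check_bind_symptoms is hypothetical, and the log directory and PID file name are passed in as arguments (the values in the usage comment are the examples from this article).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): report which
# <LOG_FILE>-0-#.log files contain the bind error and whether the
# PID file is missing from the log directory.
# Usage: check_bind_symptoms <log_dir> <pid_file_name>
check_bind_symptoms() {
    log_dir=$1
    pid_name=$2

    # grep -l prints only the names of the files that contain the message.
    matches=$(grep -l 'java.net.BindException: Address already in use' \
        "$log_dir"/*-0-*.log 2>/dev/null)
    if [ -n "$matches" ]; then
        echo "bind error found in:"
        echo "$matches"
    else
        echo "no bind error found"
    fi

    if [ -f "$log_dir/$pid_name" ]; then
        echo "PID file present"
    else
        echo "PID file missing"
    fi
}

# Example call, using the paths shown in this article:
# check_bind_symptoms /opt/APG/Tools/Webservice-Gateway/Default/logs \
#     apg-webservice-gateway-default.pid
```

If the output lists a log file with the bind error and also reports the PID file as missing, the host matches the symptoms described above.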



Environment

Watch4Net/M&R-7.x

Cause

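The Module Manager uses the PID file in the service's log directory (e.g. apg-webservice-gateway-default.pid) to monitor the process. If the file does not exist, the Module Manager reports the service as "stopped", even if the process itself is still running. This can occur if the service did not stop properly, or was hung and therefore did not stop when requested. The leftover process still holds the service's listening port, so each new start attempt fails with "java.net.BindException: Address already in use".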
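
To confirm that a leftover process is still bound to the service's port, the listening sockets can be inspected. The following is a minimal sketch, assuming a Linux host where the ss utility (from iproute2) is available. The helper name pids_on_port is hypothetical, and the port number in the usage comment is a placeholder, since this article does not state which port the service uses.

```shell
#!/bin/sh
# Hypothetical helper: print the PID(s) listening on a given TCP port.
# It reads `ss -ltnp` output on stdin, so it also works on saved output.
pids_on_port() {
    port=$1
    awk -v port="$port" '
        # With -t only, column 4 is the local address:port,
        # for example "*:8080" or "[::]:8080".
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            # Walk every "pid=<number>" entry on the line.
            line = $0
            while (match(line, /pid=[0-9]+/)) {
                print substr(line, RSTART + 4, RLENGTH - 4)
                line = substr(line, RSTART + RLENGTH)
            }
        }'
}

# Example call (8080 is a placeholder port, not taken from this article).
# Run as root so that -p can show processes owned by other users:
# ss -ltnp | pids_on_port 8080
```

A PID printed here while the Module Manager reports the service as "stopped" indicates a process that the Module Manager is no longer tracking.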

Resolution

To work around this issue, terminate the leftover process and restart the services via the Module Manager:
  1. Stop all services on the host using the Module Manager:
XXXXX:/opt/APG/bin # manage-modules.sh service stop all
 * Stopping 'topology-service Default'...                                 [ OK ]
 * Stopping 'topology-mapping-service Default'...                         [ OK ]
 * Stopping 'task-scheduler Default'...                                   [ OK ]
 * Stopping 'script-engine Default'...                                    [ OK ]
 * Stopping 'event-processing-manager vmware-vcenter'...                  [ OK ]
 * Stopping 'event-processing-manager emc-vnx'...                         [ OK ]
 * Stopping 'event-processing-manager cisco-ucs'...                       [ OK ]
 * Stopping 'event-processing-manager Maintenance-Manager'...             [ OK ]
 * Stopping 'event-processing-manager Alert-Consolidation'...             [ OK ]
 * Stopping 'collector-manager emc-watch4net-health'...                   [ OK ]
 * Stopping 'collector-manager Load-Balancer'...                          [ OK ]
 * Stopping 'compliance-backend generic-compliance'...                    [ OK ]
 * Stopping 'backend Default'...                                          [ OK ]
 * Stopping 'alerting-backend Default'...                                 [ OK ]
 * Stopping 'mysql Default'...                                            [ OK ]
 * Stopping 'webservice-gateway Default'...                      [ not-running ]
  2. Run the "ps -ef | grep -i apg" command to find any processes that have not stopped, and record the PID (28997 in the example below):
XXXXX:/opt/APG/bin # ps -ef | grep -i apg
root     25486 21564  0 16:15 pts/0    00:00:00 grep -i apg
apg      28997     1  2 11:35 ?        00:08:02 /opt/APG/Java/Sun-JRE/8.0.102/bin/java ...
  3. Kill the leftover process, then start all the services:
XXXXX:/opt/APG/bin # kill 28997
XXXXX:/opt/APG/bin # ./manage-modules.sh service start all
 * Starting 'topology-mapping-service Default'...                         [ OK ]
 * Starting 'topology-service Default'...                                 [ OK ]
 * Starting 'webservice-gateway Default'...                               [ OK ]
 * Starting 'mysql Default'...                                            [ OK ]
 * Starting 'alerting-backend Default'...                                 [ OK ]
 * Starting 'backend Default'...                                          [ OK ]
 * Starting 'collector-manager Load-Balancer'...                          [ OK ]
 * Starting 'collector-manager emc-watch4net-health'...                   [ OK ]
 * Starting 'event-processing-manager Alert-Consolidation'...             [ OK ]
 * Starting 'event-processing-manager Maintenance-Manager'...             [ OK ]
 * Starting 'event-processing-manager cisco-ucs'...                       [ OK ]
 * Starting 'event-processing-manager emc-vnx'...                         [ OK ]
 * Starting 'event-processing-manager vmware-vcenter'...                  [ OK ]
 * Starting 'script-engine Default'...                                    [ OK ]
 * Starting 'task-scheduler Default'...                                   [ OK ]
 * Starting 'compliance-backend generic-compliance'...                    [ OK ]
  4. Verify that the services stay running, and confirm in the most recent log file that the service has started.
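
Because the port is only released once the leftover process has actually exited, it can help to wait for that before starting the services again. The following is a minimal sketch of that idea and not part of the documented procedure: the helper name terminate_and_wait and the 30-second default timeout are assumptions, and the PID in the usage comment is the one from the example above.

```shell
#!/bin/sh
# Hypothetical helper: send the default signal (SIGTERM) to a PID and wait
# until it has exited. Returns 0 once the process is gone, 1 if it is still
# alive after the timeout.
# Usage: terminate_and_wait <pid> [timeout_seconds]
terminate_and_wait() {
    pid=$1
    timeout=${2:-30}   # arbitrary default, adjust as needed

    kill "$pid" 2>/dev/null

    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "PID $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "PID $pid has exited"
}

# Example, using the PID recorded in step 2:
# terminate_and_wait 28997 && ./manage-modules.sh service start all
```

Chaining the start command with && means the services are started only after the old process has gone, which avoids hitting the same bind error again.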