Some Services on the SDDC Manager Controller Virtual Machine do not start automatically after a reboot
Article ID: 316893
Products
VMware Cloud Foundation
Issue/Introduction
Symptoms:
VMware Cloud Foundation was recently updated to 2.3.1 or 2.3.2.
The SDDC Manager Controller virtual machine was rebooted after being upgraded.
The SDDC Manager web interface might not be reachable.
DNS resolution for hosts and virtual machines in the VMware Cloud Foundation environment might fail.
The Lifecycle Management section of the SDDC Manager web interface might be blank or display an error.
Running the command systemctl status scs.service in an SSH or console session on the SDDC Manager Controller VM shows that the System Controller Service is not started or enabled:
* scs.service - VCF System Controller Service
   Loaded: loaded (/etc/systemd/system/scs.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
Environment
VMware Cloud Foundation 2.3.x
Cause
The System Controller Service watches other critical services and restarts them if they fail. If the service is not enabled, it does not start automatically on the next reboot of the SDDC Manager Controller VM. Without it, the DNS service is not started, and the Lifecycle Management service might not be restarted if it fails to start during the boot process.
Resolution
This is a known issue affecting VMware Cloud Foundation 2.3.x. There is currently no resolution.
Workaround: Complete the following steps to work around the issue:
Log in to the SDDC Manager Controller virtual machine as root using SSH or a console session.
Run the following command:
systemctl enable scs.service
If the service is not running, reboot the SDDC Manager Controller VM.
(Optional) Confirm that the System Controller service is enabled and running:
systemctl status scs.service
Note: You will see output similar to the following:
* scs.service - VCF System Controller Service
   Loaded: loaded (/etc/systemd/system/scs.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-05-10 21:51:13 UTC; 20h ago
 Main PID: 630 (java)
   CGroup: /system.slice/scs.service
           |- 630 /usr/java/jre-vmware/bin/java -Ddaemon.pidfile=/opt/vmware/scs/scsd.pid -cp /opt/vmware/scs/jars/scsd-2.6.1-RELEASE-jar-with-dependencies...
           |-13649 /usr/bin/python /opt/vmware/scs/scripts/SCSHelper -n cassandra -o diag -l /opt/vmware/scs/logs/scsdiag
           |-13650 /bin/bash /opt/vmware/scs/scripts/cassandra-diag.sh
           |-13655 /bin/sh /opt/vmware/cassandra/apache-cassandra-2.2.4/bin/nodetool status
           `-13705 /usr/java/jre-vmware/bin/java -javaagent:/opt/vmware/cassandra/apache-cassandra-2.2.4/bin/../lib/jamm-0.3.0.jar -cp /opt/vmware/cassandra...

May 11 17:51:07 sddc-manager-controller java[630]: success=True
May 11 17:51:07 sddc-manager-controller java[630]: op=status
May 11 17:51:07 sddc-manager-controller sudo[13629]: root : TTY=unknown ; PWD=/opt/vmware/scs/logs ; USER=root ; COMMAND=/opt/vmware/scs/scripts...s -n pug
May 11 17:51:07 sddc-manager-controller sudo[13629]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 11 17:51:07 sddc-manager-controller java[630]: status=0
May 11 17:51:07 sddc-manager-controller java[630]: running=True
May 11 17:51:07 sddc-manager-controller java[630]: returncode=0
May 11 17:51:07 sddc-manager-controller java[630]: name=lcm
May 11 17:51:07 sddc-manager-controller java[630]: success=True
May 11 17:51:07 sddc-manager-controller java[630]: op=status
Hint: Some lines were ellipsized, use -l to show in full.
(Optional) Confirm that the Lifecycle Management service is enabled and running:
systemctl status lcm.service
Note: You will see output similar to the following:
* lcm.service - LCM app
   Loaded: loaded (/etc/systemd/system/lcm.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-05-10 21:51:45 UTC; 19h ago
 Main PID: 2415 (java)
   CGroup: /system.slice/lcm.service
           `-2415 /usr/java/jre-vmware/bin/java -Xmx3072m -XX:MaxPermSize=512m -Dspring.profiles.active=evo -Djava.io.tmpdir=/home/vrack/lcm/tmp -classpath ...

May 10 21:51:39 sddc-manager-controller systemd[1]: Starting LCM app...
May 10 21:51:39 sddc-manager-controller systemd[1]: lcm.service: PID file /home/vrack/lcm/logs/lcm.pid not readable (yet?) after start: No such file...irectory
May 10 21:51:45 sddc-manager-controller systemd[1]: lcm.service: Supervising process 2415 which is not our child. We'll most likely not notice when it exits.
May 10 21:51:45 sddc-manager-controller systemd[1]: Started LCM app.
Hint: Some lines were ellipsized, use -l to show in full.
(Optional) Confirm that the DNS service is enabled and running:
systemctl status unbound.service
Note: You will see output similar to the following:
* unbound.service - Unbound recursive Domain Name Server
   Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-05-10 21:51:13 UTC; 20h ago
 Main PID: 732 (unbound)
   CGroup: /system.slice/unbound.service
           `-732 /usr/sbin/unbound -d

May 11 17:51:52 sddc-manager-controller unbound[732]: [732:0] info: receive_udp on interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:52 sddc-manager-controller unbound[732]: [732:0] info: send_udp over interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:52 sddc-manager-controller unbound[732]: [732:0] info: receive_udp on interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:52 sddc-manager-controller unbound[732]: [732:0] info: send_udp over interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:53 sddc-manager-controller unbound[732]: [732:0] info: receive_udp on interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:53 sddc-manager-controller unbound[732]: [732:0] info: send_udp over interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:55 sddc-manager-controller unbound[732]: [732:0] info: receive_udp on interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:55 sddc-manager-controller unbound[732]: [732:0] info: send_udp over interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:57 sddc-manager-controller unbound[732]: [732:0] info: receive_udp on interface: 2 172.30.0.14 172.30.0.14
May 11 17:51:57 sddc-manager-controller unbound[732]: [732:0] info: send_udp over interface: 2 172.30.0.14 172.30.0.14
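The three optional checks in the workaround can also be run in one pass. The following is a minimal sketch, not part of the documented procedure: the function name check_units is hypothetical, and it assumes a bash shell on the SDDC Manager Controller VM. It relies only on the standard systemctl is-enabled and systemctl is-active subcommands.

```shell
#!/usr/bin/env bash
# check_units is a hypothetical helper name, not a VMware-provided tool.
# Prints one line per unit: "<unit> <enabled-state> <active-state>".
# Returns non-zero if any unit is not both enabled and active.
check_units() {
    local rc=0 unit enabled active
    for unit in "$@"; do
        # is-enabled and is-active print the state and exit non-zero when
        # the unit is not enabled or not active, so ignore the exit code.
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        active=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%s %s %s\n' "$unit" "${enabled:-unknown}" "${active:-unknown}"
        if [ "$enabled" != "enabled" ] || [ "$active" != "active" ]; then
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the SDDC Manager Controller VM:
#   check_units scs.service lcm.service unbound.service
```

A non-zero return value indicates that at least one of the listed services still needs attention. The full systemctl status output remains the reference for diagnosing why.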
Additional Information
To determine whether your environment will encounter this issue, review the output of the command systemctl status scs.service. If the service is listed as disabled in the Loaded line, it will not start automatically on the next boot. The following example shows a System Controller Service that is currently running but disabled, so it will not start on the next boot of the SDDC Manager Controller VM:
* scs.service - VCF System Controller Service
   Loaded: loaded (/etc/systemd/system/scs.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-05-09 19:24:44 UTC; 1 day 1h ago
 Main PID: 6407 (java)
    Tasks: 27
   CGroup: /system.slice/scs.service
           `-6407 /usr/java/jre-vmware/bin/java -Ddaemon.pidfile=/opt/vmware/scs/scsd.pid -cp /opt/vmware/scs/jars/scsd-2.6...

May 10 21:22:47 sddc-manager-controller java[6407]: success=True
May 10 21:22:47 sddc-manager-controller java[6407]: op=status
May 10 21:22:47 sddc-manager-controller sudo[5893]: root : TTY=unknown ; PWD=/opt/vmware/scs/logs ; USER=root ; CO...n pug
May 10 21:22:47 sddc-manager-controller sudo[5893]: pam_unix(sudo:session): session opened for user root by (uid=0)
May 10 21:22:47 sddc-manager-controller java[6407]: status=0
May 10 21:22:47 sddc-manager-controller java[6407]: running=True
May 10 21:22:47 sddc-manager-controller java[6407]: returncode=0
May 10 21:22:47 sddc-manager-controller java[6407]: name=lcm
May 10 21:22:47 sddc-manager-controller java[6407]: success=True
May 10 21:22:47 sddc-manager-controller java[6407]: op=status
Hint: Some lines were ellipsized, use -l to show in full.
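Reading the Loaded line by eye can be replaced with a small filter. The sketch below is illustrative only: loaded_state is a hypothetical function name, and it assumes the Loaded line has the layout shown in the examples in this article, with the unit file state as the second semicolon-separated field.

```shell
#!/usr/bin/env bash
# loaded_state is a hypothetical helper name, not a VMware-provided tool.
# Reads `systemctl status <unit>` output on stdin and prints the unit file
# state from the Loaded line (the second semicolon-separated field),
# for example "enabled" or "disabled".
loaded_state() {
    awk -F';' '/Loaded:/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding whitespace
        print $2
        exit                              # only the first Loaded line matters
    }'
}

# Usage on the SDDC Manager Controller VM:
#   systemctl status scs.service | loaded_state
```

For the example output above, the filter prints disabled, which means the service will not start on the next boot even though it is currently running.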