ISVM virtual machine continuously reboots every few minutes

Article ID: 324650

Products

VMware

Issue/Introduction

Symptoms:

One of the ISVM virtual machines is in a boot loop (continuously rebooting every few minutes).
Errors mentioning one or more commitlog failures may be present in the /opt/vmware/cassandra/apache-cassandra-2.2.4/logs/debug.log file on the same ISVM VM.

Cause

The Cassandra watchdog configuration on the ISVM virtual machines may cause a continuous boot loop if the Cassandra service fails at startup.

Resolution

This is a known issue affecting VMware Cloud Foundation 2.x.

This issue is resolved in VMware Cloud Foundation version 2.1.3, available at VMware Downloads.

To work around this issue if you do not want to upgrade, complete this procedure to stop the boot loop:

1. Log in to the ISVM VM as the root user.
2. Make a copy of the /opt/vmware/ism/scripts/common/ism-repair.sh file by running this command:

   cp /opt/vmware/ism/scripts/common/ism-repair.sh /tmp/

3. Open the /opt/vmware/ism/scripts/common/ism-repair.sh file using a text editor.
4. Find the line that begins with MAX_CASS_RESTARTS and set the value to 9999999.
5. Save and close the file.
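
The manual edit above could also be scripted. The following is a minimal sketch, not a VMware-supplied tool: the function name set_max_cass_restarts is hypothetical, and it assumes the setting appears in the script as a plain shell assignment of the form MAX_CASS_RESTARTS=<number>. If the line looks different on your system, edit the file by hand instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): set MAX_CASS_RESTARTS in a script.
# ASSUMPTION: the line has the form MAX_CASS_RESTARTS=<number> with no leading spaces.
set_max_cass_restarts() {
    script="$1"   # path to ism-repair.sh
    value="$2"    # new restart limit, for example 9999999

    # Keep a backup next to the original before changing anything.
    cp -p "$script" "$script.bak" || return 1

    # Rewrite only the line that begins with MAX_CASS_RESTARTS=.
    # Reading from the backup and writing to the original keeps the
    # original file's permissions intact.
    sed "s/^MAX_CASS_RESTARTS=.*/MAX_CASS_RESTARTS=$value/" "$script.bak" > "$script" || return 1

    # Print the resulting line so the change can be checked by eye.
    grep '^MAX_CASS_RESTARTS' "$script"
}

# Example call on the ISVM VM, left commented out so that nothing runs by accident:
# set_max_cass_restarts /opt/vmware/ism/scripts/common/ism-repair.sh 9999999
```

If the final grep prints nothing, the assumed line format did not match and the file is unchanged apart from the .bak copy.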

Additional Information

This issue may be related to corrupted Cassandra commit logs. Cassandra fails to restart, and after 16 failed restart attempts the watchdog service on the ISVM VM reboots the VM.

To work around this issue:

1. Stop the watchdog service on the ISVM VM by running this command:

   service ism-watchdog stop

2. Inspect the Cassandra log file /opt/vmware/cassandra/apache-cassandra-2.2.4/logs/debug.log and identify the file name of each corrupted commit log.
3. If any corrupted commit logs are noted in Step 2, remove them from the /opt/vmware/ism/logs folder using the rm command.
4. Restart Cassandra by running this command:

   service cassandraserver start

5. Repeat the process until all corrupted commit logs are deleted. There is no automated remediation for commit log corruption.

Note: If the service does not start successfully, look for more commit log failures and remove the offending commit logs.
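
To speed up the log inspection in the procedure above, a small filter can list the commit log file names that appear on error lines. This is only a sketch under assumptions: the function name list_suspect_commitlogs is hypothetical, it assumes that commit log segments are named CommitLog-<version>-<id>.log (the usual Cassandra naming), and it assumes that the failing file name appears on a line containing "error", "exception", or "corrupt". Treat its output as candidates to confirm in debug.log, not as a definitive list.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): list commit log file names
# that are mentioned on error-like lines of a Cassandra log.
# ASSUMPTIONS: segment files are named CommitLog-<version>-<id>.log, and the
# failing file name is on a line containing "error", "exception", or "corrupt".
list_suspect_commitlogs() {
    log_file="$1"

    # 1. Keep only lines that look like failures (case-insensitive).
    # 2. Extract just the commit log file names from those lines.
    # 3. Remove duplicates, because the same failure repeats on every restart.
    grep -i -E 'error|exception|corrupt' "$log_file" \
        | grep -o -E 'CommitLog-[0-9]+-[0-9]+\.log' \
        | sort -u
}

# Example call on the ISVM VM, left commented out so that nothing runs by accident:
# list_suspect_commitlogs /opt/vmware/cassandra/apache-cassandra-2.2.4/logs/debug.log
```

The sketch deliberately does not delete anything. Review each name against the surrounding log lines before removing the file with rm.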
