Logical switches (LS) remain in "In progress" state and Kubernetes pods take a long time to reach Running state

Article ID: 306217

Products

VMware NSX

Issue/Introduction

Symptoms:
  • In a high-churn environment, pods take a long time to reach Running state.
  • The NSX-T version is 3.0.x.
  • Many logical switches (LS) remain in the "In progress" state for a long time.
  • The following error message appears in the /var/log/nsxapi* logs:
2020-09-10T03:51:51.268Z WARN RealizationServiceMaintenanceExecutor-0 VersionLockedObject - SyncObjectUnsafe[CorfuTable[c3e0]@26373221+-1] to 26370094 failed org.corfudb.runtime.exceptions.NoRollbackException: Can't roll back due to put@26373221 but need 26370094 so can't undo
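To check whether a manager node is logging this symptom, the matching warnings can be counted. The script below is a minimal sketch, not a VMware-provided tool. It assumes the logs match the /var/log/nsxapi* pattern given above and that rotated copies, if any, are gzip-compressed. The match pattern is derived only from the single sample line, and the function name `count_rollback_errors` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the realization-maintenance NoRollbackException warnings
# described in the symptoms, across plain and gzip-rotated log files.
set -uo pipefail

# Pattern built from the sample log line: the maintenance executor thread name
# followed, later on the same line, by the Corfu NoRollbackException.
PATTERN='RealizationServiceMaintenanceExecutor.*NoRollbackException'

# Print the total number of matching lines across all files given as arguments.
count_rollback_errors() {
  local total=0 n f
  for f in "$@"; do
    [[ -f "$f" ]] || continue                 # skip unmatched globs and missing files
    if [[ "$f" == *.gz ]]; then
      n=$(gzip -dc -- "$f" | grep -c -e "$PATTERN" || true)
    else
      n=$(grep -c -e "$PATTERN" -- "$f" || true)
    fi
    total=$((total + ${n:-0}))                # treat unreadable files as 0 matches
  done
  echo "$total"
}

# Only run when log paths are passed, so the function can also be sourced.
if [[ $# -gt 0 && "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  count_rollback_errors "$@"
fi
```

Usage on a manager node (script name hypothetical): `bash count-norollback.sh /var/log/nsxapi*`. A non-zero count together with slow pod startup suggests this article applies; a zero count does not rule it out if the logs have already rotated away.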


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Resolution

Fix:

This issue is fixed in NSX-T 3.1.0 and later.

Workaround:

  1. Log in to all 3 UA nodes and go to the following directory:
cd /opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib
  2. Take a backup of the nsx-realization-1.0.jar file.
  3. Copy nsx-realization-1.0.jar to another Linux machine.
  4. On that Linux machine, which must have the "jar" utility installed, extract the nsxrealization-config.properties file from nsx-realization-1.0.jar:
jar xf nsx-realization-1.0.jar META-INF/spring/nsxrealization-config.properties
  5. Edit the extracted file to update the property:
vi META-INF/spring/nsxrealization-config.properties
  6. Change the value of the property realization.realizationstate.maintenance.apiBatchSize to 100 and save the file.
  7. Update the JAR file with the modified properties file:
jar uf nsx-realization-1.0.jar META-INF/spring/nsxrealization-config.properties
  8. Copy the modified JAR file to /opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib on all 3 UA nodes:
scp nsx-realization-1.0.jar root@<UA-IP>:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/
  9. On all 3 UA nodes, check that the file was copied correctly:
cd /opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/
ls -la | grep nsx-realization-1.0.jar
  10. Restart the proton service on the UA nodes, one node at a time:
systemctl restart proton
  11. Wait for the cluster to become stable, then check the cluster status from nsxcli:
get cluster status
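The extract, edit, and update steps performed on the Linux machine can be scripted so the property is changed without an interactive editor. This is a minimal sketch, not a VMware-provided tool. It assumes bash, GNU sed, a JDK that provides `jar` on the PATH, and that the property is written as a single `key=value` line in nsxrealization-config.properties; check the extracted file before relying on it. The function names `set_property` and `patch_jar` and the `.bak` copy are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set apiBatchSize to 100 inside a copy of nsx-realization-1.0.jar.
# Run on the separate Linux machine that has the JDK "jar" tool.
set -euo pipefail

PROP_PATH="META-INF/spring/nsxrealization-config.properties"
PROP_KEY="realization.realizationstate.maintenance.apiBatchSize"
NEW_VALUE="100"

# Replace the value of KEY in a .properties file in place.
# Assumption: the property is written as "key=value" on a single line.
set_property() {
  local file="$1" key="$2" value="$3" key_re
  # Escape the dots in the key so grep and sed match it literally.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if ! grep -q "^${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
    echo "property ${key} not found in ${file}" >&2
    return 1
  fi
  # GNU sed in-place edit: keep the key, rewrite everything after "=".
  sed -i "s|^\(${key_re}\)[[:space:]]*=.*|\1=${value}|" "$file"
}

# Extract the properties file, change the value, and write it back into the JAR.
patch_jar() {
  local jar_file="$1" abs_jar check_dir
  command -v jar >/dev/null || { echo "jar utility not found; install a JDK" >&2; return 1; }
  cp -p "$jar_file" "${jar_file}.bak"   # copy of the JAR as it was before this run
  jar xf "$jar_file" "$PROP_PATH"       # same as the manual extract step
  set_property "$PROP_PATH" "$PROP_KEY" "$NEW_VALUE"
  jar uf "$jar_file" "$PROP_PATH"       # same as the manual update step
  # Re-extract into a scratch directory to confirm the JAR now holds the new value.
  abs_jar="$(cd "$(dirname "$jar_file")" && pwd)/$(basename "$jar_file")"
  check_dir="$(mktemp -d)"
  (cd "$check_dir" && jar xf "$abs_jar" "$PROP_PATH")
  grep "^${PROP_KEY}" "$check_dir/$PROP_PATH"
  rm -rf "$check_dir"
}

# Only patch when a JAR path is passed, so the functions can also be sourced.
if [[ $# -gt 0 && "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  patch_jar "$1"
fi
```

Usage, from the directory holding the copied JAR (script name hypothetical): `bash patch-realization-jar.sh nsx-realization-1.0.jar`. The last line printed should show the property set to 100. Rerunning the script overwrites the `.bak` copy, so keep the backup taken on the UA nodes as the authoritative original. Copying the JAR back, restarting proton, and checking cluster status remain manual, as described in the workaround.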