# grep -i "completed checkpoint for ########-####-####-####-##########e9" /var/log/corfu/corfu-compactor-audit.log
2023-06-01T20:34:21.337Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for ########-####-####-####-##########e9, entries(1213644), cpSize(1180152103) bytes at snapshot Token(epoch=1192, sequence=5313828428) in 333111 ms
2023-06-01T20:54:05.308Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for ########-####-####-####-##########e9, entries(1213681), cpSize(1180184464) bytes at snapshot Token(epoch=1192, sequence=5313956408) in 323643 ms
2023-06-01T21:09:31.314Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for ########-####-####-####-##########e9, entries(1213681), cpSize(1180184464) bytes at snapshot Token(epoch=1192, sequence=5314047599) in 338404 ms
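Pulling the numeric fields out of these lines makes it easier to see whether this checkpoint keeps growing between compaction runs. The sketch below is minimal and assumes every line has exactly the layout shown above: a timestamp first, then `entries(N)`, `cpSize(N)` and `in N ms`. `summarize_checkpoints` is a hypothetical helper name, not an NSX tool.

```shell
# Summarise "completed checkpoint" lines for one Corfu table.
# $1 = table ID as it appears in the log, $2 = path to corfu-compactor-audit.log.
# Assumption: lines follow the CheckpointWriter format shown above.
summarize_checkpoints() {
  local table_id="$1" log="$2"
  grep -F "completed checkpoint for ${table_id}" "$log" |
    # Keep only: timestamp, entry count, checkpoint size in bytes, duration in ms.
    sed -E 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z).*entries\(([0-9]+)\), cpSize\(([0-9]+)\).* in ([0-9]+) ms.*/\1 \2 \3 \4/' |
    # Convert bytes to MiB and milliseconds to seconds for readability.
    awk '{ printf "%s entries=%d size_mib=%.1f duration_s=%.1f\n", $1, $2, $3 / 1048576, $4 / 1000 }'
}
```

Usage: `summarize_checkpoints <table-id> /var/log/corfu/corfu-compactor-audit.log`, using the table ID exactly as it appears in your own log.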
# grep stringId gprr.txt | awk '{print $2}' | cut -d "/" -f 1-7 | sort | uniq -c | sort -nr | head
642733 "/infra/realized-state/enforcement-points/default/security/port-security-profile-binding-maps
322725 "/infra/realized-state/enforcement-points/default/discovery/mac-discovery-profiles
7634 "/infra/realized-state/enforcement-points/default/services/nsservices
1790 "/infra/realized-state/enforcement-points/default/groups/nsgroups
1043 "/infra/realized-state/enforcement-points/default/firewalls/firewall-sections
423 "/infra/realized-state/enforcement-points/default/dhcp-servers/dhcp-server-<UUID>
244 "/infra/realized-state/enforcement-points/default/dhcp-servers/dhcp-server-<UUID>
73 "/infra/realized-state/enforcement-points/default/ops/ipfix-dfw-profiles
49 "/infra/realized-state/enforcement-points/default/dhcp-servers/dhcp-server-<UUID>
44 "/infra/realized-state/enforcement-points/default/dhcp-servers/dhcp-server-<UUID>
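The `cut -d "/" -f 1-7` step keeps only the first seven `/`-separated fields of each path, with the leading quote counted as field 1. That collapses every individual binding map onto its parent collection. DHCP entries still appear separately because their seventh field is the per-server `dhcp-server-<UUID>` name. The minimal sketch below runs the same steps and also prints each prefix's share of the total. Like the `awk '{print $2}'` in the original, it assumes the quoted path is the second whitespace-separated field on each `stringId` line. `prefix_share` is a hypothetical helper name.

```shell
# Count realized-state entries per path prefix and show each prefix's share.
# Reads the GPRR dump on stdin. Assumption: the path is field 2 of each
# "stringId" line, as in the pipeline above.
prefix_share() {
  grep stringId |
    awk '{ print $2 }' |
    cut -d "/" -f 1-7 |          # keep "/infra/.../<collection>" only
    sort | uniq -c | sort -nr |
    awk '{ count[NR] = $1; prefix[NR] = $2; total += $1 }
         END {
           for (i = 1; i <= NR; i++)
             printf "%d %.1f%% %s\n", count[i], 100 * count[i] / total, prefix[i]
         }'
}
```

Usage: `prefix_share < gprr.txt | head`.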
/config usage grows steadily when the GPRR table becomes too large and Corfu compaction is failing. Beyond 10%, alarms are raised in the NSX UI, and the UI can become inaccessible:
# df -h
Filesystem Size Used Avail Use% Mounted on
udev 24G 0 24G 0% /dev
tmpfs 4.8G 7.4M 4.8G 1% /run
/dev/sda2 11G 7.1G 2.7G 74% /
tmpfs 24G 616K 24G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 24G 0 24G 0% /sys/fs/cgroup
/dev/sda1 930M 8.3M 857M 1% /boot
/dev/mapper/nsx-repository 31G 7.0G 22G 25% /repository
/dev/mapper/nsx-var+dump 9.2G 296M 8.4G 4% /var/dump
/dev/mapper/nsx-tmp 3.7G 9.9M 3.5G 1% /tmp
/dev/mapper/nsx-config 29G 13G 15G 46% /config
/dev/mapper/nsx-image 42G 6.0G 34G 16% /image
/dev/mapper/nsx-secondary 98G 3.8G 90G 5% /nonconfig
/dev/mapper/nsx-var+log 27G 15G 11G 59% /var/log
tmpfs 4.8G 0 4.8G 0% /run/user/1007
tmpfs 4.8G 0 4.8G 0% /run/user/0
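To watch this on a Manager node, you can read the `Use%` column for `/config` from `df`. The sketch below is minimal. It assumes `df -P` (POSIX format, one line per filesystem) and a mount point without spaces. `use_pct_for_mount` and `config_over_threshold` are hypothetical helper names, and the default threshold of 10 only mirrors the figure quoted above.

```shell
# Read df output on stdin; print the Use% value (without "%") for the
# filesystem mounted exactly at $1.
use_pct_for_mount() {
  awk -v mnt="$1" '$NF == mnt { sub(/%$/, "", $5); print $5 }'
}

# Return 0 and print a message when /config usage exceeds the threshold
# (default 10, the figure quoted in the text above); return 1 otherwise.
config_over_threshold() {
  local threshold="${1:-10}" pct
  pct=$(df -P /config | use_pct_for_mount /config)
  if [ -n "$pct" ] && [ "$pct" -gt "$threshold" ]; then
    echo "/config usage ${pct}% exceeds ${threshold}%"
    return 0
  fi
  return 1
}
```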
/var/log/corfu/corfu-compactor-audit.log shows Corfu database compaction failing with OutOfMemoryError:
2023-04-07T16:12:43.913Z INFO metrics-logger-reporter-1-thread-1 metricsdata - type=TIMER, name=com.vmware.nsx.platform.clustering.persistence.corfu.CorfuDbDataStoreUfo.create, count=1, min=1448.6027609999999, max=1448.6027609999999, mean=1448.6027609999999, stddev=0.0, median=1448.6027609999999, p75=1448.6027609999999, p95=1448.6027609999999, p98=1448.6027609999999, p99=1448.6027609999999, p999=1448.6027609999999, mean_rate=0.0017346739786195718, m1=1.4970365977540202E-5, m5=0.029913723844527035, m15=0.10616389011240257, rate_unit=events/second, duration_unit=milliseconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /image/core/compactor_oom.hprof ...
Heap dump file created [2332530344 bytes in 8.114 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
#   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  INVALID (0xe0000000) at pc=0x0000000000000000, pid=20661, tid=0x000079013c94f700
#
# fatal error: OutOfMemory encountered: Java heap space
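To check another node for the same failure, count the heap-space errors in the compactor audit log and look for the compactor heap dump at the path the JVM reports (`/image/core/compactor_oom.hprof`, gzipped afterwards). The sketch below is minimal and defaults to those locations. `compactor_oom_report` is a hypothetical name. A non-zero count only shows that an OOM happened at some point, not that compaction is failing right now.

```shell
# Report OOM evidence for the Corfu compactor.
# $1 = audit log (default: the path shown above)
# $2 = heap dump directory (default: /image/core, where the JVM wrote the dump)
compactor_oom_report() {
  local log="${1:-/var/log/corfu/corfu-compactor-audit.log}"
  local core_dir="${2:-/image/core}"
  local n
  # grep -c exits non-zero when there are no matches or the file is missing;
  # treat both cases as a count of 0.
  n=$(grep -cF "java.lang.OutOfMemoryError: Java heap space" "$log" 2>/dev/null) || true
  echo "oom_messages=${n:-0}"
  if [ -e "$core_dir/compactor_oom.hprof.gz" ] || [ -e "$core_dir/compactor_oom.hprof" ]; then
    echo "heap_dump=present"
  else
    echo "heap_dump=absent"
  fi
}
```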
# ls -ltr /image/core
-rw------- 1 nsx-cbm nsx-cbm 46385184 Apr 5 20:28 cbm_oom.hprof.gz
-rw------- 1 uproton uproton 37 Apr 5 20:58 proton_oom.hprof.gz
-rw------- 1 root root 331866016 Apr 6 17:46 compactor_oom.hprof.gz

VMware NSX-T Data Center 3.x
VMware NSX
Copy the logical-migration.jar file to the /opt/vmware/upgrade-coordinator-tomcat/temp/ directory on one of the NSX Manager nodes in the cluster.
Stop the proton service:
# service proton stop
Run the StaleSegmentPortBindingMapsRectifier from the copied JAR:
# java -Xms5g -Xmx10g -Dcorfu-property-file-path=/opt/vmware/upgrade-coordinator-tomcat/conf/ufo-factory.properties -Djava.io.tmpdir=/opt/vmware/upgrade-coordinator-tomcat/temp -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/opt/vmware/upgrade-coordinator-tomcat/conf/log4j2.xml -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/vmware/upgrade-coordinator-tomcat/conf/logging.properties -Dnsx-service-type=nsx-manager -DStaleSegmentPortBindingMapsRectifier.dryRun=false -DStaleSegmentPortBindingMapsRectifier.batchSize=10 -DStaleSegmentPortBindingMapsRectifier.maxThreads=1 -DStaleSegmentPortBindingMapsRectifier.maxTimeoutMinutes=30 -cp /opt/vmware/upgrade-coordinator-tomcat/temp/logical-migration.jar com.vmware.nsx.management.migration.impl.StaleSegmentPortBindingMapsRectifier
Monitor progress in upgrade-coordinator.log:
# tail -F /var/log/upgrade-coordinator/upgrade-coordinator.log
When the task completes, upgrade-coordinator.log will show "Migration task finished."
Start the proton service again:
# service proton start
The next checkpoints for the same table show the entry count falling to 30775 and the checkpoint size to 31698435 bytes:
# grep -i "completed checkpoint for ########-####-####-####-##########e9" /var/log/corfu/corfu-compactor-audit.log
/var/log/corfu/corfu-compactor-audit.log:2023-06-01T21:13:15.670Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for ########-####-####-####-##########e9, entries(1131104), cpSize(1089161920) bytes at snapshot Token(epoch=1192, sequence=5314123624) in 308297 ms
/var/log/corfu/corfu-compactor-audit.log:2023-06-01T21:24:54.489Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for ########-####-####-####-##########e9, entries(30775), cpSize(31698435) bytes at snapshot Token(epoch=1192, sequence=5314265183) in 67668 ms
/config usage has come down as well, from 13G (46%) to 213M (1%):
# df -h
Filesystem Size Used Avail Use% Mounted on
udev 24G 0 24G 0% /dev
tmpfs 4.8G 7.5M 4.8G 1% /run
/dev/sda2 11G 6.4G 3.4G 66% /
tmpfs 24G 4.7M 24G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 24G 0 24G 0% /sys/fs/cgroup
/dev/sda3 11G 41M 9.7G 1% /os_bak
/dev/sda1 944M 9.4M 870M 2% /boot
/dev/mapper/nsx-var+dump 9.4G 37M 8.8G 1% /var/dump
/dev/mapper/nsx-config__bak 29G 45M 28G 1% /config_bak
/dev/mapper/nsx-repository 31G 16G 14G 53% /repository
/dev/mapper/nsx-var+log 27G 9.3G 17G 37% /var/log
/dev/mapper/nsx-tmp 3.7G 97M 3.4G 3% /tmp
/dev/mapper/nsx-config 29G 213M 28G 1% /config
/dev/mapper/nsx-image 42G 19G 22G 46% /image
/dev/mapper/nsx-secondary 98G 2.7G 91G 3% /nonconfig
tmpfs 4.8G 0 4.8G 0% /run/user/1007
tmpfs 4.8G 0 4.8G 0% /run/user/0
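A small polling loop can replace watching `tail -F` by hand during the rectifier run. This is a minimal sketch and not part of the NSX procedure. It assumes the completion line contains the literal text `Migration task finished.`. It also takes a starting line number, so that a marker left in `upgrade-coordinator.log` by an earlier run is ignored. `wait_for_marker` is a hypothetical name. The 1800-second default was chosen only to line up with the `maxTimeoutMinutes=30` property in the java command; whether the tool treats that property as an overall limit is an assumption.

```shell
# Poll a log until a marker string appears after a given line, or time out.
# $1 = log file, $2 = marker text (matched literally),
# $3 = number of lines to skip (default 0), $4 = timeout in seconds (default 1800),
# $5 = poll interval in seconds (default 10).
wait_for_marker() {
  local log="$1" marker="$2" start_line="${3:-0}"
  local timeout_s="${4:-1800}" interval_s="${5:-10}" waited=0
  # Only search lines written after start_line.
  until tail -n "+$((start_line + 1))" "$log" 2>/dev/null | grep -qF -- "$marker"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s waiting for: $marker" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "found: $marker"
}
```

Before launching the java command, record `start=$(wc -l < /var/log/upgrade-coordinator/upgrade-coordinator.log)`. Then run `wait_for_marker /var/log/upgrade-coordinator/upgrade-coordinator.log "Migration task finished." "$start" && service proton start`. Line offsets do not survive log rotation, so check the log manually if it rotates during the run.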