Entries in /var/log/corfu/corfu-compactor-audit.log resemble:
2021-04-29 16:06:57.480368: Runner: Failed to run compactor tool: Command 'nice -n -10 java -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/corfu/compactor-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+UseStringDeduplication -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/image/core -XX:+CrashOnOutOfMemoryError -Xms1931m -Xmx1931m -Djdk.nio.maxCachedBufferSize=1048576 -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-compactor-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-compactor-1.0-jar-with-dependencies.jar:/opt/vmware/policy-tomcat/webapps/policy/WEB-INF/lib/*" com.vmware.nsx.management.tools.corfu.CorfuCompactorMain -hostname <IP address> -port 9000 -namespace nsx-policy-manager -useDistributedLock' returned non-zero exit status 134
Entries in /var/log/corfu/corfu-compactor-audit.log also resemble:
2021-06-10T15:04:12.277Z ERROR main FrameworkCorfuCompactor - - [nsx@6876 comp="nsx-manager" errorCode="MP1" level="ERROR" subcomp="corfu-compactor"] Checkpoint failed for framework data with namespace nsx-policy-manager
java.lang.OutOfMemoryError: Java heap space
    at java.util.TreeMap.put(TreeMap.java:577) ~[?:1.8.0_212]
    at java.util.TreeSet.add(TreeSet.java:255) ~[?:1.8.0_212]
    at org.corfudb.runtime.view.stream.AddressMapStreamView$$Lambda$303/498104228.accept(Unknown Source) ~[?:?]
    at org.roaringbitmap.longlong.Roaring64NavigableMap$2.accept(Roaring64NavigableMap.java:456) ~[RoaringBitmap-0.7.36.jar:?]
    at org.roaringbitmap.RunContainer.forEach(RunContainer.java:2510) ~[RoaringBitmap-0.7.36.jar:?]
    at org.roaringbitmap.RoaringBitmap.forEach(RoaringBitmap.java:1609) ~[RoaringBitmap-0.7.36.jar:?]
    at org.roaringbitmap.longlong.Roaring64NavigableMap.forEach(Roaring64NavigableMap.java:452) ~[RoaringBitmap-0.7.36.jar:?]
The /image/ partition may be 100% full, and /image/core/ contains a large number of *.hprof files.
Entries in /var/log/corfu/corfu.9000.*.log resemble:
2021-04-29T15:47:50.366Z | DEBUG | LogUnit-16 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483646), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483645}
2021-04-29T15:47:50.366Z | DEBUG | LogUnit-16 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483647), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483646}
2021-04-29T15:47:50.396Z | DEBUG | LogUnit-9 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483648), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483647}
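The disk-space symptom (a full /image/ partition and many *.hprof files in /image/core/) can be checked with a short script. The following is a minimal sketch, not a supported tool: the function name heap_dump_report is hypothetical, and the default paths are taken from the symptoms described in this article.

```python
import glob
import os
import shutil


def heap_dump_report(image_dir="/image", core_dir="/image/core"):
    """Return (percent_used, hprof_count, hprof_bytes).

    percent_used is the usage of the filesystem holding image_dir;
    the other two values describe the *.hprof files found in core_dir.
    """
    usage = shutil.disk_usage(image_dir)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0

    # Heap dumps written by -XX:+HeapDumpOnOutOfMemoryError end in .hprof
    dumps = glob.glob(os.path.join(core_dir, "*.hprof"))
    total_bytes = sum(os.path.getsize(path) for path in dumps)
    return percent_used, len(dumps), total_bytes


# Example call on an affected node (paths are assumptions from this article):
# pct, count, size = heap_dump_report()
# print(f"/image is {pct:.1f}% used; {count} *.hprof files, {size} bytes")
```

The script only reports usage; it does not delete any files.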
The /image/core/*.hprof files are created because the compactor process repeatedly runs out of memory. Each time it does, it writes a heap dump file (*.hprof) to the /image/core/ directory.
The issue occurs when the Corfu sequence number reaches the Integer.MAX_VALUE of 2147483647.
The following entries show checkpoints completing for stream 4251f216-####-####-####-9fe0df2243ee before hitting sequence 2147483647 (Integer.MAX_VALUE):
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483645) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483645) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483646) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483647) in 76 ms
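To find log entries whose Token sequence is close to or past 2147483647, the log lines can be filtered on the sequence= field. The following is a minimal sketch under stated assumptions: the function name sequences_near_limit and the margin parameter are hypothetical, and the only log format assumed is the "sequence=<number>" text visible in the entries above.

```python
import re

# Java's Integer.MAX_VALUE
INT_MAX = 2**31 - 1  # 2147483647

# Matches the "sequence=<digits>" field of a Token(...) in a log line
SEQ_RE = re.compile(r"sequence=(\d+)")


def sequences_near_limit(lines, margin=1000):
    """Yield (sequence, line) for each line whose Token sequence is
    within `margin` of INT_MAX, or beyond it.

    `margin` is an arbitrary illustration value, not a product threshold.
    """
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue
        sequence = int(match.group(1))
        if sequence >= INT_MAX - margin:
            yield sequence, line.rstrip("\n")


# Example call (the file name is a placeholder for one of the
# /var/log/corfu/corfu.9000.*.log files):
# with open("corfu.9000.log") as fh:
#     for seq, line in sequences_near_limit(fh):
#         print(seq, line)
```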
This issue is resolved in NSX-T versions 3.0.3.1 and 3.1.2.1.
Note: NSX-T 3.0.3.2 does not include the fix.
Workaround:
To work around this issue, contact Broadcom Support and note this Article ID (317760) in the problem description.