2021-04-29 16:06:57.480368: Runner: Failed to run compactor tool: Command 'nice -n -10 java -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/corfu/compactor-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+UseStringDeduplication -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/image/core -XX:+CrashOnOutOfMemoryError -Xms1931m -Xmx1931m -Djdk.nio.maxCachedBufferSize=1048576 -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-compactor-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-compactor-1.0-jar-with-dependencies.jar:/opt/vmware/policy-tomcat/webapps/policy/WEB-INF/lib/*" com.vmware.nsx.management.tools.corfu.CorfuCompactorMain -hostname ###### -port #### -namespace nsx-policy-manager -useDistributedLock' returned non-zero exit status 134
2021-06-10T15:04:12.277Z ERROR main FrameworkCorfuCompactor - - [nsx@6876 comp="nsx-manager" errorCode="MP1" level="ERROR" subcomp="corfu-compactor"] Checkpoint failed for framework data with namespace nsx-policy-manager java.lang.OutOfMemoryError: Java heap space at java.util.TreeMap.put(TreeMap.java:577) ~[?:1.8.0_212] at java.util.TreeSet.add(TreeSet.java:255) ~[?:1.8.0_212] at org.corfudb.runtime.view.stream.AddressMapStreamView$$Lambda$303/498104228.accept(Unknown Source) ~[?:?] at org.roaringbitmap.longlong.Roaring64NavigableMap$2.accept(Roaring64NavigableMap.java:456) ~[RoaringBitmap-0.7.36.jar:?] at org.roaringbitmap.RunContainer.forEach(RunContainer.java:2510) ~[RoaringBitmap-0.7.36.jar:?] at org.roaringbitmap.RoaringBitmap.forEach(RoaringBitmap.java:1609) ~[RoaringBitmap-0.7.36.jar:?] at org.roaringbitmap.longlong.Roaring64NavigableMap.forEach(Roaring64NavigableMap.java:452) ~[RoaringBitmap-0.7.36.jar:?]
2021-04-29T15:47:50.366Z | DEBUG | LogUnit-16 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483646), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483645} 2021-04-29T15:47:50.366Z | DEBUG | LogUnit-16 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483647), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483646} 2021-04-29T15:47:50.396Z | DEBUG | LogUnit-9 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=242, sequence=2147483648), streams: {5d8c74bd-####-####-####-d73f9121ee62=2147483647}
Note: Above we see the bitmap integer increasing and hitting the integer.MAX_VALUE of 2147483647, this is displayed in the log as the sequence number and above we see that increased from 2147483646 in the first entry to 2147483648 in the last entry, passing the value 2147483647.
/image/core/*.hprof
files are created due to the compactor process continually going out of memory, each time is does this it creates a dump file (*.hprof) in the /image/core/
directory.INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483645 ) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483645 ) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483646) in 76 ms
INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 4251f216-####-####-####-9fe0df2243ee, entries(1), cpSize(1164) bytes at snapshot Token(epoch=101, sequence=2147483647) in 76 ms
Note: It may not be the same table, just that it is entering the same address space in consecutive order.
This is resolved in NSX-T version 3.1.2.1
To work around this issue, contact Broadcom Support and note this Article ID (367597) in the problem description.