Corfu non-config compactor fails continuously after upgrading from NSX-T 3.1.3 to 3.2.0 with a SerializerException.
Article ID: 345858
Products
VMware NSX
Issue/Introduction
Alarms for high and very high /nonconfig partition disk usage are reported in the Alarms listing of the NSX Manager UI and can also be seen under System -> Appliances on the NSX Managers.
The Corfu non-config compactor fails continuously after an upgrade from NSX-T 3.1.3 to 3.2.0.0 with a SerializerException: DynamicProtobufSerializer file google/protobuf/descriptor.proto was never seen in registry.
This error can be found in /var/log/corfu-nonconfig/nonconfig-corfu-compactor-audit* logs of the NSX Manager.
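Checking for this error across current and rotated compactor audit logs can be sketched in Python as an alternative to grepping the files by hand. This is a minimal sketch under assumptions: it assumes Python 3 is available where it runs and that rotated copies may be gzip-compressed; the helper names open_log and find_signature are hypothetical. The search string is the tail of the SerializerException message quoted above.

```python
import glob
import gzip

# Log location and error text are taken from this article.
LOG_GLOB = "/var/log/corfu-nonconfig/nonconfig-corfu-compactor-audit*"
SIGNATURE = "was never seen in registry"


def open_log(path):
    """Open a log as text; rotated copies ending in .gz are assumed gzip-compressed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_signature(paths, signature=SIGNATURE):
    """Yield (path, line_number, line) for every line that contains signature."""
    for path in sorted(paths):
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if signature in line:
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # glob returns an empty list if the directory does not exist, so this
    # prints nothing on a machine that is not an NSX Manager.
    for path, number, line in find_signature(glob.glob(LOG_GLOB)):
        print(f"{path}:{number}: {line}")
```

Passing a different signature string lets the same function look for the other error texts quoted in this article.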
Usage of the /nonconfig partition also increases slowly over time. This can be observed by running the df -h command as the root user on the NSX Manager appliance (UA).
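For repeated checks, the Use% figure that df -h reports for /nonconfig can be approximated in Python. This is a sketch under assumptions: it assumes Python 3 is available where it runs, and the helper name partition_usage_percent is hypothetical. It reads filesystem statistics through shutil.disk_usage and applies df's used / (used + available) ratio; df rounds its percentage up, so small differences are expected.

```python
import os
import shutil


def partition_usage_percent(path="/nonconfig"):
    """Approximate the Use% column that `df -h` shows for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    # shutil's `free` is the space available to unprivileged users, which is
    # what df reports as Avail; df computes Use% as used / (used + avail).
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


if __name__ == "__main__":
    target = "/nonconfig"
    if os.path.isdir(target):
        print(f"{target}: {partition_usage_percent(target):.1f}% used")
    else:
        print(f"{target} not found; run this on an NSX Manager node")
```

Recording this value periodically (for example from cron) makes the slow growth described above visible over time.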
If the environment has since been upgraded to 4.0.0 or later without addressing this compactor failure, the error described above in the /var/log/corfu-nonconfig/nonconfig-corfu-compactor-audit* logs becomes a NullPointerException similar to the following:
ERROR | Cmpt-chkpter-9040 | o.c.r.object.CorfuCompileProxy | Access[CorfuTable[59ce]]
java.lang.NullPointerException: null
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
at org.corfudb.util.serializer.DynamicProtobufSerializer.getDescriptor(DynamicProtobufSerializer.java:265)
at org.corfudb.util.serializer.KeyDynamicProtobufSerializer.deserialize(KeyDynamicProtobufSerializer.java:60)
at org.corfudb.protocols.logprotocol.SMREntry.deserializeBuffer(SMREntry.java:138)
at org.corfudb.protocols.logprotocol.LogEntry.deserialize(LogEntry.java:83)
at org.corfudb.protocols.logprotocol.MultiSMREntry.deserializeBuffer(MultiSMREntry.java:70)
at org.corfudb.protocols.logprotocol.LogEntry.deserialize(LogEntry.java:83)
at org.corfudb.protocols.logprotocol.CheckpointEntry.getSmrEntries(CheckpointEntry.java:180)
at org.corfudb.protocols.logprotocol.CheckpointEntry.getSmrEntries(CheckpointEntry.java:170)
at org.corfudb.runtime.object.StreamViewSMRAdapter.dataAndCheckpointMapper(StreamViewSMRAdapter.java:55)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at org.corfudb.runtime.view.stream.StreamSpliterator.tryAdvance(StreamSpliterator.java:65)
at java.util.Spliterator.forEachRemaining(Spliterator.java:326)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:490)
at org.corfudb.runtime.object.VersionLockedObject.lambda$syncStreamUnsafe$5(VersionLockedObject.java:707)
at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:115)
at org.corfudb.runtime.object.VersionLockedObject.syncStreamUnsafe(VersionLockedObject.java:726)
at org.corfudb.runtime.object.VersionLockedObject.syncObjectUnsafeInner(VersionLockedObject.java:383)
at org.corfudb.runtime.object.VersionLockedObject.syncObjectUnsafe(VersionLockedObject.java:346)
at org.corfudb.runtime.object.transactions.AbstractTransactionalContext.syncWithRetryUnsafe(AbstractTransactionalContext.java:237)
at org.corfudb.runtime.object.transactions.SnapshotTransactionalContext.lambda$access$1(SnapshotTransactionalContext.java:45)
at org.corfudb.runtime.object.VersionLockedObject.access(VersionLockedObject.java:269)
at org.corfudb.runtime.object.transactions.SnapshotTransactionalContext.access(SnapshotTransactionalContext.java:42)
at org.corfudb.runtime.object.CorfuCompileProxy.accessInner(CorfuCompileProxy.java:167)
at org.corfudb.runtime.object.CorfuCompileProxy.lambda$access$0(CorfuCompileProxy.java:158)
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:57)
at org.corfudb.common.metrics.micrometer.MicroMeterUtils.lambda$time$6(MicroMeterUtils.java:121)
at java.util.Optional.map(Optional.java:215)
at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:121)
at org.corfudb.runtime.object.CorfuCompileProxy.access(CorfuCompileProxy.java:158)
at org.corfudb.runtime.collections.CorfuTable$CORFUSMR.entryStream(CorfuTable$CORFUSMR.java:214)
at org.corfudb.runtime.CheckpointWriter.appendCheckpoint(CheckpointWriter.java:181)
at org.corfudb.runtime.CheckpointWriter.appendCheckpoint(CheckpointWriter.java:154)
at org.corfudb.runtime.DistributedCheckpointer.appendCheckpoint(DistributedCheckpointer.java:175)
at org.corfudb.runtime.DistributedCheckpointer.tryCheckpointTable(DistributedCheckpointer.java:140)
at org.corfudb.runtime.ServerTriggeredCheckpointer.checkpointTables(ServerTriggeredCheckpointer.java:45)
at org.corfudb.compactor.CompactorCheckpointer.startCheckpointing(CompactorCheckpointer.java:71)
at org.corfudb.compactor.CompactorCheckpointer.main(CompactorCheckpointer.java:59)
This NullPointerException is also accompanied by errors such as the following in /var/log/corfu-nonconfig/nonconfig-corfu-compactor-audit.log on the NSX Manager:
ERROR | Cmpt-chkpter-9040 | s.KeyDynamicProtobufSerializer | messagesFdProtoNameMap doesn't contain the message type vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg of payload type_url: "type.googleapis.com/vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg" value: "\b\200\206{\022\aDEFAULT". Please check if the related table is properly opened with correct schema.
ERROR | Cmpt-chkpter-9040 | s.KeyDynamicProtobufSerializer | messagesFdProtoNameMap doesn't contain the message type vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg of payload type_url: "type.googleapis.com/vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg"
These errors are seen only for two tables: ids_signature_detail_ufo and ids_signature_detail_ufo_version.
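To confirm which message types the compactor cannot resolve, the KeyDynamicProtobufSerializer lines above can be tallied with a regular expression. This is a minimal sketch; the pattern is built from the log text quoted in this article, and the helper name missing_message_types is hypothetical. In an environment matching this article, the only type expected is the IdsSignatureDetailUfo key message shown above.

```python
import re
from collections import Counter

# Captures the protobuf message type named in the
# "messagesFdProtoNameMap doesn't contain the message type ..." error.
MISSING_TYPE = re.compile(r"messagesFdProtoNameMap doesn't contain the message type (\S+)")


def missing_message_types(lines):
    """Count how often each unresolved protobuf message type appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = MISSING_TYPE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: python3 this_script.py < nonconfig-corfu-compactor-audit.log
    for message_type, count in missing_message_types(sys.stdin).most_common():
        print(f"{count:6d}  {message_type}")
```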
Note: Customers upgrading directly from 3.1.3 to 3.2.0.1 or 3.2.1 will not see this issue.
Note: It is possible that an upgrade to 3.2 is performed and no immediate impact is seen. The underlying issue is still present, however, and may manifest on later 3.2.x or 4.x versions. In this case, a manual fix is required to resolve the issue.
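The notes above can be expressed as a small check over an environment's upgrade history. This is a hypothetical helper, not an NSX tool: it assumes the history is available as a list of version strings, and it treats "3.2.0" and "3.2.0.0" as the same release, which is how this article uses them.

```python
def normalize(version):
    """Turn '3.2.0.0' into (3, 2) so that '3.2.0' and '3.2.0.0' compare equal."""
    parts = [int(p) for p in version.split(".")]
    # Drop trailing zero components, keeping at least one component.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# The release named in the Cause section as introducing the failure.
AFFECTED = normalize("3.2.0.0")


def possibly_affected(upgrade_path):
    """Return True if the upgrade history passed through NSX-T 3.2.0.0."""
    return any(normalize(v) == AFFECTED for v in upgrade_path)


if __name__ == "__main__":
    print(possibly_affected(["3.1.3", "3.2.0", "4.0.0"]))  # prints True
    print(possibly_affected(["3.1.3", "3.2.0.1"]))         # prints False
```

A True result only means the history includes 3.2.0.0; it does not by itself confirm that the compactor is failing, which the log and disk-usage symptoms described above still need to show.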
Environment
VMware NSX-T Data Center 3.x
VMware NSX
Cause
The initial failure occurs when upgrading to 3.2.0.0, due to a known issue. This is fixed in 3.2.0.1, provided that compaction is then run manually with the flags '-upgrade -upgradeToImpactor'. Because this step must be invoked manually after upgrading from 3.2.0 to 3.2.0.1, it can be missed.
In 4.0.0 and later, Corfu detects schema changes differently: the relevant descriptors and metadata are updated for the tables that are opened. However, the tables ids_signature_detail_ufo and ids_signature_detail_ufo_version are no longer used, so they remain in the RegistryTable with the old information. As a result, the compactor fails with the NullPointerException described above whenever it attempts to checkpoint either of these tables.
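The mechanism described above can be illustrated with a deliberately simplified model. Everything in it is hypothetical: RegistryEntry, open_table and checkpoint_all are illustrative names, not Corfu classes or APIs, and a boolean flag stands in for whether a table's descriptors and metadata have been refreshed. It only shows why tables that are never opened keep their old registry information and are therefore the ones the compactor fails on.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    # Hypothetical stand-in for a RegistryTable record; not Corfu's real schema.
    table_name: str
    descriptors_current: bool


def open_table(registry, name):
    """Model of the 4.0.0+ behaviour: opening a table refreshes its registry information."""
    registry[name].descriptors_current = True


def checkpoint_all(registry):
    """Walk every registered table, opened or not, and return those that would fail."""
    failed = []
    for name, entry in sorted(registry.items()):
        if not entry.descriptors_current:
            # In the real compactor this surfaces as the NullPointerException above.
            failed.append(name)
    return failed


if __name__ == "__main__":
    names = ["hypothetical_active_table", "ids_signature_detail_ufo", "ids_signature_detail_ufo_version"]
    registry = {n: RegistryEntry(n, descriptors_current=False) for n in names}
    open_table(registry, "hypothetical_active_table")  # still in use, so it gets refreshed
    print(checkpoint_all(registry))
    # prints ['ids_signature_detail_ufo', 'ids_signature_detail_ufo_version']
```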
Resolution
This issue is resolved in VMware NSX 3.2.0.1.
This issue is resolved in VMware NSX 4.2.0.
Workaround:
Contact Broadcom Support for assistance with working around this issue.