Corfu non-config compactor fails continuously after upgrading from NSX-T 3.1.3 to 3.2.0 with a SerializerException.

Article ID: 345858

Products

VMware NSX

Issue/Introduction

  • The Corfu non-config compactor fails continuously after an upgrade from NSX-T 3.1.3 to 3.2.0.0 with the error SerializerException: DynamicProtobufSerializer file google/protobuf/descriptor.proto was never seen in registry.
  • The error is logged in /var/log/corfu-nonconfig/corfu-compactor-audit* on the UA.
  • Note: Customers upgrading directly from 3.1.3 to 3.2.0.1 or 3.2.1 will not see this issue.
  • Usage of the /nonconfig partition also grows slowly over time. This can be observed by running the df -h command on the UA as the root user:

    /dev/mapper/nsx-secondary     98G   16G   78G  17% /nonconfig
  • If the environment has been upgraded further to 4.0.0 or later without addressing this compactor failure, the error in /var/log/corfu-nonconfig/corfu-compactor-audit* changes from the SerializerException to a NullPointerException (NPE) like the following:
ERROR |              Cmpt-chkpter-9040 | o.c.r.object.CorfuCompileProxy | Access[CorfuTable[59ce]]

java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
        at java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:964)
        at org.corfudb.util.serializer.DynamicProtobufSerializer.getDescriptor(DynamicProtobufSerializer.java:265)
        at org.corfudb.util.serializer.KeyDynamicProtobufSerializer.deserialize(KeyDynamicProtobufSerializer.java:60)
        at org.corfudb.protocols.logprotocol.SMREntry.deserializeBuffer(SMREntry.java:138)
        at org.corfudb.protocols.logprotocol.LogEntry.deserialize(LogEntry.java:83)
        at org.corfudb.protocols.logprotocol.MultiSMREntry.deserializeBuffer(MultiSMREntry.java:70)
        at org.corfudb.protocols.logprotocol.LogEntry.deserialize(LogEntry.java:83)
        at org.corfudb.protocols.logprotocol.CheckpointEntry.getSmrEntries(CheckpointEntry.java:180)
        at org.corfudb.protocols.logprotocol.CheckpointEntry.getSmrEntries(CheckpointEntry.java:170)
        at org.corfudb.runtime.object.StreamViewSMRAdapter.dataAndCheckpointMapper(StreamViewSMRAdapter.java:55)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at org.corfudb.runtime.view.stream.StreamSpliterator.tryAdvance(StreamSpliterator.java:65)
        at java.util.Spliterator.forEachRemaining(Spliterator.java:326)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:490)
        at org.corfudb.runtime.object.VersionLockedObject.lambda$syncStreamUnsafe$5(VersionLockedObject.java:707)
        at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:115)
        at org.corfudb.runtime.object.VersionLockedObject.syncStreamUnsafe(VersionLockedObject.java:726)
        at org.corfudb.runtime.object.VersionLockedObject.syncObjectUnsafeInner(VersionLockedObject.java:383)
        at org.corfudb.runtime.object.VersionLockedObject.syncObjectUnsafe(VersionLockedObject.java:346)
        at org.corfudb.runtime.object.transactions.AbstractTransactionalContext.syncWithRetryUnsafe(AbstractTransactionalContext.java:237)
        at org.corfudb.runtime.object.transactions.SnapshotTransactionalContext.lambda$access$1(SnapshotTransactionalContext.java:45)
        at org.corfudb.runtime.object.VersionLockedObject.access(VersionLockedObject.java:269)
        at org.corfudb.runtime.object.transactions.SnapshotTransactionalContext.access(SnapshotTransactionalContext.java:42)
        at org.corfudb.runtime.object.CorfuCompileProxy.accessInner(CorfuCompileProxy.java:167)
        at org.corfudb.runtime.object.CorfuCompileProxy.lambda$access$0(CorfuCompileProxy.java:158)
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:57)
        at org.corfudb.common.metrics.micrometer.MicroMeterUtils.lambda$time$6(MicroMeterUtils.java:121)
        at java.util.Optional.map(Optional.java:215)
        at org.corfudb.common.metrics.micrometer.MicroMeterUtils.time(MicroMeterUtils.java:121)
        at org.corfudb.runtime.object.CorfuCompileProxy.access(CorfuCompileProxy.java:158)
        at org.corfudb.runtime.collections.CorfuTable$CORFUSMR.entryStream(CorfuTable$CORFUSMR.java:214)
        at org.corfudb.runtime.CheckpointWriter.appendCheckpoint(CheckpointWriter.java:181)
        at org.corfudb.runtime.CheckpointWriter.appendCheckpoint(CheckpointWriter.java:154)
        at org.corfudb.runtime.DistributedCheckpointer.appendCheckpoint(DistributedCheckpointer.java:175)
        at org.corfudb.runtime.DistributedCheckpointer.tryCheckpointTable(DistributedCheckpointer.java:140)
        at org.corfudb.runtime.ServerTriggeredCheckpointer.checkpointTables(ServerTriggeredCheckpointer.java:45)
        at org.corfudb.compactor.CompactorCheckpointer.startCheckpointing(CompactorCheckpointer.java:71)
        at org.corfudb.compactor.CompactorCheckpointer.main(CompactorCheckpointer.java:59)
  • The NPE is accompanied by errors such as:

    ERROR |              Cmpt-chkpter-9040 | s.KeyDynamicProtobufSerializer | messagesFdProtoNameMap doesn't contain the message type vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg of payload type_url: "type.googleapis.com/vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg" value: "\b\200\206{\022\aDEFAULT". Please check if the related table is properly opened with correct schema.

  • These errors are seen only for two tables: ids_signature_detail_ufo and ids_signature_detail_ufo_version.
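
The log signatures listed above can be checked for mechanically. The following is a minimal sketch, not a Broadcom-supplied tool: it classifies lines from the compactor audit logs by substrings quoted in this article. The class name, the enum, and the method names are hypothetical, and the exact substrings may need adjusting to match the real log layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.EnumMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper: counts the error signatures described in this article.
class CompactorLogScan {
    enum Signature { SERIALIZER_REGISTRY, MISSING_MESSAGE_TYPE, NULL_POINTER, NONE }

    // Classify one log line by the substrings quoted in the article.
    static Signature classify(String line) {
        if (line.contains("descriptor.proto was never seen in registry")) {
            return Signature.SERIALIZER_REGISTRY;   // 3.2.0.0 symptom
        }
        if (line.contains("messagesFdProtoNameMap doesn't contain the message type")) {
            return Signature.MISSING_MESSAGE_TYPE;  // accompanies the NPE on 4.0.0 or later
        }
        if (line.contains("java.lang.NullPointerException")) {
            return Signature.NULL_POINTER;          // 4.0.0 or later symptom
        }
        return Signature.NONE;
    }

    // Count how many lines match each signature; non-matching lines are ignored.
    static Map<Signature, Integer> count(Stream<String> lines) {
        Map<Signature, Integer> counts = new EnumMap<>(Signature.class);
        lines.map(CompactorLogScan::classify)
             .filter(sig -> sig != Signature.NONE)
             .forEach(sig -> counts.merge(sig, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Default directory is the one named in the article; pass another as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/log/corfu-nonconfig");
        Map<Signature, Integer> totals = new EnumMap<>(Signature.class);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "corfu-compactor-audit*")) {
            for (Path file : files) {
                // ISO-8859-1 never fails on decoding; compressed rotations simply will not match.
                try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
                    count(lines).forEach((sig, n) -> totals.merge(sig, n, Integer::sum));
                }
            }
        }
        System.out.println(totals);
    }
}
```

A non-empty result only indicates that the signatures are present in the logs. It does not confirm the root cause on its own.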



Environment

VMware NSX-T Data Center 3.x
VMware NSX

Cause

The initial failure occurs during the upgrade to 3.2.0.0 and is due to a known issue. It is fixed in 3.2.0.1 by running compaction manually with the flags '-upgrade -upgradeToImpactor'. Because this step must be invoked manually when moving from 3.2.0 to 3.2.0.1, it can be missed.

From 4.0.0 onwards, Corfu detects schema changes differently and updates the relevant descriptors and metadata for the tables that are opened. The tables ids_signature_detail_ufo and ids_signature_detail_ufo_version are no longer used, so they remain in the RegistryTable with the old information. The compactor then fails with the NPE described above whenever it attempts to checkpoint either of these tables.
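
The top frames of the stack trace (getDescriptor calling ConcurrentHashMap.containsKey) are consistent with standard JDK behavior: ConcurrentHashMap does not permit null keys, so a lookup with a null key throws a NullPointerException instead of returning false. The sketch below illustrates that mechanism only. It is not the Corfu source: the class, the second map, and the lookup flow are assumptions made for illustration, and messagesFdProtoNameMap is borrowed from the log message purely as a name.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo of a null key reaching ConcurrentHashMap; not Corfu code.
class StaleRegistryLookupDemo {
    // Assumed stand-in: message type name -> proto file name.
    static final ConcurrentHashMap<String, String> messagesFdProtoNameMap = new ConcurrentHashMap<>();
    // Assumed stand-in: proto file name -> descriptor (a String here for simplicity).
    static final ConcurrentHashMap<String, String> descriptorsByFile = new ConcurrentHashMap<>();

    // Unguarded lookup: an unregistered message type yields a null proto file,
    // and containsKey(null) on a ConcurrentHashMap throws NullPointerException.
    static boolean hasDescriptor(String messageType) {
        String protoFile = messagesFdProtoNameMap.get(messageType);
        return descriptorsByFile.containsKey(protoFile);
    }

    // Guarded lookup: treats an unregistered message type as "no descriptor".
    static boolean hasDescriptorSafely(String messageType) {
        String protoFile = messagesFdProtoNameMap.get(messageType);
        return protoFile != null && descriptorsByFile.containsKey(protoFile);
    }

    public static void main(String[] args) {
        String type = "vmware.nsx.context.ids.IdsSignatureDetailUfoKeyMsg";
        System.out.println("guarded lookup: " + hasDescriptorSafely(type));
        try {
            hasDescriptor(type);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```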

Resolution

This issue is resolved in VMware NSX 3.2.0.1 and in VMware NSX 4.2.0.

Workaround:

Contact Broadcom Support for assistance with working around this issue.