You have had a datastore outage on the ESXi hosts where the NSX-T Managers reside.
You have run the 'deactivate cluster' command and now have a single-node cluster.
'get cluster status' shows all groups up except CORFU_NONCONFIG, which is in an UNKNOWN state:
Group Type: CORFU_NONCONFIG
Group Status: UNAVAILABLE
Members:
UUID FQDN IP STATUS
206dea88-####-####-####-###########de nsx-03.example.local ##.##.##.203 UNKNOWN
/nonconfig/corfu/corfu/LAYOUT_CURRENT.ds still lists the other two nodes:
"layoutServers": [
  "##.##.##.201:9040",    <-- this node was removed
  "##.##.##.202:9040",    <-- this node was removed
  "##.##.##.203:9040"
],
"sequencers": [
  "##.##.##.201:9040",    <-- this node was removed
  "##.##.##.202:9040",    <-- this node was removed
  "##.##.##.203:9040"
],
"segments": [
  {
    "replicationMode": "CHAIN_REPLICATION",
    "start": 0,
    "end": -1,
    "stripes": [
      {
        "logServers": [
          "##.##.##.202:9040",    <-- this node was removed
          "##.##.##.201:9040"     <-- this node was removed
        ]
      }
    ]
  }
],
"unresponsiveServers": [
  "##.##.##.203:9040"
],
"epoch": 1825,
"clusterId": "ff44209a-####-####-####-##########86"
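To make the mismatch explicit, the layout can be compared against the members that 'get cluster config' reports. The sketch below is a hypothetical helper, not an NSX or Corfu tool: it assumes LAYOUT_CURRENT.ds parses as a single JSON object shaped like the excerpt above, and it uses placeholder 192.0.2.x addresses in place of the redacted ones.

```python
import json


def stale_layout_servers(layout: dict, members: set[str]) -> dict[str, list[str]]:
    """Hypothetical helper: list layout endpoints that are not current members.

    `layout` is the parsed LAYOUT_CURRENT.ds object (assumed to be JSON);
    `members` holds the host:port endpoints of the node(s) that
    'get cluster config' reports.
    """
    # Collect every log server from every stripe of every segment.
    log_servers = [
        endpoint
        for segment in layout.get("segments", [])
        for stripe in segment.get("stripes", [])
        for endpoint in stripe.get("logServers", [])
    ]
    fields = {
        "layoutServers": layout.get("layoutServers", []),
        "sequencers": layout.get("sequencers", []),
        "logServers": log_servers,
    }
    report = {name: [ep for ep in eps if ep not in members] for name, eps in fields.items()}
    # Current members that the layout itself marks as unresponsive.
    report["unresponsiveMembers"] = [
        ep for ep in layout.get("unresponsiveServers", []) if ep in members
    ]
    return report


# Placeholder endpoints (192.0.2.x) stand in for the redacted addresses.
LAYOUT_JSON = """
{
  "layoutServers": ["192.0.2.201:9040", "192.0.2.202:9040", "192.0.2.203:9040"],
  "sequencers": ["192.0.2.201:9040", "192.0.2.202:9040", "192.0.2.203:9040"],
  "segments": [{"replicationMode": "CHAIN_REPLICATION", "start": 0, "end": -1,
                "stripes": [{"logServers": ["192.0.2.202:9040", "192.0.2.201:9040"]}]}],
  "unresponsiveServers": ["192.0.2.203:9040"],
  "epoch": 1825
}
"""

report = stale_layout_servers(json.loads(LAYOUT_JSON), {"192.0.2.203:9040"})
for field, endpoints in report.items():
    print(f"{field}: {endpoints}")
```

With the single remaining node as the only member, every layout role except one still points at the removed nodes, and the remaining node itself appears under unresponsiveMembers.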
When you run 'get cluster config' on the remaining node, it correctly shows only the single remaining node.
The Corfu logs on the remaining node show recovery failing repeatedly because quorum cannot be reached:
2022-05-31T14:59:51.264Z | DEBUG | CorfuRuntime-0 | o.corfudb.runtime.CorfuRuntime | Layout server ##.##.##.203:9040 responded with layout Layout(layoutServers=[##.##.##.201:9040, ##.##.##.202:9040, ##.##.##.203:9040], sequencers=[##.##.##.202:9040, ##.##.##.201:9040, ##.##.##.203:9040], segments=[Layout.LayoutSegment(replicationMode=CHAIN_REPLICATION, start=0, end=-1, stripes=[Layout.LayoutStripe(logServers=[##.##.##.202:9040, ##.##.##.201:9040])])], unresponsiveServers=[##.##.##.203:9040], epoch=1725, clusterId=ff44209a-####-####-####-##########86)
2022-05-31T14:59:51.264Z | INFO | initializationTaskThread | o.c.i.RecoveryHandler | Recovery layout epoch:1725, Cluster epoch: 1725
2022-05-31T14:59:51.264Z | ERROR | initializationTaskThread | o.c.i.ManagementAgent | initializationTask: Recovery failed 1364 times. Retrying in PT1Ss.
2022-05-31T14:59:52.262Z | DEBUG | client-5 | o.c.r.c.NettyClientRouter | connectAsync[##.##.##.202:9040]: Channel connection failed, reconnecting...
2022-05-31T14:59:52.265Z | DEBUG | initializationTaskThread | o.c.runtime.view.RuntimeLayout | Requested move of servers to new epoch 1726 servers are [##.##.##.203:9040, ##.##.##.202:9040, ##.##.##.201:9040]
2022-05-31T14:59:52.265Z | INFO | initializationTaskThread | o.c.runtime.clients.BaseClient | sealRemoteServer: send SEAL from me(clientId=null) to new epoch 1726
...
2022-05-31T14:59:52.464Z | DEBUG | client-6 | o.c.r.c.NettyClientRouter | connectAsync[##.##.##.201:9040]: Channel connection failed, reconnecting...
...
2022-05-31T14:59:53.265Z | DEBUG | initializationTaskThread | o.c.r.v.QuorumFuturesFactory | QuorumGet: Exception TimeoutException
2022-05-31T14:59:53.265Z | ERROR | initializationTaskThread | o.c.r.v.LayoutManagementView | Error: recovery: {}
org.corfudb.runtime.exceptions.QuorumUnreachableException: Couldn't reach quorum, reachable=1, required=2
at ...
2022-05-31T14:59:53.265Z | INFO | initializationTaskThread | o.c.i.RecoveryHandler | Recovery reconfiguration attempt result: false
2022-05-31T14:59:53.763Z | DEBUG | client-7 | o.c.r.c.NettyClientRouter | connectAsync[##.##.##.202:9040]: Channel connection failed, reconnecting...
2022-05-31T14:59:53.766Z | WARN | CorfuRuntime-0 | o.corfudb.runtime.CorfuRuntime | Tried to get layout from ##.##.##.202:9040 but failed by timeout
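The QuorumUnreachableException above (reachable=1, required=2) follows from the layout still listing three layout servers while only the local node answers. A minimal sketch of that arithmetic, assuming a simple strict-majority rule; this is consistent with required=2 for three servers in the log, but it is not taken from Corfu's source, and the function names and 192.0.2.x addresses are hypothetical placeholders.

```python
def majority_quorum(num_servers: int) -> int:
    """Smallest strict majority of num_servers (assumed quorum rule)."""
    return num_servers // 2 + 1


def quorum_status(layout_servers: list[str], reachable: set[str]) -> tuple[int, int, bool]:
    """Return (reachable count, required count, whether quorum is reached)."""
    required = majority_quorum(len(layout_servers))
    count = sum(1 for server in layout_servers if server in reachable)
    return count, required, count >= required


# The stale layout still lists three servers, but only the local node responds.
layout_servers = ["192.0.2.201:9040", "192.0.2.202:9040", "192.0.2.203:9040"]
reachable, required, ok = quorum_status(layout_servers, {"192.0.2.203:9040"})
print(f"reachable={reachable}, required={required}, quorum reached: {ok}")
```

Because the two removed nodes are still counted, the remaining node can never reach a majority on its own, so each recovery attempt fails and is retried.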
/var/log/corfu-nonconfig/nonconfig-corfu-compactor-audit.log shows the compactor failing because the cluster is unavailable:
2022-05-31T14:11:10.601Z WARN CorfuRuntime-0 CorfuRuntime - Tried to get layout from ##.##.##.201:9040 but failed by timeout
2022-05-31T14:11:11.102Z WARN CorfuRuntime-0 CorfuRuntime - Tried to get layout from ##.##.##.202:9040 but failed by timeout
2022-05-31T14:11:31.203Z ERROR main UfoCompactor - - [nsx@6876 comp="nsx-manager" errorCode="MP2" level="ERROR" subcomp="corfu-compactor"] UFO: Trim failed for ufo data in namespace ufo
org.corfudb.runtime.exceptions.UnreachableClusterException: Cluster is unavailable
at com.vmware.nsx.platform.ufo.CorfuRuntimeHelper$1.run(CorfuRuntimeHelper.java:43) ...
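To see which peers the remaining node keeps failing to reach, the two messages shown above can be counted per endpoint. This is a hypothetical sketch: the regular expressions only cover the "Tried to get layout from ... but failed by timeout" and "connectAsync[...]: Channel connection failed" lines quoted in this article, in both the pipe-delimited and space-delimited formats, so other failure messages are not counted.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted above.
LAYOUT_TIMEOUT = re.compile(r"Tried to get layout from (\S+) but failed by timeout")
CONNECT_FAILED = re.compile(r"connectAsync\[([^\]]+)\]: Channel connection failed")


def unreachable_endpoints(lines) -> Counter:
    """Count layout timeouts and connection failures per endpoint."""
    counts: Counter = Counter()
    for line in lines:
        for pattern in (LAYOUT_TIMEOUT, CONNECT_FAILED):
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


sample = [
    "2022-05-31T14:11:10.601Z WARN CorfuRuntime-0 CorfuRuntime - Tried to get layout from ##.##.##.201:9040 but failed by timeout",
    "2022-05-31T14:11:11.102Z WARN CorfuRuntime-0 CorfuRuntime - Tried to get layout from ##.##.##.202:9040 but failed by timeout",
    "2022-05-31T14:59:52.262Z | DEBUG | client-5 | o.c.r.c.NettyClientRouter | connectAsync[##.##.##.202:9040]: Channel connection failed, reconnecting...",
]
for endpoint, hits in unreachable_endpoints(sample).most_common():
    print(endpoint, hits)
```

The function accepts any iterable of lines, so an open file handle for the compactor audit log named above can be passed in place of the sample list.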
You used the 'deactivate cluster' command to attempt to recover the cluster.
When the 'deactivate cluster' command was executed, there was an unresponsive node at the time.
CORFU_NONCONFIG did not become unhealthy because of the 'deactivate cluster' command; it was already unhealthy before the command was issued.