NSX Manager CORFU_NONCONFIG flapping with DB_SYNCING & Down status

Article ID: 436627

Updated On:

Products

VMware NSX

Issue/Introduction

 

  • CORFU_NONCONFIG on the NSX Manager cluster nodes persistently shows DB_SYNCING and DOWN status.

  • Database synchronization between the primary node and secondary nodes is failing to complete.

  • /var/log/corfu-nonconfig/corfu.9040.log contains a high "upper" value for Corfu fsync. An "upper" value over 250000 indicates high latency:
<timestamp> | | DEBUG |      logging-metrics-publisher | org.corfudb.client.metricsdata | failure-detector_ping-latency,id=c8f2051a-2243-4cda-a6dc-8de61ea7e26e,node=144.215.4.82:9040,metric_type=timer sum=6019988.469,count=26,mean=231538.018038,upper=1923807.666 1775721884462
<timestamp> | DEBUG | failAfter-0 | o.c.r.c.NettyClientRouter | sendRequestAndGetCompletable: Remove request <REQUEST_ID> to <IP_ADDRESS>:<PORT> due to timeout! Request:version { corfu_source_code_version: <VERSION_ID> } request_id: <REQUEST_ID> priority: HIGH epoch: <EPOCH_VALUE> cluster_id { lsb: <MASKED_ID> msb: <MASKED_ID> } client_id { lsb: <MASKED_ID> msb: <MASKED_ID> } ignore_cluster_id: true ignore_epoch: true
  • In this example, the database experienced 1923807.666 ms of latency, which is far too high for a latency-sensitive database.
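
The log check described above can be automated. The following is a minimal sketch, not an official tool: it assumes the metrics lines follow the same layout as the sample shown in this article, and the function name `find_high_latency` and the regular expression are illustrative. The 250000 threshold is the one stated above.

```python
import re

# Threshold from this article: an "upper" value above 250000 indicates high latency.
UPPER_THRESHOLD = 250000.0

# Assumed layout, based only on the sample line in this article:
# "... | org.corfudb.client.metricsdata | <metric>,id=...,node=<ip:port>,... upper=<value> <epoch>"
METRIC_RE = re.compile(
    r"org\.corfudb\.client\.metricsdata \| (?P<metric>[^,\s]+),"
    r".*?node=(?P<node>[^,\s]+),"
    r".*?upper=(?P<upper>[0-9.]+)"
)


def find_high_latency(lines, threshold=UPPER_THRESHOLD):
    """Return (metric, node, upper) tuples for metric lines whose upper value exceeds threshold."""
    hits = []
    for line in lines:
        match = METRIC_RE.search(line)
        if match is None:
            continue  # not a metrics line, or a layout this sketch does not recognize
        upper = float(match.group("upper"))
        if upper > threshold:
            hits.append((match.group("metric"), match.group("node"), upper))
    return hits


# Sample line copied from the log excerpt in this article.
SAMPLE = (
    "<timestamp> | | DEBUG |      logging-metrics-publisher | "
    "org.corfudb.client.metricsdata | failure-detector_ping-latency,"
    "id=c8f2051a-2243-4cda-a6dc-8de61ea7e26e,node=144.215.4.82:9040,"
    "metric_type=timer sum=6019988.469,count=26,mean=231538.018038,"
    "upper=1923807.666 1775721884462"
)

if __name__ == "__main__":
    for metric, node, upper in find_high_latency([SAMPLE]):
        print(f"{metric} on {node}: upper={upper}")
```

To scan a real log, pass the open file object for /var/log/corfu-nonconfig/corfu.9040.log to `find_high_latency` instead of the sample list.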

 

Environment

VMware NSX

Cause

Corfu database operations time out when latency exceeds the critical threshold. Average disk access latency must be less than 10 ms.

Resolution

Resolve the read/write latency on the datastore that backs the NSX Manager VMs.

Alternatively, perform a Storage vMotion to migrate the NSX Manager VMs to a datastore that maintains a consistent latency of under 10 ms.

Per the NSX Storage Requirements:

"NSX appliance VMs backed by vSAN clusters may experience intermittent disk write latency spikes of 10ms+ due to standard vSAN I/O handling. However, provided the average latency remains below 10ms, these intermittent spikes should not impact the NSX Appliance VMs."


NSX Manager VM and Host Transport Node System Requirements