VMware NSX 4.x "Cluster Degraded" alarm is generated

Article ID: 373240

Products

VMware NSX

Issue/Introduction

  • The NSX cluster raises a "Cluster Degraded" alarm in the Alarms section even though all services are up and running on every cluster node.
    The alarm is logged in /var/log/syslog:
2024-07-16T12:11:33.875Z nsx3 NSX 3475508 MONITORING [nsx@6876 alarmId="f150825a-c415-4780-a933-5f83227c020a" alarmState="OPEN" comp="nsx-manager" entId="xxxxxxxx-049f-3168-abb4-xxxxxxxx" eventFeatureName="clustering" eventSev="MEDIUM" eventState="On" eventType="cluster_degraded" level="WARNING" nodeId="xxxxxxxx-1bd9-c0fa-26b8-xxxxxxxx" subcomp="monitoring"] Group member xxxxxxxx-3266-45c9-b13a-xxxxxxxx of service CONTROLLER is down.

  • The CLI command "get cluster status" reports every service as UP and STABLE:
IP: 192.168.0.79
API Leader: xxxxxxxx-2808-4005-bcf1-xxxxxxxx = nsx3 = 192.168.0.78: UP
Service install-upgrade is enabled on 192.168.0.78
Cluster:
CLUSTER_BOOT_MANAGER  STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
CM-INVENTORY          STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
CONTROLLER            STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
CORFU_NONCONFIG       STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
DATASTORE             STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
HTTPS                 STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
IDPS_REPORTING        STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
MANAGER               STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
MESSAGING-MANAGER     STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
MONITORING            STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
SITE_MANAGER          STABLE  nsx1 = 192.168.0.76: UP  nsx2 = 192.168.0.77: UP  nsx3 = 192.168.0.78: UP
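
The two symptoms above (an OPEN cluster_degraded alarm in syslog while every group is STABLE with all members UP) can be checked together with a short script. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, and the parser assumes the condensed status layout and the syslog structured-data format exactly as shown in this article, which may differ from the output on a given NSX build.

```python
import re

# key="value" pairs inside the [nsx@6876 ...] structured-data block of a syslog line.
SD_BLOCK = re.compile(r'\[nsx@6876 ([^\]]*)\]')
SD_PAIR = re.compile(r'(\w+)="([^"]*)"')

# One row of the condensed status table shown in this article:
# group name, optional "____" filler, group status, then the member list.
GROUP_ROW = re.compile(r'^(\S+)\s+(?:_+\s+)?([A-Z]+)\s+(.*)$')
MEMBER = re.compile(r'(\S+) = (\S+): (\w+)')


def parse_alarm_fields(line):
    """Return the structured-data fields of one syslog line as a dict ({} if none)."""
    block = SD_BLOCK.search(line)
    return dict(SD_PAIR.findall(block.group(1))) if block else {}


def open_degraded_alarms(syslog_lines):
    """Return the fields of every OPEN cluster_degraded alarm entry."""
    alarms = []
    for line in syslog_lines:
        fields = parse_alarm_fields(line)
        if (fields.get("eventType") == "cluster_degraded"
                and fields.get("alarmState") == "OPEN"):
            alarms.append(fields)
    return alarms


def parse_groups(status_text):
    """Map each group name to (status, {member_name: member_state})."""
    groups = {}
    for line in status_text.splitlines():
        row = GROUP_ROW.match(line.strip())
        if not row:
            continue  # header lines such as "IP:" or "Cluster:"
        members = {name: state for name, _ip, state in MEMBER.findall(row.group(3))}
        if members:
            groups[row.group(1)] = (row.group(2), members)
    return groups


def all_healthy(groups):
    """True when at least one group was parsed and every group is STABLE with all members UP."""
    return bool(groups) and all(
        status == "STABLE" and all(state == "UP" for state in members.values())
        for status, members in groups.values()
    )


def matches_symptom(syslog_lines, status_text):
    """True when an OPEN cluster_degraded alarm coexists with a fully healthy cluster."""
    return bool(open_degraded_alarms(syslog_lines)) and all_healthy(parse_groups(status_text))
```

Passing the lines of /var/log/syslog and the saved text of "get cluster status" to matches_symptom returns True only when both conditions described in this article hold. The nodeId field returned by open_degraded_alarms can help cross-check the node identified in the Resolution steps.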

Environment

VMware NSX 4.x

Cause

This is a known issue affecting VMware NSX 4.x. The alarm is raised even though every cluster service reports UP and STABLE.

Resolution

Identify the node that reported the alarm

  1. In the NSX UI, go to Alarms
  2. Expand the "Cluster Degraded" alarm
  3. Find the field "Reported by Node" and note the node name

Restart the CBM service

  1. Log in via SSH to the node identified in the previous step
  2. Check the current state of the services: get cluster status
  3. Restart the Cluster Boot Manager (CBM) service from the root shell: /etc/init.d/nsx-cluster-boot-manager restart
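
If the restart needs to be scripted, the last step can be wrapped so that the exit code and output are captured. This is only a sketch under assumptions: it assumes Python 3 is available on the node and that the script runs as root on the node that reported the alarm. The helper name restart_cbm and its run parameter are hypothetical; only the init script path and the restart argument come from the step above.

```python
import subprocess

# Init script path taken from the restart step above.
CBM_INIT_SCRIPT = "/etc/init.d/nsx-cluster-boot-manager"


def restart_cbm(run=subprocess.run):
    """Restart the CBM service; return (succeeded, combined output).

    `run` is injectable so the helper can be exercised without touching a real node.
    """
    result = run([CBM_INIT_SCRIPT, "restart"], capture_output=True, text=True)
    output = (result.stdout or "") + (result.stderr or "")
    return result.returncode == 0, output
```

After a successful restart, "get cluster status" can be run again to confirm that all groups still report STABLE.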