NSX-T Edge Node reporting false alarm that dataplane has stopped

Article ID: 322527

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are running NSX-T version 3.1.x or have recently upgraded to one of these versions.
  • You observe alarms in the NSX-T UI reporting that the service status has changed for an NSX-T Edge Node.
  • You observe no functional impact to the dataplane service on the NSX-T Edge Node.
  • You may see entries similar to the following in /var/log/syslog on the NSX-T Edge Node:

2023-05-15T18:51:06.863Z NSX-edge01 NSX 1873 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING" eventFeatureName="infrastructure_service" eventType="edge_service_status_changed" eventSev="warning" eventState="On"] The service dataplane changed from STARTED to STOPPED.

2023-05-15T18:53:08.654Z NSX-edge01 1873 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING" eventFeatureName="infrastructure_service" eventType="edge_service_status_changed" eventSev="warning" eventState="Off"] The service dataplane changed from STOPPED to STARTED.

  • When you check the dataplane service by running systemctl status nsx-edge-datapath as root on the NSX-T Edge Node CLI, the output shows that the service has been running since before the alarms were raised:
 nsx-edge-datapath.service - Edge Datapath
   Loaded: loaded (/lib/systemd/system/nsx-edge-datapath.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2023-05-18 11:12:27 UTC; 3 days ago
 Main PID: 3628 (datapath-system)
    Tasks: 10 (limit: 4371)
   CGroup: /system.slice/nsx-edge-datapath.service
           |-3628 /bin/bash /opt/vmware/nsx-edge/bin/datapath-systemd-helper start
           `-3634 /usr/bin/docker start -a service_datapath
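
The comparison described in the symptoms above can be automated. The following is a minimal sketch, not a VMware-provided tool: it assumes the syslog entries and the systemctl "Active:" line follow the formats shown in this article, and the function names (parse_events, parse_service_start, suspect_false_alarms) are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches edge_service_status_changed entries in the format shown in this article.
EVENT_RE = re.compile(
    r'^(?P<ts>\S+)\s.*eventType="edge_service_status_changed".*'
    r'The service (?P<service>\S+) changed from (?P<old>\w+) to (?P<new>\w+)\.'
)

# Matches the "Active:" line of `systemctl status` output (UTC assumed).
ACTIVE_RE = re.compile(
    r'Active: active \(running\) since \w+ '
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC'
)


def parse_events(syslog_lines):
    """Return (timestamp, service, old_state, new_state) for each status-change entry."""
    events = []
    for line in syslog_lines:
        m = EVENT_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts.replace(tzinfo=timezone.utc),
                           m["service"], m["old"], m["new"]))
    return events


def parse_service_start(status_text):
    """Return the time the service became active, or None if it is not running."""
    m = ACTIVE_RE.search(status_text)
    if not m:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
    return ts.replace(tzinfo=timezone.utc)


def suspect_false_alarms(events, service_start):
    """STOPPED events logged after the service's current start time.

    If the service has run without interruption since service_start,
    a later STOPPED event cannot reflect a real stop.
    """
    return [e for e in events if e[3] == "STOPPED" and e[0] > service_start]
```

To use it, pass the lines of /var/log/syslog to parse_events and the text output of systemctl status nsx-edge-datapath to parse_service_start. Any event returned by suspect_false_alarms is a candidate false alarm of the kind described in this article.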


Environment

VMware NSX-T Data Center

Cause

If the nsx-edge-exporter process temporarily forks a child process, the get_pid lookup for the exporter returns multiple PIDs (parent and child), and the script currently picks the first one.
If that PID belongs to the child process, the script detects a PID change since the previous check and raises a false alarm.
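
The failure mode described above can be illustrated with a short simulation. This is a sketch under stated assumptions, not the actual NSX-T monitoring script: the PIDs, the sample data, and the functions pick_first, pick_parent, and run_checks are all hypothetical. The pick_parent strategy is one possible way to avoid the problem and is not necessarily how NSX-T 3.2 resolves it.

```python
# Hypothetical PID lists returned by three consecutive health checks.
# During the second check the process has briefly forked a child (PID 250),
# and the child happens to be listed first.
SAMPLES = [[100], [250, 100], [100]]

# Hypothetical parent lookup: PID -> parent PID.
PARENT_OF = {100: 1, 250: 100}


def pick_first(pids, parent_of):
    """Naive selection: take whichever PID is listed first."""
    return pids[0]


def pick_parent(pids, parent_of):
    """Select the PID whose parent is not in the list (the top-level process)."""
    pid_set = set(pids)
    for pid in pids:
        if parent_of.get(pid) not in pid_set:
            return pid
    return pids[0]


def run_checks(samples, parent_of, pick):
    """Return the indexes of checks where a PID change would raise an alarm."""
    alarms = []
    previous = None
    for index, pids in enumerate(samples):
        current = pick(pids, parent_of)
        if previous is not None and current != previous:
            alarms.append(index)
        previous = current
    return alarms


if __name__ == "__main__":
    print("naive selection alarms at checks:", run_checks(SAMPLES, PARENT_OF, pick_first))
    print("parent selection alarms at checks:", run_checks(SAMPLES, PARENT_OF, pick_parent))
```

With the naive selection, the PID appears to change when the child is picked and again when the child exits, which corresponds to the STOPPED and STARTED alarm pair even though the parent process never restarted. Selecting the top-level process keeps the tracked PID stable across all three checks.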

Resolution

This issue is resolved in NSX-T version 3.2.

Workaround:
There is no workaround to clear these false alarms. Upgrade NSX-T to resolve this issue.