NSX-T Edge Node reporting false alarm that dataplane has stopped

Article ID: 322527

Updated On:

Products

VMware NSX

Issue/Introduction

  • You are running NSX-T 3.1.x or have recently upgraded to one of these versions.
  • You observe alarms in the NSX-T UI indicating that the dataplane service status has changed for an NSX-T Edge Node.
  • You observe no functional impact to the dataplane service on the NSX-T Edge Node.
  • You may see log entries similar to the following on the NSX-T Edge Node:
    /var/log/syslog
    NSX-edge01 NSX 1873 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING" eventFeatureName="infrastructure_service" eventType="edge_service_status_changed" eventSev="warning" eventState="On"] The service dataplane changed from STARTED to STOPPED.
    
    NSX-edge01 1873 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING" eventFeatureName="infrastructure_service" eventType="edge_service_status_changed" eventSev="warning" eventState="Off"] The service dataplane changed from STOPPED to STARTED.
  • Running the following command from the NSX-T Edge Node CLI (root mode) confirms that the dataplane service has been running since before the alarm timestamps:
    systemctl status nsx-edge-datapath
    nsx-edge-datapath.service - Edge Datapath
       Loaded: loaded (/lib/systemd/system/nsx-edge-datapath.service; enabled; vendor preset: enabled)
       Active: active (running) since <Day> <Date> <Time> UTC; # days ago
     Main PID: 3628 (datapath-system)
        Tasks: 10 (limit: 4371)
       CGroup: /system.slice/nsx-edge-datapath.service
               |-3628 /bin/bash /opt/vmware/nsx-edge/bin/datapath-systemd-helper start
               `-3634 /usr/bin/docker start -a service_datapath
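
The comparison between the service start time and the alarm timestamp can also be scripted. The following is a minimal sketch, not a VMware-provided tool. It assumes the `Active:` line uses the usual systemd layout (for example `since Tue 2021-03-02 10:15:30 UTC`) and that the alarm time is known in UTC. The function names `service_start_time` and `alarm_is_false` and the sample values are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout of the systemd status line, for example:
#   Active: active (running) since Tue 2021-03-02 10:15:30 UTC; 3 days ago
ACTIVE_SINCE = re.compile(
    r"Active:\s+active \(running\) since \w{3} "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC"
)


def service_start_time(status_output: str) -> datetime:
    """Return the UTC start time found in `systemctl status` output."""
    match = ACTIVE_SINCE.search(status_output)
    if match is None:
        raise ValueError("no 'active (running) since ... UTC' line found")
    started = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return started.replace(tzinfo=timezone.utc)


def alarm_is_false(status_output: str, alarm_time: datetime) -> bool:
    """True if the service was already running when the alarm was raised."""
    return service_start_time(status_output) < alarm_time
```

If `alarm_is_false` returns True for the output of `systemctl status nsx-edge-datapath`, the service has not restarted since the alarm, which is consistent with the false alarm described in this article.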
 

Environment

VMware NSX-T Data Center

Cause

If nsx-edge-exporter temporarily forks another process, the get_pid check for the exporter returns PIDs for both the parent and the child, and the script currently uses the first PID returned.
If that PID belongs to the child process, the script detects a PID change since the previous check and raises a false alarm.
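
The effect can be illustrated with a small simulation. This is a sketch of the behaviour described above, not the actual NSX monitoring script. The class `PidMonitor`, the function `first_pid`, the PID values, and the assumption that the child PID is listed first are all hypothetical.

```python
from typing import List, Optional, Tuple


def first_pid(pids: List[int]) -> int:
    """Mimic the described behaviour: use the first PID returned."""
    return pids[0]


class PidMonitor:
    """Records a PID change whenever the selected PID differs between checks."""

    def __init__(self) -> None:
        self.last_pid: Optional[int] = None
        self.changes: List[Tuple[int, int]] = []

    def check(self, pids: List[int]) -> None:
        pid = first_pid(pids)
        if self.last_pid is not None and pid != self.last_pid:
            # The monitor would interpret this as a service restart.
            self.changes.append((self.last_pid, pid))
        self.last_pid = pid


def simulate(samples: List[List[int]]) -> List[Tuple[int, int]]:
    """Run one check per sample and return the PID changes that were seen."""
    monitor = PidMonitor()
    for pids in samples:
        monitor.check(pids)
    return monitor.changes


# Parent PID 3634 runs throughout; a short-lived child 3701 appears once
# and, by assumption, is listed before the parent during that check.
changes = simulate([[3634], [3701, 3634], [3634]])
```

In this run the parent process never restarts, yet two PID changes are recorded: one when the child is picked up and one when the check returns to the parent. This mirrors the paired STARTED to STOPPED and STOPPED to STARTED events shown in the log entries.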

Resolution

This issue is resolved in VMware NSX 3.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround:
There is no workaround to clear these false alarms.