Edge nodes dataplane service is down after exiting maintenance mode when using Service Insertion

Article ID: 322588

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • The environment is running NSX-T version 4.x.
  • North South service insertion is deployed and in use.
  • The edge node was placed into maintenance mode and then exited.
  • Management plane connectivity from the NSX-T manager to the edge node is not working.
  • Traffic flowing through logical routers on the edge node is impacted.
  • The 'dataplane' service on the edge node is down.
get service dataplane
Service name:      dataplane
Service state:     stopped
  • The NSX-T edge log '/var/log/syslog' contains the following errors:
2023-06-28T15:52:46.966Z nsxedge-21193684-1-2301281756346299142024 NSX 81301 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="dpc-pb" tname="dp-ipc23" level="ERROR" errorCode="EDG0400536"] Service port fcea8386-0d78-457d-905a-55f9d9119942 not found
2023-06-28T15:52:46.062Z nsxedge-21193684-1-2301281756346299142024 7bb730031118 79853 - - 2023-06-28T15:52:46Z datapathd 81301 dpc-pb tname="dp-ipc23" [ERROR] Service port fcea8386-0d78-457d-905a-55f9d9119942 not found errorCode="EDG0400536"
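
The last two symptoms can be checked with a short script. The following is a minimal sketch in Python, assuming the output of 'get service dataplane' and the contents of '/var/log/syslog' have already been collected as text. The function names are hypothetical and are not part of any NSX tooling; the patterns are based only on the sample output and log lines shown above.

```python
import re

# Error code and message text taken from the sample syslog lines above.
ERROR_CODE = "EDG0400536"
PORT_RE = re.compile(r"Service port ([0-9a-fA-F-]{36}) not found")
# Matches the "Service state:" line printed by 'get service dataplane'.
STATE_RE = re.compile(r"^\s*Service state:\s*(\S+)", re.MULTILINE)


def dataplane_is_stopped(cli_output):
    """Return True if the captured CLI output reports the state 'stopped'."""
    match = STATE_RE.search(cli_output)
    return match is not None and match.group(1).lower() == "stopped"


def find_missing_service_ports(log_lines):
    """Return the sorted, de-duplicated service port IDs reported as not found.

    Only lines that carry the EDG0400536 error code are considered. The code
    can appear before or after the message, so it is checked separately.
    """
    ports = set()
    for line in log_lines:
        if ERROR_CODE not in line:
            continue
        match = PORT_RE.search(line)
        if match:
            ports.add(match.group(1))
    return sorted(ports)
```

For example, passing the lines of a copied syslog file to find_missing_service_ports returns each affected service port ID once, even though the edge logs the same error in two formats.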


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x

Cause

This occurs when the Central Control Plane (CCP) fails to correctly complete its internal deletion of Service Insertion Redirection Policies, which leaves stale 'SiRedirectionPolicyMsg' objects on the Edge node.

Resolution

This is a known issue impacting VMware NSX.

Workaround:
Perform the following steps on the NSX Manager shell as the root user, one manager at a time:
  1. Reboot all three NSX Managers:
    • root@nsx-manager:~# reboot
  2. Once the manager reboot has completed, run the following command as the root user to restart the mpa service:
    • root@nsx-manager:~# /etc/init.d/nsx-mpa restart
Note: Before rebooting the NSX Manager appliances, ensure you have up-to-date backups in place.
Before rebooting each NSX Manager appliance, confirm that the cluster is healthy by running the following command:
get cluster status
If the cluster is up and healthy, proceed to the next NSX Manager.
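
The health check before each reboot can be gated in a script. The following is a minimal sketch in Python, assuming the output of 'get cluster status' has been captured as text and that it contains a line of the form 'Overall Status: STABLE' when the cluster is healthy. That line format is an assumption and should be verified against the output in your own environment. The function name is hypothetical.

```python
import re

# Assumed format of the health line in 'get cluster status' output.
OVERALL_RE = re.compile(r"^\s*Overall Status:\s*(\S+)", re.MULTILINE)


def cluster_is_stable(status_output):
    """Return True only if the captured output reports an overall status of STABLE."""
    match = OVERALL_RE.search(status_output)
    return bool(match) and match.group(1).upper() == "STABLE"
```

If the function returns False, or the expected line is missing, treat the cluster as not healthy and do not reboot the next manager.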