NSX DFW Publish tasks fail with "Firewall provisioning failed on Host host-XXXX for reason XXXXXX:failed to parse config"

Article ID: 318842


Updated On:

Products

VMware NSX for vSphere

Issue/Introduction

  • You have DFW rules in the "Partner Services" section that contain multiple services under a single rule.

  • After you restart the vShield-Stateful-Firewall service, firewall publish tasks succeed again.

  • You see error messages similar to the following: 

vmkernel.log

2018-04-12T18:39:14.262Z cpu4:8432502)WARNING: Heap: 3867: Could not allocate 299008 bytes for dynamic heap vsiHeap.8432502. Request returned Admission check failed for memory resource
2018-04-12T18:39:34.282Z cpu4:8498110)WARNING: Heap: 3867: Could not allocate 4096 bytes for dynamic heap worldGroup.8498111. Request returned Admission check failed for memory resource
2018-04-12T18:39:34.282Z cpu4:8498110)WARNING: Heap: 3867: Could not allocate 4096 bytes for dynamic heap worldGroup.8498111. Request returned Admission check failed for memory resource
 

vsm.log

2018-06-07 16:08:51.859 CDT INFO taskScheduler-10 EventBsdFtrMgrImpl:284 - Transactionally updated resource: host-XXXXX, with feature status: [resourceId : null, featureId : com.vmware.vshield.firewall, featureVersion : 5.5, status : YELLOW, installed : true, errorStatus : ]
2018-06-07 16:08:51.893 CDT INFO taskScheduler-10 EventBsdFtrMgrImpl:284 - Transactionally updated resource: host-XXXXX, with feature status: [resourceId : null, featureId : com.vmware.vshield.firewall, featureVersion : 5.5, status : YELLOW, installed : true, errorStatus : ]
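
To check a host for the heap allocation warnings shown in the vmkernel.log excerpt, a search along the following lines may help. This is a minimal sketch, not a supported tool: the function name count_heap_failures is hypothetical, and /var/log/vmkernel.log is assumed as the default log location on the ESXi host.

```shell
#!/bin/sh
# Sketch: count heap admission-check failures in a vmkernel log.
# count_heap_failures and the default path are illustrative assumptions.
count_heap_failures() {
    log_file="${1:-/var/log/vmkernel.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status at zero.
    grep -c 'Could not allocate .* Admission check failed for memory resource' "$log_file" 2>/dev/null || true
}
```

Calling count_heap_failures with no argument reads the assumed default path; a non-zero count suggests the host may be hitting the condition described in this article.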


Environment

VMware NSX for vSphere 6.0.x
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Resolution

This issue occurs when a single Partner Services Distributed Firewall rule defines multiple services and also has multiple security groups in its source or destination. When a publish task runs, the new ruleset is pushed, but the memory consumption of the vsfwd process on the ESXi host increases. Memory consumption grows with each subsequent firewall publish task until no more memory can be allocated, at which point the publish fails.

This is caused by a small memory leak in the vsfwd process on the ESXi host. Restarting the firewall service on the ESXi host clears the vsfwd process memory and returns its consumption to a stable level.


This issue will be resolved with the release of NSX 6.4.2.

Workarounds:

Restart the firewall service on the ESXi host when a firewall publish failure is observed:
# /etc/init.d/vShield-Stateful-Firewall restart

Redesign the Partner Services firewall rules so that no single rule has multiple services configured.
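
The first workaround could be wrapped in a small check so that the service is restarted only when the heap allocation warning from this article is present in the log. This is a sketch under stated assumptions: the function name restart_if_heap_exhausted and the RESTART_CMD override variable are hypothetical, and /var/log/vmkernel.log is assumed as the log location. Because old warnings remain in the log after a restart, it is intended to be run manually after a publish failure rather than on a schedule.

```shell
#!/bin/sh
# Sketch: restart the firewall service only if the vmkernel log contains
# the heap admission-check warning quoted in this article.
# restart_if_heap_exhausted and RESTART_CMD are illustrative assumptions.
restart_if_heap_exhausted() {
    log_file="${1:-/var/log/vmkernel.log}"
    # The default restart command is the one given in the workaround;
    # RESTART_CMD allows a harmless substitute when testing the sketch.
    restart_cmd="${RESTART_CMD:-/etc/init.d/vShield-Stateful-Firewall restart}"
    if grep -q 'Admission check failed for memory resource' "$log_file" 2>/dev/null; then
        echo "Heap allocation warning found in $log_file"
        # Intentionally unquoted so the command and its argument are split.
        $restart_cmd
    else
        echo "No heap allocation warning found in $log_file"
    fi
}
```

Setting RESTART_CMD to a harmless command such as "echo dry-run" before calling the function shows what would happen without touching the service.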