VMs lose network connectivity in a cluster prepared for NSX Guest Introspection and Third Party Antivirus Solution

Article ID: 343368

Products

VMware NSX

Issue/Introduction

  • VMs in the NSX environment lose network connectivity.
  • The host has been prepared for Guest Introspection and a Trend Micro SVM as part of the antivirus solution.
  • In the vmkernel.log of the affected hosts, you see events related to the lack of connectivity to the third party SVM:

2017-08-03T11:16:49.539Z cpu8:33627)WARNING: DVFilter: 1181: Couldn't enable keepalive: Not supported
2017-08-03T11:16:49.541Z cpu25:24589764)WARNING: DVFilter: 2440: World DVFilter-Async-####### is enabled.
2017-08-03T11:16:49.541Z cpu25:24589764)Net: 2441: connected coalesce port ##### eth-1 to DVFilter Coalesce Portset, portID 0x1#####c
2017-08-03T11:16:49.541Z cpu25:24589764)DVFilter: 2699: DVFilterShmSetupCoalescePort get port ########
2017-08-03T11:16:49.541Z cpu25:24589764)DVFilter: 2886: DVFilterShm coalescing is enabled.
2017-08-03T11:16:51.543Z cpu25:33382)vsip VSIPDVFProcessPackets:2945: Faulting packets to slowpath failed : No connection
2017-08-03T11:16:51.543Z cpu25:33382)vsip VSIPDVFProcessPackets:2945: Faulting packets to slowpath failed : No connection
2017-08-03T11:16:51.546Z cpu25:33382)vsip VSIPDVFProcessPackets:2945: Faulting packets to slowpath failed : No connection

  • In the file syslog.log you also see events showing the lack of connectivity to the Third Party SVM (169.254.1.39 in this case):

2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] Attempted to recv 4 bytes from sd 77, errno = 104 (Connection reset by peer)
2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] [0x65808b90] Error on socket to solution 169.#.#.39:4###1: SocketError on sd 77, in recv: Connection reset by peer (104)
2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] Attempted to recv 4 bytes from sd 47, errno = 104 (Connection reset by peer)
2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] [0x65905a58] Error on socket to solution 169.#.#.39:4###1: SocketError on sd 47, in recv: Connection reset by peer (104)
2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] Attempted to recv 4 bytes from sd 38, errno = 104 (Connection reset by peer)
2017-08-03T11:16:59Z EPSecMux[11073110]: [ERROR] (EPSEC) [11073110] [0x65915068] Error on socket to solution 169.#.#.39:4###1: SocketError on sd 38, in recv: Connection reset by peer (104)
2017-08-03T11:18:01Z crond[33919]: crond: USER root pid 3######3 cmd /usr/lib/vmware/netcpa/monitor/netcpa-monitor.sh

  • In the logs of the third party VM (Trend Micro in this case), you see the firewall service starting:

2017-08-03 11:16:49.236975: [Appl/5] | Added service 2000:dsa.ListenThread for domain | ...ld_DSA_10_Rhel6x64/src/dsa/core/scripts/dsa/DomUtils.lua:792:InsertService | 59E:7F555E9E9700:CScriptThread
--- > 2017-08-03 11:16:49.237031: [Appl/5] | StartServices() | ...ld_DSA_10_Rhel6x64/src/dsa/core/scripts/dsa/DomUtils.lua:1026:(null) | 59E:7F555E9E9700:CScriptThread
2017-08-03 11:16:49.248863: [Appl/5] | Added service 3002:dsp.fwdpi.service for domain | ...ld_DSA_10_Rhel6x64/src/dsa/core/scripts/dsa/DomUtils.lua:792:InsertService | 59E:7F555E9E9700:CScriptThread
2017-08-03 11:16:49.248963: [Appl/5] | Added service 3003:dsp.fwdpi.sslDpiService for domain | ...ld_DSA_10_Rhel6x64/src/dsa/core/scripts/dsa/DomUtils.lua:792:InsertService | 59E:7F555E9E9700:CScriptThread
2017-08-03 11:16:49.272434: [Appl/5] | Added service 3004:dsp.wrs.dsvaservice for domain | ...ld_DSA_10_Rhel6x64/src/dsa/core/scripts/dsa/DomUtils.lua:792:InsertService | 59E:7F555E9E9700:CScriptThread
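
To check whether a host shows the same symptoms, the two host-side log signatures quoted above can be counted with a short script. The following is a minimal sketch, not a VMware or Trend Micro tool: the pattern strings are taken from the excerpts in this article, while the names `SIGNATURES`, `count_signatures`, and `scan_file` are hypothetical and chosen only for illustration. It assumes you run it against copies of vmkernel.log and syslog.log collected from the affected host.

```python
import re
import sys
from collections import Counter

# Signatures copied from the log excerpts in this article.
# The dictionary keys are illustrative labels, not official event names.
SIGNATURES = {
    "vmkernel_slowpath": re.compile(
        r"Faulting packets to slowpath failed : No connection"
    ),
    "epsecmux_reset": re.compile(
        r"EPSecMux\[\d+\]: \[ERROR\].*Connection reset by peer"
    ),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return count_signatures(handle)


if __name__ == "__main__":
    # Usage: python scan_logs.py vmkernel.log syslog.log
    for path in sys.argv[1:]:
        print(path, dict(scan_file(path)))
```

A non-zero count for both labels around the same timestamps is consistent with the symptoms described here, but it is not proof of this specific cause; other SVM connectivity problems can produce similar messages.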

Environment

VMware NSX for vSphere 6.0.x
VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

The loss of connectivity occurred because the DSVAs ran out of memory (“maxed out”). Failopen was not applied because the DSVA was still nominally up and therefore did not match any of the following conditions:
 
  • Deleted and re-deployed, or
  • Powered down, or
  • ds_agent service is unavailable.
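
The gap described above can be illustrated with a small model of the three conditions. This is only a sketch of the reasoning in this article, not Trend Micro's implementation: the class `DsvaState`, its field names, and the function `failopen_applies` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DsvaState:
    # Hypothetical fields, named after the conditions listed in this article.
    being_redeployed: bool    # deleted and re-deployed
    powered_on: bool          # False when the appliance is powered down
    ds_agent_available: bool  # False when the ds_agent service is unavailable
    memory_exhausted: bool    # the situation described in the Cause section


def failopen_applies(state: DsvaState) -> bool:
    """Return True if any of the three listed conditions holds.

    Memory exhaustion is deliberately not consulted, mirroring the
    behavior described in this article.
    """
    return (
        state.being_redeployed
        or not state.powered_on
        or not state.ds_agent_available
    )


# A DSVA that is up and running ds_agent, but out of memory.
stuck = DsvaState(
    being_redeployed=False,
    powered_on=True,
    ds_agent_available=True,
    memory_exhausted=True,
)
print(failopen_applies(stuck))  # False: traffic is not failed open
```

Because none of the three conditions is true for a memory-exhausted appliance, the check returns False and traffic keeps being redirected to an SVM that cannot process it.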

Resolution

Trend Micro Technical Support found that their failopen is not triggered when the DSVA has an internal memory dump. Their recommendation was to increase the DSVA RAM to 8 GB.

Please validate with Trend Micro before increasing the amount of memory in the SVM, and consult the appropriate documentation for your version of Deep Security.