Troubleshooting Network and Security Service Insertion in NSX for vSphere 6.x

Article ID: 343369

Products

VMware NSX Networking

Issue/Introduction

It is common practice for customers to deploy third-party networking and security services into their networks alongside VMware NSX for vSphere. Service profiles are created in NSX Manager and then sent to a prepared cluster, where they are bound to various constructs such as a vDS port group, a virtual wire, or a security group.
 
The purpose of this article is to provide troubleshooting information for installation and configuration issues with the Network Extensibility Program (NetX) platform service on which the third-party partner services are built.
 
Deployment of services on ESXi hosts in the cluster
 
Network and security services are deployed as a virtual machine tap point or profile filter on all virtual machine vNICs in the security group, or attached to vDS port groups where a service profile is bound.
  1. NSX Manager sends the service instance, service profile, and service profile rules to VSFWD, which runs as a user world process on all ESXi hosts in a prepared cluster.
  2. The VSFWD process configures the VSIP kernel module.

    Note: Before configuring a third-party service, prepare the ESXi host cluster. For more information, see the Prepare New Hosts and Clusters to Work with NSX section in the NSX Installation and Upgrade Guide.
 
[Figure: NetX Architecture]

[Figure: 3rd party security solutions deployment]

Environment

VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.4.x
VMware NSX for vSphere 6.0.x
VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.3.x

Resolution

Installation, certification and registration requirements for third-party services

Third-party services must meet these requirements:
  1. Must be certified as supported with at least vSphere 5.5. To check compatibility of a service, check the Networking and Security section of the VMware Compatibility Guide and filter the API Integration by VMware Network Extensibility (NetX).
  2. Must support VMware VM hardware version 8 or later on the service virtual machine (SVM) in order to use the Virtual Machine Communication Interface (VMCI).
  3. Must provide a management solution which communicates with NSX Manager using REST APIs and which is packaged as an OVF for download and installation.
When installing host-based services, if you choose the Specified on the host option in Agent VM settings, ensure that the Agent virtual machine settings are indeed correctly specified for all the hosts in the cluster. Check Agent virtual machine settings in the vSphere Web Client at Host > Configuration > Agent VM settings > Select datastore and network.

Service registration, deployment and troubleshooting using REST APIs

To deploy and register a third-party service with NSX Manager, see the Integrating Third Party Services section of the NSX Administration Guide.
 
For more information about NSX for vSphere administration using REST APIs, see the Using the NSX REST API section of the NSX vSphere API Guide.
 
Service registration
  1. Deployment is done through Service Manager, which represents the third-party service's centralized management tool. The third-party service manager is registered with NSX Manager using this REST API POST to /api/2.0/si/servicemanager:


    /api/2.0/si/service:


    service-id

     
  2. To show all service instances:

    GET /api/2.0/si/serviceinstances
     
  3. To show a specific service instance:

    GET /api/2.0/si/serviceinstance/service-inst-id
     
  4. To show all service profiles:

    GET /api/2.0/si/serviceprofiles
     
  5. To show a specific service profile:

    GET /api/2.0/si/serviceprofile/service-profile-id
     
  6. To show all rules associated with a specific service profile:

    GET /api/2.0/si/serviceprofile/service-profile-id/servicepolicy
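
The read-only queries above can be scripted. The following is a minimal Python sketch, not an official client. It assumes HTTP basic authentication with an NSX Manager account, a trusted TLS certificate on the manager, and a text (XML) response body. The helper names si_path and si_get are hypothetical, and only the service instance and service profile paths listed above are covered.

```python
import base64
import urllib.request

SI_BASE = "/api/2.0/si"

# Collection names for the resource kinds listed in the steps above.
_COLLECTIONS = {
    "serviceinstance": "serviceinstances",
    "serviceprofile": "serviceprofiles",
}


def si_path(kind, obj_id=None):
    """Return the API path for a service insertion resource.

    kind is "serviceinstance" or "serviceprofile". Without obj_id the
    collection path is returned, otherwise the path of that one object.
    """
    if kind not in _COLLECTIONS:
        raise ValueError(f"unsupported resource kind: {kind}")
    if obj_id is None:
        return f"{SI_BASE}/{_COLLECTIONS[kind]}"
    return f"{SI_BASE}/{kind}/{obj_id}"


def si_get(manager, user, password, kind, obj_id=None):
    """Send the GET request and return the raw response body as text.

    Assumption: basic authentication and a certificate that the local
    trust store accepts. Adjust both for your environment.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{manager}{si_path(kind, obj_id)}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, si_get("nsxmgr.example.com", "admin", password, "serviceprofile") would request the list of all service profiles, where nsxmgr.example.com is a placeholder host name.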

Deployment verification and troubleshooting using the Central CLI

Note: In the following output, vmware-sfw refers to the VMware firewall and is located in slot 2, whereas serviceinstance-1 refers to a third-party firewall, such as the Palo Alto Networks firewall, and is populated in slot 4.

  1. Log in to the NSX Manager with the admin credentials.
  2. To display a summary of DVFilter information, run the show dfw host host-id summarize-dvfilter command.

    For example:

    show dfw host host-28 summarize-dvfilter

    Fastpaths:
    agent: dvfilter-faulter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter
    agent: dvfilter-generic-vmware, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-generic-fastpath
    agent: dvfg-igmp, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfg-igmp
    agent: dvfilter-generic-vmware-swsec, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-switch-security
    agent: bridgelearningfilter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: vdrb
    agent: vmware-sfw, refCount: 2, rev: 0x1010000, apiRev: 0x1010000, module: vsip
    agent: serviceinstance-1, refCount: 3, rev: 0x1010000, apiRev: 0x1010000, module: vsip

    Slowpaths:
    slowPath: 4, agent serviceinstance-1, refCount: 2, rev: 0x4, apiRev: 0x4, capabilities: csum

    Filters:
    world 3222538 vmm0:win-xp-64-pro-sp2-133 vcUuid:'50 3f e3 37 83 6f 5a fd-60 7a 69 0a 9c 3d d3 1f'
    port 50331655 win-xp-64-pro-sp2-133.eth0
    vNic slot 2
    name: nic-3222538-eth0-vmware-sfw.2
    agentName: vmware-sfw
    state: IOChain Attached
    vmState: Detached
    failurePolicy: failClosed
    slowPathID: none
    filter source: VMX File
    vNic slot 4
    name: nic-3222538-eth0-serviceinstance-1.4
    agentName: serviceinstance-1
    state: IOChain Attached
    vmState: Attached
    failurePolicy: failOpen
    slowPathID: 4
    filter source: Dynamic Filter Creation
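
When many virtual machines run on a host, it can help to reduce this output to one record per vNIC slot. The following Python sketch is one possible way to do that. It assumes the output has been saved as text in the layout shown above, and the function name parse_dvfilter_slots is hypothetical.

```python
def parse_dvfilter_slots(output):
    """Return one dict per 'vNic slot N' block in the Filters section.

    Each dict holds the slot number plus the 'key: value' lines that
    follow it (name, agentName, state, vmState, failurePolicy, ...).
    """
    slots = []
    current = None
    in_filters = False
    for raw in output.splitlines():
        line = raw.strip()
        if line == "Filters:":
            in_filters = True
            continue
        if not in_filters or not line:
            continue
        if line.startswith("vNic slot "):
            current = {"slot": int(line.split()[-1])}
            slots.append(current)
        elif line.startswith(("world ", "port ")):
            # A new VM or port header ends the previous slot block.
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return slots
```

With the sample output above, the record for slot 4 would carry agentName serviceinstance-1, failurePolicy failOpen, and slowPathID 4, which makes it easy to list every third-party filter and its failure policy across a host.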

Rule troubleshooting - rule existence

  1. Log in to the NSX Manager with the admin credentials.
  2. To display detailed information about a vnic, run this command show dfw host host-id vnic.

    For example:

    show dfw host host-34 vnic

    501f02cc-9078-22ad-0186-4a05d7f18978.000
    Datacenter: DC-1
    Cluster: cluster 1
    Host: dfw-host-cake
    VM: VM-A-03

    Vnic Name : rhel-5-32-svr-ovf - Network adapter 1
    Vnic ID : 501f02cc-9078-22ad-0186-4a05d7f18978.000
    MacAddress : 00:50:56:9f:64:d3
    PortGroupId: network-21
    Filters :
    nic-311161-eth0-vmware-sfw.2
    nic-311161-eth0-serviceinstance-1.4
    nic-311161-eth0-serviceinstance-2.5


    Note: This command also lists the filters.
     
  3. To display the rules configured on the filter, run this command show dfw host host-id vnic vnic-id filter filter-name rules

    For example:

    show dfw host host-34 vnic 501f02cc-9078-22ad-0186-4a05d7f18978.000 filter nic-311161-eth0-vmware-sfw.2 rules

    Rules:
    ruleset domain-c7 {
    # Filter rules
    rule 1005 at 1 both protocol tcp from addrset ip-ipset-1 to addrset dst1005 port 80 accept;
    rule 1004 at 1 inout protocol icmpv6 icmptype 135 from any to any accept;
    rule 1004 at 2 inout protocol icmpv6 icmptype 136 from any to any accept;
    rule 1003 at 3 inout protocol udp from any to any port 68 accept;
    rule 1003 at 4 inout protocol udp from any to any port 67 accept;
    rule 1002 at 5 inout protocol any from any to any accept;
    }

    ruleset domain-c7_L2 {
    # Filter rules
    rule 1001 at 1 inout ethertype any from any to any accept;
    }
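
To check quickly whether an expected rule exists on a filter, the rules output can be parsed into records. The following Python sketch assumes the output is available as text in the format shown above. The function names and the field names (rule_id, position, direction, match, action) are labels chosen for illustration, not terms from the NSX documentation.

```python
import re

RULESET_RE = re.compile(r"^ruleset (\S+) \{$")
RULE_RE = re.compile(r"^rule (\d+) at (\d+) (\S+) (.+) (\w+);$")


def parse_rulesets(output):
    """Map each ruleset name to the list of rules it contains."""
    rulesets = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = RULESET_RE.match(line)
        if header:
            current = rulesets.setdefault(header.group(1), [])
            continue
        rule = RULE_RE.match(line)
        if rule and current is not None:
            current.append({
                "rule_id": int(rule.group(1)),
                "position": int(rule.group(2)),
                "direction": rule.group(3),
                "match": rule.group(4),
                "action": rule.group(5),
            })
    return rulesets


def referenced_addrsets(rule):
    """Return the addrset names that a parsed rule refers to."""
    return re.findall(r"addrset (\S+)", rule["match"])
```

The names returned by referenced_addrsets can then be looked up in the addrsets output for the same filter to confirm which addresses a rule actually covers.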

Rule Troubleshooting - Container contents

  1. Log in to the NSX Manager with the admin credentials.
  2. To display the addrsets configured on the filter, run this command show dfw host host-id vnic vnic-id filter filter-name addrsets

    For example:

    show dfw host host-34 vnic 501f02cc-9078-22ad-0186-4a05d7f18978.000 filter nic-311161-eth0-vmware-sfw.2 addrsets

    Addrsets:
    addrset ip-ipset-107 {
    ip 61.0.2.6,
    ip 69.0.2.6,
    }
    addrset ip-ipset-108 {
    ip 67.0.2.7,
    ip 69.0.2.7,
    }
    addrset ip-securitygroup-12 {
    ip 10.24.227.85,
    ip fc00:10:24:227:250:56ff:fe98:6c3d,
    ip fe80::250:56ff:fe98:cbf8,
    }
    addrset mac-macset-125 {
    mac 00:55:56:00:01:d6,
    mac 00:55:56:00:01:d7,
    mac 00:55:56:00:01:d8,
    }
    addrset src1428 {
    ip 10.24.226.7,
    ip 10.24.226.62,
    ip fc00:10:24:227:250:56ff:fe98:1596,
    ip fe80::250:56ff:fe98:3533,
    ip fe80::250:56ff:fe98:733c,
    }
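
A common question when reading this output is which containers include a given address. The following Python sketch shows one way to answer it, assuming the output is available as text in the format shown above. The function names parse_addrsets and find_member are hypothetical.

```python
def parse_addrsets(output):
    """Map each addrset name to a list of (kind, value) members.

    kind is "ip" or "mac", as printed at the start of each member line.
    """
    addrsets = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("addrset ") and line.endswith("{"):
            current = addrsets.setdefault(line.split()[1], [])
        elif line == "}":
            current = None
        elif current is not None and line.startswith(("ip ", "mac ")):
            kind, value = line.rstrip(",").split(None, 1)
            current.append((kind, value))
    return addrsets


def find_member(addrsets, value):
    """Return the sorted names of all addrsets that contain value."""
    return sorted(
        name
        for name, members in addrsets.items()
        if any(member == value for _, member in members)
    )
```

For the sample above, find_member(parse_addrsets(text), "10.24.227.85") would return the single name ip-securitygroup-12.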

Traffic Flow Troubleshooting

  1. Log in to the NSX Manager with the admin credentials.
  2. To display the rule stats on the filter, run this command show dfw host host-id vnic vnic-id filter filter-name stats

    For example:

    show dfw host host-34 vnic 501f02cc-9078-22ad-0186-4a05d7f18978.000 filter nic-311161-eth0-vmware-sfw.2 stats

    Stats:
    rule 2036: 845 evals, in 0 out 0 pkts, in 0 out 0 bytes
    rule 1428: 845 evals, in 0 out 0 pkts, in 0 out 0 bytes
    rule 1004: 845 evals, in 18 out 0 pkts, in 1152 out 0 bytes
    rule 1004: 672 evals, in 18 out 0 pkts, in 1296 out 0 bytes
    rule 1003: 823 evals, in 252 out 0 pkts, in 83420 out 0 bytes
    rule 1003: 131 evals, in 0 out 0 pkts, in 0 out 0 bytes
    rule 1002: 785 evals, in 4646 out 0 pkts, in 3068616 out 0 bytes
    rule 1001: 310 evals, in 253030 out 0 pkts, in 15596640 out 0 bytes
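
Rules that are evaluated but never match any packet are often the first thing to look for in this output. The following Python sketch, which assumes the stats text format shown above, extracts the counters and lists such rules. The function and field names are hypothetical. Because a rule ID can appear on more than one line, the result is a list rather than a dictionary keyed by ID.

```python
import re

STATS_RE = re.compile(
    r"^rule (\d+): (\d+) evals, in (\d+) out (\d+) pkts, in (\d+) out (\d+) bytes$"
)


def parse_rule_stats(output):
    """Return one dict of counters per 'rule N: ...' line, in order."""
    stats = []
    for raw in output.splitlines():
        match = STATS_RE.match(raw.strip())
        if match:
            rule_id, evals, pkts_in, pkts_out, bytes_in, bytes_out = map(int, match.groups())
            stats.append({
                "rule_id": rule_id,
                "evals": evals,
                "pkts_in": pkts_in,
                "pkts_out": pkts_out,
                "bytes_in": bytes_in,
                "bytes_out": bytes_out,
            })
    return stats


def unhit_rules(stats):
    """Return IDs of entries that were evaluated but matched no packets."""
    return [
        entry["rule_id"]
        for entry in stats
        if entry["evals"] > 0 and entry["pkts_in"] == 0 and entry["pkts_out"] == 0
    ]
```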

     
  3. To display the flow data on the filter (flow collection can be turned on or off for the filter), run this command show dfw host host-id vnic vnic-id filter filter-name flows

    For example:

    show dfw host host-34 vnic 501f02cc-9078-22ad-0186-4a05d7f18978.000 filter nic-311161-eth0-vmware-sfw.2 flows

    Flows:
    Count retrieved from kernel active(L3,L4)=1,
    active(L2)+inactive(L3,L4)=0, drop(L2,L3,L4)=0
    531135a500000000 Active TCP 0800 2 1 1 0 10.24.106.75:Unknown(57514)
    10.116.90.39:ssh(22) 210 EST 6012 9754 47 44


    For more information, see the NSX Central Commands section of the NSX Command Line Interface Reference Guide.

 

Common misconfiguration issues
  • The Network Fabric Service is not installed prior to deployment of the third-party service.
  • For host-based services, if you choose the Specified on the host option in Agent VM settings during installation, the Agent virtual machine settings are not correctly specified for all the hosts in the cluster. Check Agent virtual machine settings in the vSphere Web Client at Host > Configuration > Agent VM settings > Select datastore and network.

Troubleshooting with traceflow

Traceflow provides a way to inject packets into a DVS port and allow various “observation points” along the packet’s path to report observations of the packet as it traverses the logical network. Traceflow illustrates the path(s) that a packet takes through the logical or physical network and where a packet is being dropped.
 
NSX for vSphere 6.2.4 extends traceflow visibility through third-party NetX services. Traceflow support enhances fault isolation by helping to identify whether packet drops are happening at the third-party NetX service VM or within the NetX framework itself. The extension adds observation points while packets are redirected to the SVM and injected back from the SVM.

When a traceflow session is initiated, a unique traceflow ID is generated to track the session. A dummy packet carrying the traceflow ID is injected at the vNIC of the source VM. As the packet progresses through the data path, unique observations are generated and transferred to the controller. The controller sends these observations to NSX Manager, where they are displayed in the UI.
 
Traceflow packets are initiated and displayed from the Tools > Traceflow screen, as shown below.

 
An observation point is added at the NetX filter, where the packet is punted to the SVM. At this point, the received observation is generated. When the SVM injects the packet back into the hypervisor kernel, the forwarded observation is generated. If the SVM does not inject the packet back into the kernel before an internal timer expires, a dropped observation is generated.
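
The sequence of observations at this point can be modelled as follows. This is an illustrative Python sketch only, not the NetX implementation: the function name, the time arguments, and the timeout parameter are hypothetical, and the article does not state the duration of the internal timer.

```python
def netx_observations(punt_time, inject_time, timeout):
    """Model the observations generated for one traceflow packet.

    punt_time   -- time the packet was punted to the SVM
    inject_time -- time the SVM injected it back, or None if it never did
    timeout     -- assumed duration of the internal timer (same unit)
    """
    # The received observation is generated as soon as the packet is punted.
    observations = ["received"]
    if inject_time is not None and inject_time - punt_time < timeout:
        # The SVM returned the packet before the timer expired.
        observations.append("forwarded")
    else:
        # No injection, or the injection came after the timer expired.
        observations.append("dropped")
    return observations
```

A received observation followed by dropped therefore points at the service VM, because the packet reached the NetX filter but was not returned in time.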
 
The observation details provide:
  • NetX rule ID
  • Firewall rule ID
  • Service profile name
  • Failure policy (if the packet hits the failure policy)

Traceflow Support Requirements

  • Traceflow is supported only for vNICs that are attached to logical switch ports, not logical router ports.
  • In order for Traceflow to work, there must be a working controller cluster for the NSX Manager connected to the vCenter Server where the source vNIC is located.

    Note: While the source vNIC must meet the conditions described, the destination vNIC does not have the same constraints. However, the packet can only be traced until it exits the NSX network.
     
  • The target of the Traceflow should be a virtual machine managed by the same vCenter Server instance. If there are multiple vCenter Server instances, the Traceflow operation must be initiated on the vCenter Server that manages the source virtual machine. If the target virtual machine is managed by another vCenter Server, the user will have to provide the virtual machine's IP address and MAC address.

Collecting Diagnostic Information


Using the NSX Central CLI commands

Starting with NSX 6.2.4, a Central CLI command has been introduced for exporting NSX related outputs and files in the ESXi host support bundle to a specified server. This command collects the following information:
  • vmkernel and vsfwd log files
  • list of filters
  • list of DFW rules
  • list of containers
  • spoofguard details
  • host-related information
  • ipdiscovery related info
  • rmq command outputs
  • security group and services profile and instance details
  • esxcli related outputs

To export NSX-related diagnostic information via the Central CLI:

  1. Log in to the NSX Manager using the admin credentials.
  2. Run this command:

    export host-tech-support host-id scp uid@ip:/path

    Notes:
    • This command generates the NSX tech-support bundle and copies it to a specified server.
    • It removes any temporary files on the NSX Manager.
    • Run the show cluster all command to get host-id information.

    ESXi Host Command Details

    nsx-support
     
    • Usage: /bin/nsx-support {-h|start|getstatus|cleanup} []
    • Command outputs with different command arguments
    • nsx-support start []
    • If the command arguments are valid, it returns “In progress”.
    • If the argument to nsx-support start is not valid, for example nsx-support start abc, the output displays Path does not exist: /vmfs/volumes/abc. Please specify output datastore name.

    nsx-support getstatus
     
    • If there is an available log bundle, it returns the absolute directory of the bundle in the datastore, for example /vmfs/volumes/"{datastoreName}"/esx-prmh-nsx-dfw-dhcp-78-123.eng.vmware.com-2015-11-17--19.35.tgz.
    • Otherwise, it returns with an error: No NSX tech support bundle found.

    nsx-support delete
     
    • Returns Done.

Using the user interface

Diagnostic information on third-party services can be collected from NSX Manager and the ESXi hosts.

Using the command line

Alternatively, diagnostic information can be collected directly from the ESXi hosts using the Command Line Interface.
  • /var/log/vmkernel.log - VSIP kernel module logs on the ESXi host. VSIP log messages are prefixed with vsip.
  • /var/log/vmware/vpx/eam.log - installation troubleshooting information in the ESX Agent Manager (EAM) log.
  • /var/log/vsfwd.log - VSFWD user-world logs on the ESXi hosts.
  • /home/secureall/secureall/logs/vsm.log - NSX Manager logs.
  • /var/log/dfwpktlogs.log - distributed firewall logs, useful to assess whether traffic is being punted to any deployed third-party service VM.

 

 

 

If a punt rule is configured, search dfwpktlogs.log for messages similar to these:
2015-10-26T23:26:53.184Z INET match PUNT 1002/4792 IN 52 UDP from-SVM 10.250.51.110/49757->10.250.50.212/53
2015-10-26T23:26:53.189Z INET match PUNT 1002/4792 IN 63 UDP to-SVM 10.250.51.110/58320->10.250.50.212/53



Additional Information