DNS Connectivity Issues involving NSX Firewall

Article ID: 405889

Products

VMware NSX

Issue/Introduction

  • This article focuses on DNS connectivity issues caused by firewall rules.
  • A function or application must be able to resolve an FQDN to an IP address to complete its tasks successfully.
  • FQDN-to-IP resolution had been working and suddenly fails.
  • The application may report that it cannot resolve the FQDN.
  • The application may report that it cannot reach the DNS server.
  • There may be several DNS servers available, and only some of them respond.
  • Other network nodes are able to use the DNS servers successfully.

Environment

NSX

Cause

  • The distributed firewall has been modified by adding or editing firewall rules.
  • NSX distributed firewall is enabled.

Resolution

  1. Prepare the test Virtual Machine (VM) for testing

    Choose a test virtual machine that is experiencing the issue and open a command line console.
    Attempt an nslookup of a known FQDN that DNS is expected to resolve.
     for i in $(seq 1 10000); do nslookup google.com; sleep 2; clear; date; done
    This generates a DNS request every 2 seconds, 10000 times.
    Edit the /etc/resolv.conf file so that it uses only the IP address of the suspected DNS server.
    vi /etc/resolv.conf

    For testing, keep only a single nameserver entry.
    Placing "#" in front of an entry turns the line into a comment. If multiple DNS entries are configured, use this to isolate and test them one at a time.

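As an alternative to editing /etc/resolv.conf by hand, each configured server can be queried directly by passing it as the second argument to nslookup. The following is a minimal sketch, not part of the original procedure: the function names list_nameservers and probe_nameservers are hypothetical, and it assumes that nslookup on the test VM returns a non-zero exit status when the lookup fails.

```shell
#!/bin/sh
# Hypothetical helpers for step 1; the names are illustrative only.

# Print the active (uncommented) nameserver IPs from a resolv.conf-style file.
# $1: path to the file (defaults to /etc/resolv.conf)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query each active nameserver directly and report which ones answer.
# $1: FQDN to resolve (defaults to google.com, as in the loop above)
# $2: optional resolv.conf-style file to read the servers from
# Assumption: nslookup exits non-zero when the server does not answer.
probe_nameservers() {
    fqdn="${1:-google.com}"
    for server in $(list_nameservers "$2"); do
        if nslookup "$fqdn" "$server" >/dev/null 2>&1; then
            echo "$server OK"
        else
            echo "$server FAILED"
        fi
    done
}
```

Running probe_nameservers google.com after sourcing the functions prints one line per configured server, which helps identify the server to use in the packet captures that follow.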

  2. Packet Capture to Identify and Characterize DNS Traffic
    Identify which ESXi host the test VM is located on and open an SSH session to it.
    Collect the switchport and vmnic information that will be needed for the pktcap-uw command.
    esxcli vm process list|grep -A1 <VM Name>
    esxcli network vm port list -w <World Id>
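
The World ID, switchport number, and uplink can also be extracted from the esxcli output with awk rather than read by eye. This is a sketch under stated assumptions: the function names are hypothetical, and the field labels "World ID", "Port ID", and "Team Uplink" are assumed to appear as "Label: value" lines in the esxcli output. Confirm them against the actual output on the host before relying on the result.

```shell
#!/bin/sh
# Hypothetical helpers for step 2; the output layout is an assumption.

# Print the World ID of the VM whose block starts with the given name.
# stdin: output of `esxcli vm process list`
# $1:    VM name, expected on an unindented line of its own
world_id_for_vm() {
    awk -v vm="$1" '
        $0 == vm { found = 1; next }              # start of the VM block
        found && /World ID:/ { print $NF; exit }  # first World ID after it
    '
}

# Print the value of one "Label: value" field.
# stdin: output of `esxcli network vm port list -w <World Id>`
# $1:    field label, for example "Port ID" or "Team Uplink"
port_field() {
    awk -F': ' -v key="$1" '
        { k = $1; sub(/^[ \t]+/, "", k) }         # strip leading indentation
        k == key { print $2; exit }
    '
}
```

For example, esxcli vm process list | world_id_for_vm "<VM Name>" prints the World ID, and piping the port list output into port_field "Port ID" prints the switchport number for the pktcap-uw commands in the following steps.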



  3. Verify that the DNS request is arriving at the vDS switchport
    pktcap-uw --switchport <Switchport Number> --capture VnicTx,VnicRx --ng -c [1 ... 500] --ip <IP Address DNS Server> -o - |tcpdump-uw -enr -

    DNS requests appearing in this output verify that the packets are reaching the vDS.


  4. Verify that DNS traffic is leaving the vDS.
    pktcap-uw --uplink <vmnicX> --capture UplinkSndKernel,UplinkRcvKernel --ng -c [1 ... 500] --ip <IP Address DNS Server> -o - |tcpdump-uw -enr -

    If no DNS packets appear in this capture, the DNS packets are not crossing the vDS. The next capture determines whether a firewall rule is responsible.


  5. Verify where in the vDS the packet stops
     pktcap-uw --trace -c 50 --ip <DNS IP>

    The output displays each point in the stack that the packet traverses. The point listed immediately before PktFree is the last point in the stack that acted on the packet.
    The example output shows that the last point to act on the packet was PreDVFilter. PreDVFilter and PostDVFilter are the capture points immediately before and after the distributed firewall.
    A packet that reaches PreDVFilter but never reaches PostDVFilter was therefore most likely dropped by the firewall.


  6. Verify what firewall rule is being applied.
    summarize-dvfilter |grep -A3 " vmm0:<VM Name>"



  7. Detect Drops
    vsipioctl getflows -f <nic-########-ethX-vmware-sfw.2> -t 2 | grep "<DNS Ip>:domain(53)"
    Use the filter name reported by summarize-dvfilter in the previous step. In this example, the firewall rule set for filter nic-########-ethX-vmware-sfw.2 is dropping packets. Packets that are not dropped show Active rather than Drop.
    Rule 2 is not allowing the packet to pass in this example.


  8. Display the rules
    vsipioctl getfwconfig -f <nic-########-ethX-vmware-sfw.2>
    The output shows the firewall rules and other configuration. In this example, rule 1000005 states that ANY source can reach only the destinations listed in the addrset, over a defined list of ports. DNS passes the port check because port 53 is listed.


  9. Search for the addrset UUID listed in the description of rule 1000005.
    The IP addresses in this addrset are the only destination IPs the rule is configured to allow.
    If the destination IP of a packet is not listed here, the packet is dropped according to example rule 1000005.
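
The membership check in step 9 can be scripted once the addrset entries have been copied into a file. This is a minimal sketch, assuming the entries are saved one IP address per line in a file of the reader's choosing. The function name ip_in_addrset and the file layout are hypothetical, and the check is an exact string match only: it does not evaluate CIDR ranges.

```shell
#!/bin/sh
# Hypothetical helper for step 9; the one-entry-per-line file is an assumption.

# Succeed (exit status 0) when the address is listed verbatim in the file.
# $1: destination IP address, for example the DNS server
# $2: file containing one addrset entry per line
ip_in_addrset() {
    # -F: fixed string, -x: whole-line match, -q: quiet (status only)
    grep -Fxq -- "$1" "$2"
}
```

For example, ip_in_addrset "<DNS IP>" addrset.txt || echo "DNS server is not in the addrset" flags a DNS server that the rule does not cover.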


  • The resolution for this is to have the NSX firewall administrators evaluate the rules and correct them to function per their needs.
  • The system is working correctly per design and current configurations.

Additional Information