Loggregator continuously generates hundreds of DNS requests per second to external DNS servers

Article ID: 297861

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

VMware Tanzu Application Service for VMs (TAS) 2.8.5 uses Loggregator version 106.2.4, which includes an upgraded version of the Go gRPC library. By default, gRPC first attempts to look up the following hostname via BOSH DNS for A, SRV, and TXT records:
"q-s0.doppler.<NETWORK NAME>.cf-GUID.bosh"

If there is a failure or delay, gRPC also sends DNS lookups to all of the external DNS servers listed in /etc/resolv.conf, which can generate hundreds of requests per second.

Sample log message from the BOSH DNS server, which repeats thousands of times:
[RequestLoggerHandler] 2020/03/27 12:06:04 WARN - handlers.DiscoveryHandler Request qtype=[TXT] qname=[q-s0.doppler.pas.cf-GUID.bosh.] rcode=SERVFAIL time=17000ns
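To gauge how often these failed lookups occur, the repeated log lines can be tallied per queried name. The following is a minimal sketch, not part of the original article: the function name count_servfail is hypothetical, and it assumes log lines in the format shown above arrive on standard input.

```shell
# Hypothetical helper: count SERVFAIL responses per queried name.
# Reads BOSH DNS log lines (format as in the sample above) from stdin
# and prints "<count> <qname>", most frequent name first.
count_servfail() {
  grep 'rcode=SERVFAIL' |
    # Keep only the text between "qname=[" and the closing "]".
    sed -n 's/.*qname=\[\([^]]*\)].*/\1/p' |
    sort | uniq -c | sort -rn |
    # Normalize the padded output of "uniq -c".
    awk '{print $1, $2}'
}
```

To use it, pipe the BOSH DNS log file from the affected VM into count_servfail. The location of that log file depends on the deployment and is not stated in this article.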


Environment

OS: 2.8

Resolution

The fix for this issue is included in TAS for VMs 2.8.6 and TAS for VMs 2.9.0.

If your external DNS servers are overwhelmed by the DNS traffic generated by the Loggregator instances, you can work around this issue with one of the following methods until you are able to upgrade.

Method 1

Add a firewall rule to each of the loggregator_trafficcontroller VM instances that drops all outbound DNS queries specifically for q-s0.doppler.x domain lookups. Loggregator will still be able to query BOSH DNS, but all lookups to the external DNS servers will be dropped by this iptables rule. The caveat to this method is that your traffic controller VMs may show higher than normal CPU usage.
 
1. SSH to each of the loggregator_trafficcontroller VMs and switch to root:
sudo su -
2. Add this rule live on the system. If you make a mistake adding this rule and need to remove it, repeat the same iptables command you previously ran, but change the argument "-I OUTPUT" to "-D OUTPUT".
/sbin/iptables -I OUTPUT -o eth0 -p udp --dport 53 -m string --hex-string "|04|q-s0|07|doppler|" --algo bm -j DROP
3. Verify that packets are being dropped with the "iptables -vL OUTPUT" command. You should see the numbers increasing in the first two columns, "pkts" and "bytes":
loggregator_trafficcontroller/cba5263d-a796-43bd-abeb-210acdd9fd01:~# iptables -vL OUTPUT
Chain OUTPUT (policy ACCEPT 84M packets, 9470M bytes)
 pkts bytes target     prot opt in     out     source               destination
  28M 2800M DROP       udp  --  any    eth0    anywhere             anywhere             udp dpt:domain STRING match  "|04712d733007646f70706c6572|" ALGO name bm TO 65535
4. To make the setting permanent, save the iptables rules to a config file:
iptables-save > /etc/iptables.conf
5. Edit the /etc/rc.local file so that it restores the iptables rules during boot. Make sure to add the line "iptables-restore < /etc/iptables.conf" just before the "exit 0" statement. Below is an example of /etc/rc.local with the changes:
#!/bin/sh -e
#execute firstboot.sh only once
if [ ! -e /root/firstboot_done ]; then
    if [ -e /root/firstboot.sh ]; then
        /root/firstboot.sh
    fi
    touch /root/firstboot_done
fi
iptables-restore < /etc/iptables.conf
exit 0
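
The --hex-string pattern in step 2 follows the DNS wire format, in which each label of a name is preceded by a one-byte length: "q-s0" is 4 bytes (04) and "doppler" is 7 bytes (07). The sketch below shows how such a pattern can be derived from a dotted name. It is an illustration only; the function name dns_hex_pattern is hypothetical and not part of iptables or this article.

```shell
# Hypothetical helper: build an iptables --hex-string pattern from a
# dotted DNS name prefix. Each label is emitted as |<length in hex>|label,
# mirroring the length-prefixed labels of the DNS wire format.
# The body runs in a subshell so the IFS and set -f changes do not leak.
dns_hex_pattern() (
  set -f          # disable globbing while splitting the name
  IFS='.'         # split the argument on dots
  out=''
  for label in $1; do
    out="${out}|$(printf '%02x' "${#label}")|${label}"
  done
  printf '%s|\n' "$out"
)
```

Calling dns_hex_pattern q-s0.doppler prints |04|q-s0|07|doppler|, the same string used in the rule above. The hex form shown in the "iptables -vL OUTPUT" output (04 71 2d 73 30 07 64 6f 70 70 6c 65 72) is the byte-for-byte encoding of that pattern.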


Method 2

Stop all Loggregator jobs on each of the loggregator_trafficcontroller VMs. This method effectively disables logging within the TAS platform, which could impact components like Apps Manager, the cf logs command, and any installed firehose nozzle.
  • SSH into each loggregator_trafficcontroller VM and switch to root
  • monit stop loggregator_trafficcontroller
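
If logging needs to be restored later on a VM where the job was stopped this way, the same job can be started again with Monit's start subcommand. The following is a minimal sketch under that assumption. The function name toggle_traffic_controller and the MONIT variable are hypothetical and exist only so the command can be previewed (for example with MONIT=echo) before it is run as root.

```shell
# Hypothetical wrapper around "monit stop|start loggregator_trafficcontroller".
# MONIT is an assumption for illustration: set MONIT=echo to preview the
# command instead of executing it.
MONIT="${MONIT:-monit}"

toggle_traffic_controller() {
  case "$1" in
    stop|start)
      "$MONIT" "$1" loggregator_trafficcontroller
      ;;
    *)
      echo "usage: toggle_traffic_controller stop|start" >&2
      return 2
      ;;
  esac
}
```

Running toggle_traffic_controller start as root on each loggregator_trafficcontroller VM reverses the stop step above.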