When a DNS response passes through an SD-WAN Edge, the Edge intercepts the A/AAAA records in the response and caches them in its DNS cache. The Edge also learns hostname-to-IP mappings through DPI and inserts them into the same cache, so DPI-learned entries and DNS-learned entries share the DNS cache space. The DNS cache limit is hardcoded based on the device's memory. Customers who see a "DNS cache max limit" event may want to know how entries age out of the cache. This article explains how the aging works.
VMware VeloCloud SD-WAN edge
When a DNS response passes through an SD-WAN Edge, the Edge intercepts the A/AAAA records in the response and caches them in its DNS cache. The Edge also honors the TTL in the DNS response:
edge:b2-edge1:~# debug.py --dns_name_cache
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 300 DNS
Once the cache entry is inserted, its TTL decreases by 1 every second. When the TTL reaches 0, the SD-WAN Edge does not delete the entry immediately; the TTL keeps decreasing to -1, -2, and so on:
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -1 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -2 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -4 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -5 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -6 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -8 DNS
How does the SD-WAN Edge clean up the cache entry? A periodic DNS cache cleanup timer runs with a default value of 600s. Each time the timer expires, the SD-WAN Edge deletes all cache entries with a negative TTL. The actual survival time of a cache entry is therefore its initial TTL plus a further 0 to 599 seconds, depending on where in the timer cycle the TTL runs out. For example, a cache entry inserted with an initial TTL of 300s survives for 300 to 899 seconds.
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -584 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -585 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -586 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -588 DNS <----cleanup timer expires at this time
Total Cache Entries: 0 <----cache entry with negative TTL is deleted
NAME ADDRESS TTL(s) SOURCE
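The survival window described above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not Edge code: the function names are hypothetical, the cleanup timer is assumed to fire at fixed multiples of 600s, and an entry is assumed to be removed at the first timer expiry at or after the moment its TTL runs out. Exact behavior at the one-second boundary on a real Edge is not specified in this article.

```python
CLEANUP_INTERVAL_S = 600  # default cleanup timer value from the article


def removal_time(initial_ttl, inserted_at, interval=CLEANUP_INTERVAL_S):
    """Time (in seconds) at which the modeled entry is removed.

    Assumption: the timer fires at t = interval, 2 * interval, ... and the
    entry is removed at the first expiry at or after its TTL runs out.
    """
    ttl_runs_out = inserted_at + initial_ttl
    ticks = -(-ttl_runs_out // interval)  # ceiling division
    return ticks * interval


def survival_time(initial_ttl, inserted_at, interval=CLEANUP_INTERVAL_S):
    """Seconds the entry stays in the cache after insertion."""
    return removal_time(initial_ttl, inserted_at, interval) - inserted_at


# Try every possible insertion offset within one timer cycle.
window = [survival_time(300, offset) for offset in range(CLEANUP_INTERVAL_S)]
print(min(window), max(window))  # 300 899
```

Under these assumptions the entry in the sample output, which was removed when its TTL showed -588, survived about 888 seconds, which falls inside the 300 to 899 second window.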
Below is a brief summary:
1. A cache entry whose TTL has become negative still works for hostname-based business policy.
2. When the SD-WAN Edge learns the same hostname-IP mapping again, it refreshes the TTL immediately, regardless of whether the existing cache entry's TTL was positive or negative:
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -174 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -176 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -177 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 -178 DNS
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 298 DNS <----Refresh TTL
Total Cache Entries: 1
NAME ADDRESS TTL(s) SOURCE
.www.zhongyu.com 172.16.100.2 297 DNS
3. When a hostname maps to multiple IPs and the Edge learns one of those hostname-IP mappings again, it refreshes the TTL of that specific IP only.
4. DPI-learned cache entries have a TTL of 86400s by default. Their aging process is the same as for DNS-sourced entries.
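The four rules in the summary can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `DnsCacheModel` and its methods are hypothetical and do not reflect the Edge's real data structures. The model keys entries by (hostname, IP) so that each IP ages independently, and the hostname and addresses are placeholder values.

```python
DPI_DEFAULT_TTL_S = 86400  # default TTL of DPI-learned entries (from the article)


class DnsCacheModel:
    """Toy model of the aging rules; not the Edge's actual implementation."""

    def __init__(self):
        # (hostname, ip) -> (absolute expiry time, source)
        self.entries = {}

    def learn(self, hostname, ip, ttl, now, source="DNS"):
        # Learning the same mapping again overwrites the expiry, so the TTL
        # is refreshed whether it was positive or negative before.
        self.entries[(hostname, ip)] = (now + ttl, source)

    def learn_dpi(self, hostname, ip, now):
        # DPI-learned entries get the default TTL and age the same way.
        self.learn(hostname, ip, DPI_DEFAULT_TTL_S, now, source="DPI")

    def ttl(self, hostname, ip, now):
        expires_at, _ = self.entries[(hostname, ip)]
        return expires_at - now

    def lookup(self, hostname):
        # Entries with a negative TTL are still returned until cleanup runs.
        return sorted(ip for (name, ip) in self.entries if name == hostname)

    def cleanup(self, now):
        # Periodic cleanup: drop every entry whose TTL is negative.
        stale = [k for k, (exp, _) in self.entries.items() if exp - now < 0]
        for key in stale:
            del self.entries[key]
        return len(stale)


cache = DnsCacheModel()
cache.learn("www.example.com", "192.0.2.1", ttl=300, now=0)
cache.learn("www.example.com", "192.0.2.2", ttl=300, now=0)
cache.learn("www.example.com", "192.0.2.1", ttl=300, now=478)  # refresh one IP
print(cache.ttl("www.example.com", "192.0.2.1", now=480))  # 298
print(cache.ttl("www.example.com", "192.0.2.2", now=480))  # -180
```

In this model only the re-learned IP gets a fresh TTL, while the other IP keeps counting down and remains usable until the next cleanup pass removes it.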
Customers can manually flush the DNS cache with the following command:
debug.py --dns_ip_cache_flush
Alternatively, flush the cache via Remote Diagnostics.
VMware VeloCloud SD-WAN has enhanced DNS cache management since R452-20240125-GA (tracked by bug#126520). When the DNS cache is full, the Edge now reclaims the least recently used (LRU) entry, provided that entry has not been used in the last 5 minutes, to make room for the new incoming entry. This enhancement increases the success rate of hostname-based business policies.
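The reclaim rule can be sketched as a bounded cache that evicts the LRU entry only if that entry has been idle for at least 5 minutes. This is an illustrative model, not the Edge's implementation: the class `LruDnsCache`, its capacity parameter, and the choice to reject the new entry when nothing can be reclaimed are all assumptions made for the example.

```python
from collections import OrderedDict

IDLE_THRESHOLD_S = 300  # "not used in the last 5 minutes" (from the article)


class LruDnsCache:
    """Toy bounded cache; capacity and API are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        # key -> last-used timestamp, ordered from least to most recently used.
        # Assumes timestamps passed in never go backwards.
        self.last_used = OrderedDict()

    def touch(self, key, now):
        # Record a use of an entry and move it to the most-recently-used end.
        self.last_used[key] = now
        self.last_used.move_to_end(key)

    def insert(self, key, now):
        """Insert key. Return False if the cache is full and the LRU entry
        was used too recently to be reclaimed."""
        if key in self.last_used:
            self.touch(key, now)
            return True
        if len(self.last_used) >= self.capacity:
            lru_key, lru_time = next(iter(self.last_used.items()))
            if now - lru_time < IDLE_THRESHOLD_S:
                return False  # nothing is old enough to reclaim
            del self.last_used[lru_key]
        self.last_used[key] = now
        return True


cache = LruDnsCache(capacity=2)
cache.insert("a.example.com", now=0)
cache.insert("b.example.com", now=10)
print(cache.insert("c.example.com", now=100))  # False: LRU entry idle only 100s
cache.touch("a.example.com", now=200)
print(cache.insert("c.example.com", now=320))  # True: b idle for 310s, reclaimed
```

Keeping the entries ordered by last use means the reclaim candidate is always the first item, so the check on insertion is a single comparison against the idle threshold.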