The API Gateway application itself does not maintain a DNS cache, but the Java platform it runs on does: the JVM caches hostname lookups to avoid repeated DNS queries and to improve performance. Unfortunately, a DNS cache can have unintended consequences. If the IP address behind a hostname changes, the Gateway keeps sending requests to the old address until the cached entry expires, which can cause service outages. This occurs frequently when a backend service is hosted on AWS (Amazon Web Services), where the IP addresses behind a service hostname (for example, a load balancer endpoint) can change from time to time without notice.
The Java DNS TTL (Time To Live) can be changed to meet the requirements of the environment. Some environments call for a lower value and others for a higher one. There is no right or wrong value; it depends on factors such as the criticality of the services involved, how often this type of IP change takes place, and network performance.
This article applies to anyone running the API Gateway (any version) who wants to fine-tune the cache TTL for DNS lookups. The change is most likely needed in versions prior to 9.3 CR05, when the default was "0", but it can also be applied to later versions to better suit the network and the rest of the environment.
This article was originally written to resolve an issue that could occur under certain circumstances and was caused by a default TTL of "0" in the JDK, which meant that cached entries never expired. We later resolved this by overriding the JDK default with a TTL of "30", starting with version 9.3 CR05.
To change the DNS TTL (cache expiry time) for successful DNS lookups, follow these steps:
- Add the following line to the /opt/SecureSpan/Gateway/runtime/etc/profile.d/ssgruntimedefs.sh file: default_java_opts="$default_java_opts -Dsun.net.inetaddr.ttl=30 "
- The TTL value is in seconds, so a value of 30 gives each cached entry a lifetime of 30 seconds, after which it expires and the hostname is looked up again.
- If cached entries should never expire, use a value of -1, which means cache forever. This is the Java default when a security manager is installed.
- From the Java documentation regarding the value: "The value is specified as integer to indicate the number of seconds to cache the successful lookup."
- The opposite can also be configured: the sun.net.inetaddr.negative.ttl property sets how long a failed DNS lookup is cached. It is set in the same way, but changing it from the default is not normally necessary.
- From the Java documentation regarding the value of the negative TTL: "The value is specified as integer to indicate the number of seconds to cache the failure for un-successful lookup."
- Restart the Gateway service for the changes made to the file above to take effect: service ssg restart
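After the restart, it can be useful to confirm that the running JVM was actually started with the new option. The following is a minimal sketch, not a Gateway tool: ttl_from_cmdline is a hypothetical helper that takes a JVM command line as a string and prints the sun.net.inetaddr.ttl value found on it. How the Gateway's Java process is located is left to the environment.

```shell
#!/bin/sh
# Sketch: extract the sun.net.inetaddr.ttl value from a JVM command line.
# ttl_from_cmdline is a hypothetical helper name, not part of the Gateway.

ttl_from_cmdline() {
    # $1 is a full command line, e.g. the output of: ps -o args= -p <pid>
    # Prints the last matching value found; returns non-zero if the
    # option is not present on the command line.
    value=$(printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^-Dsun\.net\.inetaddr\.ttl=\(-\{0,1\}[0-9][0-9]*\)$/\1/p' |
        tail -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

A possible use, assuming the process ID of the Gateway JVM is already known and stored in a variable named pid, is ttl_from_cmdline "$(ps -o args= -p "$pid")". The match is anchored on the exact option name, so the negative TTL option is not mistaken for the successful-lookup TTL.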
Example of the file with the new line added (the last line shown):
default_java_opts="$default_java_opts -Dfile.encoding=UTF-8 "
default_java_opts="$default_java_opts -Djava.awt.headless=true -XX:CompileThreshold=1500 "
default_java_opts="$default_java_opts -Dsun.net.inetaddr.ttl=30 "
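When the same change has to be made on several nodes, the edit can be scripted so that it is safe to run more than once. The sketch below is an assumption about how one might do this, not a supported procedure: set_dns_ttl is a hypothetical helper that appends the line shown above if the option is missing and otherwise updates the existing value in place. It assumes the file ends with a newline, and the file should be backed up first.

```shell
#!/bin/sh
# Sketch: set -Dsun.net.inetaddr.ttl in a runtime defs file without
# duplicating the option. set_dns_ttl is a hypothetical helper name.

set_dns_ttl() {
    defs_file=$1   # e.g. /opt/SecureSpan/Gateway/runtime/etc/profile.d/ssgruntimedefs.sh
    ttl=$2         # seconds; -1 means cache forever

    # Accept only an integer with an optional leading minus sign.
    case $ttl in
        ''|-|*[!0-9-]*|?*-*) echo "invalid TTL: $ttl" >&2; return 1 ;;
    esac

    if grep -q -- '-Dsun\.net\.inetaddr\.ttl=' "$defs_file"; then
        # Option already present: rewrite its value via a temporary file.
        tmp=$(mktemp) || return 1
        sed "s/-Dsun\.net\.inetaddr\.ttl=-\{0,1\}[0-9]*/-Dsun.net.inetaddr.ttl=$ttl/" \
            "$defs_file" > "$tmp" && cat "$tmp" > "$defs_file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        # Option absent: append a new default_java_opts line.
        printf 'default_java_opts="$default_java_opts -Dsun.net.inetaddr.ttl=%s "\n' \
            "$ttl" >> "$defs_file"
    fi
}
```

For example, set_dns_ttl /opt/SecureSpan/Gateway/runtime/etc/profile.d/ssgruntimedefs.sh 30 followed by service ssg restart would apply the value used in this article. Writing the result back with cat rather than mv keeps the original file's ownership and permissions.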