Invocation exception caused by: java.net.UnknownHostException

Article ID: 414617

Products

VMware Aria Suite

Issue/Introduction

  • REST API calls from Aria Orchestrator fail intermittently with DNS-related errors.
  • Pods in the "kube-system" namespace may also show numerous restarts, for example:
    • $ /usr/local/bin/kubectl get all,replicationcontrollers,events --show-kind --all-namespaces --output wide
      stdout:
      NAMESPACE     NAME                                              READY   STATUS      RESTARTS       AGE
      kube-system   pod/command-executor-#####                        1/1     Running     27             195d
      kube-system   pod/coredns-#####                                 1/1     Running     27             195d
      kube-system   pod/health-reporting-app-#####                    1/1     Running     27
      kube-system   pod/kube-flannel-ds-#####                         1/1     Running     27             195d
      kube-system   pod/kube-node-monitor-#####                       1/1     Running     27             195d
      kube-system   pod/kubelet-rubber-stamp-#####                    1/1     Running     27             195d
      kube-system   pod/metrics-server-#####                          1/1     Running     27             195d
      kube-system   pod/network-health-monitor-#####                  1/1     Running     27             195d
      kube-system   pod/predictable-pod-scheduler-#####               1/1     Running     27             195d
      kube-system   pod/prelude-network-monitor-cron-########-#####   0/1     Completed   0              4m44s
      kube-system   pod/prelude-network-monitor-cron-########-#####   0/1     Completed   0              104s
      kube-system   pod/state-enforcement-cron-########-#####         0/1     Completed   0              5m44s
      kube-system   pod/state-enforcement-cron-########-#####         0/1     Completed   0              3m44s
      kube-system   pod/state-enforcement-cron-########-#####         0/1     Completed   0              104s
      kube-system   pod/update-etc-hosts-#####                        1/1     Running     27             195d
      prelude       pod/vco-app-##########-#####                      2/2     Running     1 (5d2h ago)   5d2h 
  • "RESTException" errors that originate in "SystemDefaultDnsResolver" also appear in "/services-logs/prelude/vco-app/file-logs/vco-server-app.log":
    • vco [host='vco-app-##########-#####' thread='WorkflowExecutorPool-Thread-#####' user='[email protected]' org='-' trace='-'] {|__SYSTEM|[email protected]:[PYTHON] # - ##### - ##### top-level:########-###-####-####-###########:token=########-####-####-####-############:anctoken=########-####-####-####-############} ch.dunes.vso.sdk.WrappedJavaMethod - Invocation exception during 'public com.vmware.o11n.plugin.rest.Response com.vmware.o11n.plugin.rest.Request.execute() throws com.vmware.o11n.plugin.rest.RESTException' call on object 'com.vmware.o11n.plugin.rest.Request@#######' java.lang.reflect.InvocationTargetException: null
      ....
      Caused by: com.vmware.o11n.plugin.rest.RESTException: Cannot execute the request: ; server.example.com
          at com.vmware.o11n.plugin.rest.Request.handleException(Request.java:###) ~[o11nplugin-rest-model-#.#.#.jar:?]
          at com.vmware.o11n.plugin.rest.Request.execute(Request.java:###) ~[o11nplugin-rest-model-#.#.#.jar:?]
          ... ## more
      Caused by: java.net.UnknownHostException: server.example.com
          at java.net.InetAddress$CachedAddresses.get(InetAddress.java:###) ~[?:?]
          at java.net.InetAddress.getAllByName#(InetAddress.java:####) ~[?:?]
          at java.net.InetAddress.getAllByName(InetAddress.java:####) ~[?:?]
          at java.net.InetAddress.getAllByName(InetAddress.java:####) ~[?:?]
          at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:##) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:##) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:###) ~[httpclient-#.#.##.jar:#.#.##]
          at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:##) ~[httpclient-#.#.##.jar:#.#.##]
          at com.vmware.o11n.plugin.rest.RequestExecutor.execute(RequestExecutor.java:###) ~[o11nplugin-rest-model-#.#.#.jar:?]
          at com.vmware.o11n.plugin.rest.CustomContextRequestExecutor.execute(CustomContextRequestExecutor.java:##) ~[o11nplugin-rest-model-#.#.#.jar:?]
          at com.vmware.o11n.plugin.rest.Request.doExecute(Request.java:###) ~[o11nplugin-rest-model-#.#.#.jar:?]
          at com.vmware.o11n.plugin.rest.Request.execute(Request.java:###) ~[o11nplugin-rest-model-#.#.#.jar:?]
          ... ## more
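
To see which hostnames are affected and how often, the "UnknownHostException" entries in the log can be tallied. The following is a minimal sketch, not a supported tool: the function name "count_unknown_hosts" is hypothetical, and the pattern only assumes the "java.net.UnknownHostException: <hostname>" form shown in the trace above.

```python
import re
import sys
from collections import Counter

# Matches "java.net.UnknownHostException: <hostname>" as it appears in the trace above.
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: ([A-Za-z0-9._-]+)")


def count_unknown_hosts(lines):
    """Return a Counter mapping each unresolved hostname to its number of occurrences."""
    counts = Counter()
    for line in lines:
        for host in UNKNOWN_HOST.findall(line):
            counts[host] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path to vco-server-app.log is passed as the first argument.
    with open(sys.argv[1], errors="replace") as handle:
        for host, n in count_unknown_hosts(handle).most_common():
            print(f"{n:6d}  {host}")
```

If a single external hostname dominates the result, the problem may be specific to that record. If many unrelated names appear, the DNS path itself is the more likely suspect.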

Environment

  • Aria Automation 8.x
  • Aria Orchestrator 8.x (External/Standalone Appliance)

Cause

  • The CoreDNS logs show that queries sent by the CoreDNS Kubernetes service (in the "kube-system" namespace) to the DNS server configured in "/etc/resolv.conf" are timing out:
    • "/services-logs/kube-system/kube-dns/console-logs/coredns.log"
    • [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # server.example.com. A: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # server.example.com. A: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # server.example.com. A: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # server.example.com. A: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
      [ERROR] plugin/errors: # contour. AAAA: read udp ##.###.#.###:#####->##.#.#.#:##: i/o timeout
  • Further analysis points to CoreDNS, which resolves both internal and external names on behalf of the internal Kubernetes pod network.

  • Within the affected timeframes, "i/o timeout" errors occur repeatedly when CoreDNS contacts the DNS server (as found in "/etc/resolv.conf"), both for internal services such as Contour (the internal Kubernetes proxy) and for the REST-enabled server that the user is trying to contact.
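
The timeout entries can be summarized per queried name, record type, and upstream DNS server to show whether a single upstream is responsible. This is a rough sketch under assumptions: the helper name "summarize_timeouts" is hypothetical, and the pattern is derived only from the log line format shown above, where the masked "#" fields are treated as opaque tokens.

```python
import re
from collections import Counter

# Derived from the log format above:
# [ERROR] plugin/errors: <code> <name> <type>: read udp <source>-><upstream>: i/o timeout
TIMEOUT = re.compile(
    r"\[ERROR\] plugin/errors: \S+ (?P<name>\S+) (?P<qtype>[A-Z]+): "
    r"read udp \S+->(?P<upstream>\S+): i/o timeout"
)


def summarize_timeouts(lines):
    """Count i/o timeouts per (queried name, record type, upstream address)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            counts[(match["name"], match["qtype"], match["upstream"])] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for "coredns.log" can be passed directly.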

Resolution

  • Ensure that DNS resolution is uninterrupted for your Aria Automation and/or Aria Orchestrator appliance.
  • Ping the DNS server from the appliance to confirm that it is reachable.
  • Validate that DNS lookups work by running "nslookup" against a known address within the same network.
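
Because the failures are intermittent, a single successful lookup does not prove that resolution is stable. The sketch below repeats the lookup and records which attempts fail. It is an illustration only: "probe_dns" and its parameters are hypothetical names, and it uses Python's system resolver via "socket.getaddrinfo", so it tests the resolver configuration of whichever host it runs on.

```python
import socket
import time


def probe_dns(hostname, attempts=10, delay=1.0,
              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly and return a list of (attempt, error) failures."""
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            resolver(hostname, None)
        except socket.gaierror as exc:
            # gaierror is raised when the name cannot be resolved.
            failures.append((attempt, str(exc)))
        if attempt < attempts:
            sleep(delay)
    return failures
```

For example, calling probe_dns("server.example.com", attempts=60, delay=5) during a window in which workflows have been failing returns an empty list only if every lookup succeeded.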