Gorouter process becomes unresponsive when Dynatrace Golang injection is enabled

Article ID: 298124

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Observed behavior:
  • One or more Gorouters stop accepting requests. To confirm, check /var/vcap/sys/log/gorouter/access.log on the affected VM: no new requests are recorded.
  • The Gorouter health endpoint reports the instance as down. To check, run curl http://GOROUTER_IP:8080/health.
  • BOSH shows all processes are running.
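
The first two symptoms above can be checked together with a small script. This is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that an HTTP 200 from the health endpoint means healthy and that GNU stat is available on the VM.

```shell
#!/bin/sh
# Sketch only: function names and the "200 means healthy" rule are assumptions.

# Map the HTTP status returned by the health endpoint to up/down.
health_state() {
  case "$1" in
    200) echo "up" ;;
    *)   echo "down" ;;  # includes 000, which curl prints when it cannot connect
  esac
}

# Print how many seconds ago a file was last modified (GNU stat syntax).
log_age_seconds() {
  mtime=$(stat -c %Y "$1") || return 1
  echo $(( $(date +%s) - mtime ))
}

# Report both symptoms for one Gorouter. Run this on the Gorouter VM itself
# so that access.log is readable.
check_gorouter() {
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$1:8080/health")
  echo "health endpoint: $(health_state "$code") (HTTP $code)"
  echo "access.log last written $(log_age_seconds /var/vcap/sys/log/gorouter/access.log) seconds ago"
}
```

Call it as check_gorouter GOROUTER_IP. A quiet access log on its own can also mean there is simply no traffic, so read it together with the health result.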
All of the following conditions must be met for this issue to occur:
  • Dynatrace OneAgent is running on the platform.
  • Dynatrace deep monitoring (Golang injection) is enabled and compatible with the version of Tanzu Application Service (TAS). If TAS is running a Go process built with a Go version that is not compatible with the Dynatrace agent version, deep monitoring will NOT work, and a message to that effect should appear in your Dynatrace dashboard.
Note: While the issue was primarily seen on Gorouters, it could technically affect any component running Go processes.

Environment

Product Version: 2.9

Resolution

The issue is caused by an interaction between a new feature introduced in Go 1.14 and a bug in a Linux kernel system call that Dynatrace OneAgent uses.

For reference, Go 1.14 was not introduced in the routing release until version 0.200.0: https://github.com/cloudfoundry/routing-release/blob/0.200.0/docs/go.version

The following TAS releases include routing 0.200.0 or greater:

  • TAS 2.7.17
  • TAS 2.8.11
  • TAS 2.9.5
  • TAS 2.10
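
To check an installed version against the list above, the comparison can be sketched as follows. The function name is hypothetical, and the sketch deliberately answers "unknown" for any release line the list does not mention.

```shell
#!/bin/sh
# Sketch only: encodes the four TAS release lines listed in this article.
# Prints yes, no, or unknown (for release lines the article does not list).
tas_includes_routing_0200() {
  version="$1"
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  case "$rest" in
    *.*) patch="${rest#*.}" ;;
    *)   patch=0 ;;                 # e.g. "2.10" with no patch component
  esac
  patch="${patch%%[!0-9]*}"         # drop suffixes such as "-build.3"
  [ -n "$patch" ] || patch=0

  [ "$major" = "2" ] || { echo "unknown"; return; }
  case "$minor" in
    7)  min=17 ;;
    8)  min=11 ;;
    9)  min=5 ;;
    10) min=0 ;;
    *)  echo "unknown"; return ;;
  esac

  if [ "$patch" -ge "$min" ]; then echo "yes"; else echo "no"; fi
}
```

For example, tas_includes_routing_0200 2.9.4 prints no, while tas_includes_routing_0200 2.9.5 prints yes.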

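To get the Gorouter working again, restart the Gorouter process. Either SSH into the affected Gorouter VM and restart it with monit:

bosh -d cf-DEP-ID ssh router/ID
sudo -i
monit restart gorouter

or run the restart as a single command:

bosh -d cf-DEP-ID ssh router/ID -c 'sudo /var/vcap/bosh/bin/monit restart gorouter'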
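
After the restart, recovery can be confirmed by polling the health endpoint until it responds successfully. The following is a minimal sketch under stated assumptions: both function names are hypothetical, GOROUTER_IP is an environment variable you set yourself, and a non-error HTTP response is taken to mean healthy.

```shell
#!/bin/sh
# Sketch only: wait_until_healthy and probe_gorouter are hypothetical helpers.

# wait_until_healthy PROBE MAX_TRIES DELAY_SECONDS
# Runs PROBE up to MAX_TRIES times, sleeping DELAY_SECONDS after each failure.
wait_until_healthy() {
  probe="$1"; tries="$2"; delay="$3"
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$probe"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $tries attempt(s)"
  return 1
}

# Probe the Gorouter health endpoint described in this article.
# -f makes curl exit non-zero on HTTP error responses.
probe_gorouter() {
  curl -sf -m 5 -o /dev/null "http://${GOROUTER_IP}:8080/health"
}
```

A typical call would be GOROUTER_IP=10.0.0.5 wait_until_healthy probe_gorouter 12 5, where the address is a placeholder. If the probe never succeeds, the restart did not clear the condition and the workarounds in the next section apply.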


Workarounds

  • Disable Go process deep monitoring in Dynatrace.
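  • Revert to a version of the Dynatrace OneAgent whose Golang injection is not compatible with the Go version in use, so that deep monitoring is not applied.
  • Stop the Dynatrace OneAgent process on the Gorouter VMs: bosh -d cf-DEP-ID ssh router -c 'sudo /var/vcap/bosh/bin/monit stop dynatrace-oneagent'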

Permanent Resolution

A minimal-risk resolution, which prevents the error condition from occurring, has been implemented by Dynatrace and backported to all released OneAgent versions until 1.203. The fix is currently under verification and is expected to be available on December 23, 2020.

Note: A general fix, which makes the OneAgent code that invokes the Linux system call resilient against the kernel bug, will be released within the normal OneAgent release cycle.