VMware Tanzu Application Service (TAS) for VMs 2.6 installation fails when Prometheus Node Exporter BOSH addon is deployed "Error: 'nats/a29abeca-a60d-49bc-a18d-7707eb6079c0 (0)' is not running after update. Review logs for failed jobs: node_exporter"

Article ID: 297975

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

VMware Tanzu Application Service (TAS) for VMs 2.6 installation fails because the Prometheus Node Exporter, which is deployed as a BOSH addon, and the loggr-system-metrics-agent job both try to listen on port 9100. Port 9100 is the default Node Exporter port (see https://github.com/prometheus/prometheus/wiki/Default-port-allocations).

The loggr-system-metrics-agent process is part of the system metrics agent, which was added in TAS for VMs 2.6.
 
You will see an error message similar to the following during the deployment or after selecting Apply Changes:
Task 1118 | 05:45:21 | Error: 'nats/a29abeca-a60d-49bc-a18d-7707eb6079c0 (0)' is not running after update. Review logs for failed jobs: node_exporter

Either the node_exporter job or the loggr-system-metrics-agent job fails with messages similar to the one above.
 
When you SSH into the failing VM (nats in the example above) and inspect the failing job's logs, /var/vcap/sys/log/node_exporter/node_exporter.stderr.log contains the following:
level=fatal msg="listen tcp :9100: bind: address already in use" source="node_exporter.go:172"
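
To confirm which port is contested, the port number can be pulled out of the fatal log line and then looked up on the VM. The following is a minimal sketch, not part of the original procedure: port_in_conflict is a hypothetical helper name, and it assumes the log line has the format shown above.

```shell
# Hypothetical helper (not part of BOSH or Node Exporter): reads log lines on
# stdin and prints the port from any "bind: address already in use" error.
port_in_conflict() {
  sed -n 's/.*listen tcp [^ ]*:\([0-9][0-9]*\): bind: address already in use.*/\1/p'
}

# Example using the log line from this article:
echo 'level=fatal msg="listen tcp :9100: bind: address already in use" source="node_exporter.go:172"' \
  | port_in_conflict
```

On the failing VM, running ss -ltnp 'sport = :9100' as root should then show which process currently holds the port, assuming the ss utility is available on the stemcell.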


Environment

Product Version: 2.6

Resolution

Use one of the three options below to fix the issue.

1. Remove the node_exporter addon by deleting its BOSH runtime config:
 bosh delete-config --type runtime --name node_exporter

2. Deselect "Enable system metrics" on the System Logging form in PAS > System Logging. Note that this stops system-level metrics from being emitted on all VMs.

3. Change the port that node_exporter listens on. In the runtime.yml file of the Node Exporter release, add the following to the properties section:
properties:
  node_exporter:
    web:
      port: "<PORT_NUM>"

Depending on how you installed Prometheus, the scrape configuration may also need to change. The operations file operators/monitor-bosh.yml defines 9100 as the port for node_exporter in the following scrape config:
        regex: node_exporter
        action: keep
      - source_labels:
        - __address__
        regex: "(.*)"
        target_label: __address__
        replacement: "${1}:9100"

Change that section to the following, then apply it by redeploying Prometheus ("bosh -d prometheus deploy" with your manifest and operations files):
        regex: node_exporter
        action: keep
      - source_labels:
        - __address__
        regex: "(.*)"
        target_label: __address__
        replacement: "${1}:<PORT_NUM>"

Here <PORT_NUM> is a new port number that does not conflict with any port already in use on the VMs.
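
The port change in the scrape config can also be scripted. The following is a minimal sketch under stated assumptions: set_scrape_port is a hypothetical helper (not part of BOSH or the Prometheus release), 9190 is an arbitrary example port, and the replacement line is assumed to be formatted exactly as shown above. Review the result before deploying, since any other relabel rule that uses the same port would be rewritten too.

```shell
# Hypothetical helper: reads an operations file on stdin and writes it to
# stdout with the scrape port in lines of the form
#   replacement: "${1}:<old_port>"
# changed to <new_port>. All other lines pass through unchanged.
set_scrape_port() {
  old_port="$1"
  new_port="$2"
  sed "s/\(replacement: \"\${1}:\)${old_port}\"/\1${new_port}\"/"
}

# Example usage (file names are illustrative):
#   set_scrape_port 9100 9190 < operators/monitor-bosh.yml > monitor-bosh.new.yml
#   diff operators/monitor-bosh.yml monitor-bosh.new.yml
```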

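After making the two changes above, update the BOSH runtime config:
bosh update-runtime-config <your runtime-config.yaml file location>
If the Node Exporter runtime config was uploaded under a name (for example node_exporter, as in option 1), also pass --name with that name so that the same config is updated.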

Once the runtime config is updated, selecting Apply Changes in Ops Manager applies it to all deployments.
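
After the deployment completes, it can be useful to confirm on one of the VMs that both jobs are now listening on separate ports. The following is a minimal sketch, assuming the ss utility is available on the stemcell; port_is_listening is a hypothetical helper, and 9190 stands in for whatever <PORT_NUM> was chosen.

```shell
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds (exit 0)
# if some socket is listening on the port given as $1.
port_is_listening() {
  awk -v p="$1" '
    NR > 1 {                     # skip the header line
      n = split($4, a, ":")      # $4 is "Local Address:Port"
      if (a[n] == p) found = 1
    }
    END { exit !found }
  '
}

# Example usage on a VM (9190 is an illustrative value for <PORT_NUM>):
#   ss -ltn | port_is_listening 9100 && echo "system metrics agent port is open"
#   ss -ltn | port_is_listening 9190 && echo "node_exporter port is open"
```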
 
Note: This issue is fixed in TAS for VMs 2.8. To get the fix, upgrade both Ops Manager and TAS for VMs to version 2.8.