Rep job keeps producing "Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused" error even after "NFSv3 volume services" is disabled


Article ID: 298390


Products

VMware Tanzu Application Service for VMs

Issue/Introduction

"NFSv3 volume services" is disabled on TAS tile followed by successful “Apply Changes”. As a result all diego_cell instances are updated with no nfsv3driver job running on them. However rep job still keeps producing “connect: connection refused” errors for connecting to nfsv3driver endpoint "http://127.0.0.1:7589/Plugin.Activate\\" as shown in example below.
{"timestamp":"1681161565.474198818","source":"rep","message":"rep.sync-plugin.discover.activate.request-failed","log_level":2,"data":{"error":"Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused","session":"6.19873.2"}}
{"timestamp":"1681161565.474227905","source":"rep","message":"rep.sync-plugin.discover.activate.failed-activate","log_level":2,"data":{"error":"Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused","session":"6.19873.2"}}
{"timestamp":"1681161565.474236965","source":"rep","message":"rep.sync-plugin.discover.activate.end","log_level":1,"data":{"session":"6.19873.2"}}
{"timestamp":"1681161565.474245071","source":"rep","message":"rep.sync-plugin.discover.existing-driver-unreachable","log_level":2,"data":{"address":"http://127.0.0.1:7589","error":"Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused","session":"6.19873","spec-name":"nfsv3driver","tls":null}}
{"timestamp":"1681161565.474299908","source":"rep","message":"rep.sync-plugin.discover.updating-driver","log_level":1,"data":{"driver-path":"/var/vcap/data/voldrivers","session":"6.19873","spec-name":"nfsv3driver"}}
{"timestamp":"1681161565.474314690","source":"rep","message":"rep.sync-plugin.discover.driver.start","log_level":1,"data":{"driverFileName":"nfsv3driver.json","driverId":"nfsv3driver","session":"6.19873.3"}}
{"timestamp":"1681161565.474363327","source":"rep","message":"rep.sync-plugin.discover.driver.getting-driver","log_level":1,"data":{"address":"http://127.0.0.1:7589","driverFileName":"nfsv3driver.json","driverId":"nfsv3driver","session":"6.19873.3"}}
{"timestamp":"1681161565.474380255","source":"rep","message":"rep.sync-plugin.discover.driver.end","log_level":1,"data":{"driverFileName":"nfsv3driver.json","driverId":"nfsv3driver","session":"6.19873.3"}}
{"timestamp":"1681161565.474389553","source":"rep","message":"rep.sync-plugin.discover.activate.start","log_level":1,"data":{"session":"6.19873.4"}}
{"timestamp":"1681161565.474685669","source":"rep","message":"rep.sync-plugin.discover.activate.request-failed","log_level":2,"data":{"error":"Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused","session":"6.19873.4"}}
{"timestamp":"1681161565.474720001","source":"rep","message":"rep.sync-plugin.discover.activate.failed-activate","log_level":2,"data":{"error":"Post \"http://127.0.0.1:7589/Plugin.Activate\": dial tcp 127.0.0.1:7589: connect: connection refused","session":"6.19873.4"}}

There is no reason for the rep job to keep attempting to connect to the nfsv3driver endpoint: the feature has already been disabled and the nfsv3driver job is no longer running to listen on 127.0.0.1:7589.

Although this error is harmless to the service, it adds unnecessary entries to external log aggregation systems. If alerting is configured on keywords such as "error" or "connect: connection refused", these messages may trigger false alarms.
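For context, the log entries above show that the rep job scans the volume driver spec directory /var/vcap/data/voldrivers, finds the leftover nfsv3driver.json spec file, and then repeatedly tries to activate the driver at the address recorded in that file. The presence of the stale spec file can be confirmed on a diego_cell instance, for example (the JSON shown below is only illustrative; the exact contents of the spec file may differ):

$ bosh -d cf-xxxx ssh diego_cell/0 -c 'sudo cat /var/vcap/data/voldrivers/nfsv3driver.json'
{"Name":"nfsv3driver","Addr":"http://127.0.0.1:7589"}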

Environment

Product Version: 2.13

Resolution

The product team has acknowledged this bug and created a story to prevent the rep job from constantly producing these unneeded error logs.

Until a final fix is available, the issue can be temporarily mitigated by renaming the file /var/vcap/data/voldrivers/nfsv3driver.json on all diego_cell instances, for example with a command like:
$ bosh -d cf-xxxx ssh diego_cell -c 'sudo mv /var/vcap/data/voldrivers/nfsv3driver.json /var/vcap/data/voldrivers/nfsv3driver.json.orig'
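If the deployment name is not known, the deployments on the BOSH director can be listed with the BOSH CLI, for example:

$ bosh deployments

The TAS deployment name typically starts with cf-; any isolation segment deployments have their own names, and the same workaround applies to the Diego cell instances in those deployments as well.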

Note:

  • Replace cf-xxxx with the actual cf or isolation segment deployment name on the TAS foundation.
  • The temporary workaround must be reapplied whenever a diego_cell instance is updated or recreated; see the verification example after this list.
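
After the rename (or after reapplying it following a diego_cell update), one way to verify that it took effect is to check that no new activation errors appear in the rep log, assuming the default BOSH log location for the rep job:

$ bosh -d cf-xxxx ssh diego_cell/0 -c 'sudo tail -n 100 /var/vcap/sys/log/rep/rep.stdout.log | grep "Plugin.Activate"'

With the spec file renamed, this should return no new "connection refused" entries.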