Troubleshooting bbsLRPsExtra metric increase

Article ID: 376076

Products

VMware Tanzu Application Service

Issue/Introduction

This article will detail some methods for troubleshooting an unexpected increase in the TAS metric bbs.LRPsExtra.

A brief summary of this metric is:

Total number of LRP instances that are no longer desired but still have a BBS record. When Diego wants to add more apps, the BBS sends a request to the Auctioneer to spin up additional LRPs.

In other words, the database may still hold records for LRPs that are no longer desired and should be deleted. More information on this metric can be found at the following link, under "BBS time to handle requests":

https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-platform-for-cloud-foundry/6-0/tpcf/monitoring-kpi.html

To confirm the current value of this metric, use the cf nozzle command below:

cf nozzle -f ValueMetric | grep -e 'origin:"bbs"' | grep -i extra

The cf nozzle plugin needs to be installed from here:

https://github.com/cloudfoundry-community/firehose-plugin
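When the stream is noisy, it can help to print only the numeric value of the metric. The following is a minimal sketch, not part of the plugin: `lrps_extra_values` is a hypothetical helper name, and it assumes each firehose line has the shape shown in the comment, which may differ between plugin versions.

```shell
# Hypothetical helper: read firehose lines on stdin and print the value of
# every LRPsExtra metric whose origin is "bbs".
# Assumed line shape (may vary by plugin version):
#   origin:"bbs" eventType:ValueMetric ... valueMetric:<name:"LRPsExtra" value:3 unit:"Metric" >
lrps_extra_values() {
  # Only lines from the bbs origin are considered; the captured group is the
  # number that follows value: inside the LRPsExtra metric.
  sed -n '/origin:"bbs"/s/.*name:"LRPsExtra" value:\([0-9.e+-]*\).*/\1/p'
}

# Usage (requires a logged-in cf CLI with the nozzle plugin installed):
#   cf nozzle -f ValueMetric | lrps_extra_values
```

Any value above 0 that persists across several emissions is the condition this article addresses.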

Environment

All TAS environments.

Cause

In very rare cases, an app container fails to exit cleanly: the main process exits, but the envoy process keeps running, which leaves the app instance in the CLAIMED state. Although the BBS and the rep keep sending termination commands to Garden, in this situation the orphaned container cannot be cleaned up properly.

Resolution

The main identifier of extra LRPs is an entry in the Diego database that is in the CLAIMED state but has no actual process behind it.

To identify these entries, first query the TAS Diego database:

$ bosh -d <cf deployment> ssh mysql/0
$ sudo mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf -D diego
mysql> SELECT *
FROM actual_lrps
JOIN domains ON actual_lrps.domain = domains.domain
LEFT JOIN desired_lrps ON actual_lrps.process_guid = desired_lrps.process_guid
WHERE actual_lrps.presence = 0 AND desired_lrps.process_guid IS NULL;

This should return the LRPs that are stuck in the CLAIMED state. The number of rows returned should match the current value of bbs.LRPsExtra.
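If the query returns many rows, a per-cell tally shows which diego_cells need attention. The sketch below assumes the query above is rerun selecting only `actual_lrps.cell_id`, using the mysql client's `-N` (skip column names) and `-B` (batch) options so that each output line is a single cell ID. `count_by_cell` is a hypothetical helper name.

```shell
# Hypothetical helper: read one cell_id per line on stdin and print
# "<count> <cell_id>" pairs, highest count first. Blank lines are ignored.
count_by_cell() {
  awk 'NF { n[$1]++ } END { for (c in n) print n[c], c }' \
    | sort -k1,1nr -k2,2
}

# Usage on the mysql VM (same WHERE clause as the query above):
#   sudo mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf -D diego -N -B -e "
#     SELECT actual_lrps.cell_id
#     FROM actual_lrps
#     JOIN domains ON actual_lrps.domain = domains.domain
#     LEFT JOIN desired_lrps ON actual_lrps.process_guid = desired_lrps.process_guid
#     WHERE actual_lrps.presence = 0 AND desired_lrps.process_guid IS NULL;" | count_by_cell
```

The counts summed across all cells should equal the number of rows from the original query.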

To clear the extra LRPs, restart the `garden` job on each diego_cell that is supposed to be hosting an affected app instance. First, take the VM instance ID from the 'cell_id' column in the output of the previous SQL query, then run the `ps` command on that cell to capture the process list for later investigation.

$ bosh -d <cf deployment> ssh diego_cell/<cell_id> -c "sudo ps axjfww"

Finally, restart the `garden` job on the cell.

$ bosh -d <cf deployment> ssh diego_cell/<cell_id> -c "sudo /var/vcap/bosh/bin/monit restart garden"

After the `garden` job has been restarted on all listed diego_cells, the bbs.LRPsExtra metric should return to 0.
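When several cells are listed, the two commands above can be wrapped in a loop. This is a sketch under stated assumptions rather than a supported tool: `restart_garden_on_cells` and the `DRY_RUN` variable are hypothetical names, and the function defaults to printing the commands instead of running them so that the list can be reviewed first.

```shell
# Hypothetical wrapper around the ps and monit commands from this article.
# Arguments: <cf deployment> <cell_id> [<cell_id> ...]
# DRY_RUN is an illustrative variable, not a BOSH setting. It defaults to 1,
# which prints each command instead of executing it.
restart_garden_on_cells() {
  deployment="$1"
  shift
  for cell_id in "$@"; do
    # Capture the process list first, then restart garden on the same cell.
    _run bosh -d "$deployment" ssh "diego_cell/$cell_id" -c "sudo ps axjfww"
    _run bosh -d "$deployment" ssh "diego_cell/$cell_id" -c "sudo /var/vcap/bosh/bin/monit restart garden"
  done
}

# Print the command in dry-run mode, otherwise execute it as given.
_run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage:
#   restart_garden_on_cells <cf deployment> <cell_id> <cell_id>            # preview only
#   DRY_RUN=0 restart_garden_on_cells <cf deployment> <cell_id> <cell_id>  # execute
```

Cells are handled one at a time, so only one `garden` job is restarting at any moment.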