DX APM - ACC not working, ng-acc-configserver pods in CrashLoopBackOff

Article ID: 144154

Products

CA Application Performance Management (APM / Wily / Introscope)

Issue/Introduction

Symptoms:

a) ACC pods are in CrashLoopBackOff or Init status:

kubectl get pods -ndxi | grep acc

ng-acc-configserver-db-deployment - CrashLoopBackOff
ng-acc-configserver-deployment - CrashLoopBackOff
ng-acc-repository-deployment  (init:01)

Scaling the deployments down and up again does not help.

b) The log of the ng-acc-configserver-db pod contains messages indicating that the postgres database cannot be started:

kubectl logs <ng-acc-configserver-db-deployment-pod> -ndxi

Cause

The ACC pods cannot initialize and start because the ACC postgres database is corrupted.

In this case, the file system of the NFS server ran out of disk space, which damaged some data files.

The damaged data files cannot be recovered.
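
Before starting the recovery, it can help to confirm that the NFS file system has free space again, since a full disk could damage the recreated database the same way. The following is a minimal sketch, assuming it is run on the NFS server and that /data/nfs/ca/dxi is the folder used in this example; the function name fs_usage_percent is illustrative only.

```shell
# Print the used-space percentage (digits only) of the file system holding a path.
# "df -P" keeps each file system on a single line, so the value is field 5 of line 2.
fs_usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Usage (path taken from this example):
#   fs_usage_percent /data/nfs/ca/dxi
```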

Environment

APM 11.x, 19.x

Resolution

In this example, NFS-DXI-Folder is located in /data/nfs/ca/dxi

ng-acc-configserver-db-deployment

1. Log in to the Kubernetes console, select the DXI namespace, and click on Workloads / Deployments

Scale down ng-acc-configserver-db-deployment

2. Back up the corrupted ACC postgres database by moving it aside:

mv /data/nfs/ca/dxi/acc/cs/db /data/nfs/ca/dxi/acc/cs/db-bkp

3. Click on Workloads / Deployments

Scale up ng-acc-configserver-db-deployment

4. Verify that the pod is up

5. Verify that the database started up successfully:

Obtain the pod for ng-acc-configserver-db-deployment

kubectl get pods -ndxi | grep acc

Check the log

kubectl logs <ng-acc-configserver-db-deployment-pod> -ndxi
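
If the Kubernetes console is not available, the scale-down and scale-up in the steps above can also be done with kubectl. This is a sketch under assumptions: the helper names are made up for illustration, the namespace is dxi as in the commands above, and a replica count of 1 is assumed for scaling up (use the count the deployment had before).

```shell
NS="${NS:-dxi}"   # namespace used in this article

# Scale a deployment to the given number of replicas.
scale_deployment() {
  kubectl scale deployment "$1" --replicas="$2" -n "$NS"
}

# Wait until the deployment has rolled out, or fail after the timeout (default 300s).
wait_for_deployment() {
  kubectl rollout status "deployment/$1" -n "$NS" --timeout="${2:-300s}"
}

# Usage, mirroring steps 1 to 5:
#   scale_deployment ng-acc-configserver-db-deployment 0
#   (step 2: move the corrupted database aside on the NFS server)
#   scale_deployment ng-acc-configserver-db-deployment 1
#   wait_for_deployment ng-acc-configserver-db-deployment
#   kubectl logs deployment/ng-acc-configserver-db-deployment -n "$NS"
```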

 

 

ng-acc-configserver-deployment

1. In the Kubernetes console, DXI namespace, click on Workloads / Deployments

Scale ng-acc-configserver-deployment down and then up again

2. Verify that the pod is up

 

ng-acc-repository-deployment

1. In Kubernetes, DXI namespace, click on Workloads / Deployments

Scale down ng-acc-repository-deployment

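2. Back up the corrupted ACC repository and create an empty one owned by user and group 1010:

mv /data/nfs/ca/dxi/acc/cs/repository /data/nfs/ca/dxi/acc/cs/repository.bck

mkdir /data/nfs/ca/dxi/acc/cs/repository

chown -R 1010:1010 /data/nfs/ca/dxi/acc/cs/repository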

3. Click on Workloads / Deployments

Scale up ng-acc-repository-deployment

4. Verify that the pod is up

 

IMPORTANT: If you have already created tenants, apply the steps below to recover them. Otherwise, you can proceed to create a new tenant; the ACC UI should then be accessible.


Recover a specific tenant after ACC DB deletion

Steps 1 to 4 describe how to obtain the values needed for the REST API call in step 5.

1. Find the value of the ACC management token

- In Kubernetes, DXI namespace, click on the Config & Storage / Secrets menu item.
- Click on the item ng-acc-configserver-secret.
- Click on the eye icon next to the "token" secret.
- The ACC management token is displayed like this:
    token: 81bacd65-9874-49ee-98b3-7a312ebfd792

Remember this value as ACC_MANAGEMENT_TOKEN for use in step 5.
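
The same value can usually be read with kubectl instead of the console. The sketch below assumes that the secret stores the token under the key "token", as the console view above suggests, and that base64 is available on the machine running kubectl; the helper name get_secret_value is illustrative.

```shell
NS="${NS:-dxi}"   # namespace used in this article

# Print the decoded value of one key of a Kubernetes secret.
# Secret data is stored base64-encoded, so it is decoded before printing.
get_secret_value() {
  kubectl get secret "$1" -n "$NS" -o "jsonpath={.data.$2}" | base64 -d
}

# Usage:
#   ACC_MANAGEMENT_TOKEN="$(get_secret_value ng-acc-configserver-secret token)"
```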


2. Find the hostname of the EM container of the tenant

- In Kubernetes, DXI namespace, click on the Discovery and Load Balancing / Services menu item.
- Use the filter icon to search for "apm-em-10", where 10 is the tenant service ID of the tenant.
- The item to look for resembles "apm-em-10-958963", where 10 is the tenant service ID and the 6-digit number is assigned randomly during tenant creation. Click on it.
- Copy the value in the name field. This is the hostname of the EM container of the tenant.

Remember this value as EM_HOSTNAME for use in step 5.
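
As an alternative to filtering in the console, the service name can be looked up with kubectl. This is a sketch only: the helper name find_em_service is hypothetical, and it relies on the "apm-em-<tenant service id>-<random number>" naming pattern described above.

```shell
NS="${NS:-dxi}"   # namespace used in this article

# List services whose name starts with apm-em-<tenant service id>-
# The trailing dash keeps id 10 from also matching ids such as 100.
find_em_service() {
  kubectl get services -n "$NS" -o name | sed 's|^service/||' | grep "^apm-em-$1-"
}

# Usage:
#   EM_HOSTNAME="$(find_em_service 10)"
```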


3. Find the EM-ACC integration token

- Continuing from step 2, click on the Pod listed in the Pods section of the service.
- In the detail view of the Pod, the section Containers / Environment variables shows an environment variable ACC_TOKEN with a value like this:
    ACC_TOKEN: 9403e0e2-8ea7-40b9-8f22-dd230706bbc8
This is the EM-ACC integration token.

Remember this value as EM_ACC_INTEGRATION_TOKEN for use in step 5.
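
The environment variable can also be read from the pod specification with kubectl. The sketch below assumes that ACC_TOKEN is set as a literal value in the pod spec; if it is injected from a secret or config map, the output will be empty and the console view should be used instead. The helper name get_acc_token is illustrative.

```shell
NS="${NS:-dxi}"   # namespace used in this article

# Print the literal value of the ACC_TOKEN environment variable from a pod's spec.
get_acc_token() {
  kubectl get pod "$1" -n "$NS" \
    -o 'jsonpath={.spec.containers[*].env[?(@.name=="ACC_TOKEN")].value}'
}

# Usage (replace the placeholder with the EM pod name):
#   EM_ACC_INTEGRATION_TOKEN="$(get_acc_token <apm-em-pod>)"
```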


4. Find TENANT_ID and TENANT_NAME for use in step 5

The easiest way to get the tenant ID and tenant name is the dximanager UI, which is shown after logging in with the "masteradmin" tenant and the "masteradmin" account.
Click on the Tenant icon on the left side to list all tenants. The text in the "Tenant ID" column, here "test001", is TENANT_NAME. Hovering over that element shows a tooltip with TENANT_ID, here 10.

5. Register a tenant in ACC

- In Kubernetes, DXI namespace, click on Workloads / Deployments.
- Use the filter icon to search for "ng-acc-configserver-deployment" and click on the displayed item.
- Click on the item in New Replica Set, then click on the item in the Pods section.
- Click on the Exec text/icon in the header of the displayed Pod. The prompt should show that the shell is in the “APMCommandCenterServer” directory.
- Prepare a command for execution in a plain text editor:
   
curl -v -X POST  -H 'Authorization: Bearer ACC_MANAGEMENT_TOKEN' -H 'Content-Type: application/json' -d '
{
  "internalId" : TENANT_ID,
  "externalId" : "TENANT_NAME",
  "emUrl" : "http://EM_HOSTNAME:8081/",
  "integrationUserToken" : "EM_ACC_INTEGRATION_TOKEN"
}
' http://localhost:8088/apm/appmap/acc/apm/acc/tenant 

Replace the placeholders ACC_MANAGEMENT_TOKEN, TENANT_ID, TENANT_NAME, EM_HOSTNAME and EM_ACC_INTEGRATION_TOKEN with the values collected in steps 1 to 4.

- Copy the command from the editor and paste it into the shell using Shift-Insert.
- Verify that the returned status code is HTTP/1.1 201 Created.
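
To reduce copy-and-paste mistakes, the request can be assembled from the collected values by small shell functions. This is a sketch, assuming it is pasted into the same pod shell and that the values contain no characters that need JSON escaping; the function names are illustrative, while the URL and JSON fields are the ones from the command above.

```shell
# Print the JSON body for the tenant registration call.
# $1 TENANT_ID (number), $2 TENANT_NAME, $3 EM_HOSTNAME, $4 EM_ACC_INTEGRATION_TOKEN
build_tenant_payload() {
  printf '{"internalId": %s, "externalId": "%s", "emUrl": "http://%s:8081/", "integrationUserToken": "%s"}\n' \
    "$1" "$2" "$3" "$4"
}

# Send the registration request and print only the HTTP status code.
# $1 ACC_MANAGEMENT_TOKEN, remaining arguments as for build_tenant_payload
register_tenant() {
  mgmt_token="$1"; shift
  curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H "Authorization: Bearer $mgmt_token" \
    -H 'Content-Type: application/json' \
    -d "$(build_tenant_payload "$@")" \
    http://localhost:8088/apm/appmap/acc/apm/acc/tenant
}

# Usage (example values from this article; a successful call prints 201):
#   register_tenant "$ACC_MANAGEMENT_TOKEN" 10 test001 apm-em-10-958963 "$EM_ACC_INTEGRATION_TOKEN"
```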

6. Validate

- Log in to the tenant. The ATC UI should appear.
- Click on the APM Command Center link in the dropdown menu next to “ALL MY UNIVERSES” at the top right.
- The ACC UI should be displayed without the red error ribbon at the top.
- If the ACC bundles have not been re-imported, the Bundles menu shows 0 bundles. To re-import the bundles, open a shell in the pod as in step 5, then prepare and execute the following commands:

rm -rf repository/com temp/*
curl -v -X POST  -H 'Authorization: Bearer ACC_MANAGEMENT_TOKEN' -H 'Content-Type: application/json' -d '{}' http://localhost:8088/apm/appmap/acc/apm/acc/bundle/refresh

The curl command may take a few minutes to complete and returns status code HTTP/1.1 204 No Content.
