Symptoms:
a) The Agent Download Dialog does not list all the agent packages
Example 1: not all agent packages are available
Example 2: clicking Download Agent returns "Server error. Please contact the Customer Support"
b) The ng-acc-configserver pods are in CrashLoopBackOff or Init status:
kubectl get pods -n<namespace> | grep acc
ng-acc-configserver-db-deployment - CrashLoopBackOff
ng-acc-configserver-deployment - CrashLoopBackOff
ng-acc-repository-deployment - Init:0/1
Scaling them down and up does not help.
c) The ng-acc-configserver-db pod log contains messages indicating that the postgres database cannot be started:
kubectl logs <ng-acc-configserver-db-deployment-pod> -n<namespace>
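The pod check in b) can be narrowed to just the ACC pods that are not healthy. A minimal sketch, assuming the default `kubectl get pods` table layout (NAME, READY, STATUS, ...) with its header line; `unhealthy_acc_pods` is a hypothetical helper name:

```shell
# Hypothetical helper: print name and STATUS of every ng-acc-* pod whose STATUS
# column is not "Running". Expects the default `kubectl get pods` table on stdin,
# including its header line (NAME READY STATUS RESTARTS AGE).
unhealthy_acc_pods() {
  awk 'NR > 1 && $1 ~ /^ng-acc-/ && $3 != "Running" { print $1, $3 }'
}
```

Usage: `kubectl get pods -n <namespace> | unhealthy_acc_pods`. A crash-looping pod can briefly show Running between restarts, so the RESTARTS column is still worth a look.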
Environment: DX Application Performance Management 2x
The ACC postgres database is sometimes only partially initialized, so ACC does not respond or work as expected.
Possible root causes:
- NFS service stopped
- ACC postgres database corruption
- NFS ran out of disk space, corrupting the data files so that they cannot be recovered.
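Because a full NFS volume is one of the listed causes, it can be worth checking free space before and after recovery. A minimal sketch, run on a host where <nfs-dir> is mounted; `fs_usage_pct`, `check_nfs_space` and the 90% default threshold are hypothetical choices for this example:

```shell
# Hypothetical helper: print the usage percentage (integer, no % sign) of the
# filesystem holding the given directory. `df -P` guarantees one line per filesystem.
fs_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical check: fail with a warning when usage is at or above the limit
# (default 90, an arbitrary threshold chosen for this sketch).
check_nfs_space() {
  local dir=$1 limit=${2:-90} pct
  pct=$(fs_usage_pct "$dir")
  [ -n "$pct" ] || { echo "cannot read usage for $dir" >&2; return 1; }
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $dir is ${pct}% full" >&2
    return 1
  fi
  echo "OK: $dir is ${pct}% full"
}
```

Usage: `check_nfs_space /nfs/ca/dxi`, or `check_nfs_space /nfs/ca/dxi 80` for a stricter limit.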
ng-acc-configserver-db-deployment
1. Scale down ng-acc-configserver-db-deployment
kubectl scale --replicas=0 deployment ng-acc-configserver-db-deployment -n<namespace>
2. Verify that the ng-acc-configserver-db-deployment pod is no longer running:
kubectl get pods -n<namespace> | grep ng-acc-configserver-db-deployment
3. Scale up ng-acc-configserver-db-deployment
kubectl scale --replicas=1 deployment ng-acc-configserver-db-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-configserver-db-deployment
5. Verify that the database started successfully:
kubectl logs <ng-acc-configserver-db-deployment-pod> -n<namespace>
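The scale-down, wait, scale-up sequence above can be wrapped in one function. A minimal sketch: `restart_deployment` is a hypothetical name, `KUBECTL` is an assumed override variable (for example `KUBECTL=oc` on OpenShift), and waiting with `kubectl rollout status` stands in for the manual "verify that the pod is up" check:

```shell
# Hypothetical helper: restart a single-replica deployment by scaling it to 0,
# waiting until its pods are gone, then scaling it back to 1.
# KUBECTL defaults to kubectl and can be set to another client such as oc.
restart_deployment() {
  local ns=$1 deploy=$2 k=${KUBECTL:-kubectl}
  "$k" scale --replicas=0 deployment "$deploy" -n "$ns" || return 1
  # Pod names start with "<deployment>-"; poll until none is listed.
  # This loop has no timeout, so interrupt it if a pod hangs in Terminating.
  while "$k" get pods -n "$ns" --no-headers 2>/dev/null | grep -q "^$deploy-"; do
    sleep 5
  done
  "$k" scale --replicas=1 deployment "$deploy" -n "$ns" || return 1
  # Wait (at most 5 minutes) until the new pod is available.
  "$k" rollout status deployment "$deploy" -n "$ns" --timeout=300s
}
```

For example, `restart_deployment <namespace> ng-acc-configserver-db-deployment` followed by `kubectl logs deployment/ng-acc-configserver-db-deployment -n <namespace>` covers steps 1 to 5. If the database keeps crashing, the rollout status call times out instead of reporting success, which is itself a useful signal.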
ng-acc-configserver-deployment
1. Scale down ng-acc-configserver-deployment
kubectl scale --replicas=0 deployment ng-acc-configserver-deployment -n<namespace>
2. Verify that the pod is down:
kubectl get pods -n<namespace> | grep ng-acc-configserver-deployment
3. Scale up ng-acc-configserver-deployment
kubectl scale --replicas=1 deployment ng-acc-configserver-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-configserver-deployment
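The "verify that the pod is up" check can also be made scriptable by reading the READY and STATUS columns. A minimal sketch; `pod_is_up` is a hypothetical helper that expects the default `kubectl get pods` table (with header) on stdin. Passing the deployment name followed by a hyphen keeps the match limited to that deployment's pods:

```shell
# Hypothetical helper: succeed only if at least one pod whose name starts with the
# given prefix is listed and every such pod is fully ready (READY n/n) and Running.
# Reads the default `kubectl get pods` table, including its header line, on stdin.
pod_is_up() {
  awk -v p="$1" '
    NR > 1 && index($1, p) == 1 {
      found = 1
      split($2, r, "/")
      if (r[1] != r[2] || $3 != "Running") bad = 1
    }
    END { exit !(found && !bad) }'
}
```

Usage: `kubectl get pods -n <namespace> | pod_is_up ng-acc-configserver-deployment- && echo "pod is up"`.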
ng-acc-repository-deployment
1. Scale down ng-acc-repository-deployment
kubectl scale --replicas=0 deployment ng-acc-repository-deployment -n<namespace>
2. Back up the corrupted ACC repository and recreate an empty one owned by 1010:1010:
cd <nfs-dir>/acc/cs
mv repository repository.bck
mkdir repository
chown -R 1010:1010 repository
3. Scale up ng-acc-repository-deployment
kubectl scale --replicas=1 deployment ng-acc-repository-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-repository-deployment
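Step 2 can be scripted with a few safety checks so that an existing backup is never overwritten. A minimal sketch, assuming it runs on a host where <nfs-dir> is mounted while ng-acc-repository-deployment is scaled to 0; `reset_acc_repository` is a hypothetical name, and 1010:1010 is the owner used in the manual step:

```shell
# Hypothetical helper: move <nfs-dir>/acc/cs/repository to repository.bck and
# recreate an empty repository directory owned by 1010:1010.
reset_acc_repository() {
  local cs_dir="$1/acc/cs"
  if [ ! -d "$cs_dir/repository" ]; then
    echo "no repository directory under $cs_dir" >&2
    return 1
  fi
  # Refuse to overwrite an earlier backup.
  if [ -e "$cs_dir/repository.bck" ]; then
    echo "$cs_dir/repository.bck already exists; move it away first" >&2
    return 1
  fi
  mv "$cs_dir/repository" "$cs_dir/repository.bck" || return 1
  mkdir "$cs_dir/repository" || return 1
  # Changing ownership to another UID/GID requires root.
  if [ "$(id -u)" -eq 0 ]; then
    chown -R 1010:1010 "$cs_dir/repository"
  else
    echo "not root: run 'chown -R 1010:1010 $cs_dir/repository' as root" >&2
  fi
}
```

Usage: `reset_acc_repository /nfs/ca/dxi`, then scale ng-acc-repository-deployment back up.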
Check whether the agents are listed in the Agent Download Dialog. If they are still missing:
- Go to Cluster Management as MASTERADMIN
- Deactivate the existing tenant
- Create a new tenant
- Check again whether the agents are listed in the Agent Download Dialog
NOTE: in this example, <nfs-dir> is /nfs/ca/dxi
ng-acc-configserver-db-deployment
1. Scale down ng-acc-configserver-db-deployment
kubectl scale --replicas=0 deployment ng-acc-configserver-db-deployment -n<namespace>
2. Back up the existing corrupted ACC database (<nfs-dir>/acc/cs/db).
Example:
mkdir -p /backups/db-bkp
cp -rpf /nfs/ca/dxi/acc/cs/db /backups/db-bkp/
3. Scale up ng-acc-configserver-db-deployment
kubectl scale --replicas=1 deployment ng-acc-configserver-db-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-configserver-db-deployment
5. Verify that the database started successfully:
kubectl logs <ng-acc-configserver-db-deployment-pod> -n<namespace>
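The database backup in step 2 can be made repeatable by writing each copy to its own timestamped folder. A minimal sketch; `backup_acc_db` and the db-bkp-<timestamp> naming are hypothetical, and it should run only while ng-acc-configserver-db-deployment is scaled to 0 so the database files are not changing:

```shell
# Hypothetical helper: copy <nfs-dir>/acc/cs/db into <backup-root>/db-bkp-<timestamp>/db,
# preserving permissions and timestamps (cp -p), and print the backup folder.
backup_acc_db() {
  local nfs_dir=$1 backup_root=$2 dest
  if [ ! -d "$nfs_dir/acc/cs/db" ]; then
    echo "no database directory under $nfs_dir/acc/cs" >&2
    return 1
  fi
  dest="$backup_root/db-bkp-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  cp -rp "$nfs_dir/acc/cs/db" "$dest/" || return 1
  echo "$dest"
}
```

Usage: `backup_acc_db /nfs/ca/dxi /backups`. Running it as root also preserves file ownership, which matters if the copy is ever restored.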
ng-acc-configserver-deployment
1. Scale down ng-acc-configserver-deployment
kubectl scale --replicas=0 deployment ng-acc-configserver-deployment -n<namespace>
2. Verify that the pod is down:
kubectl get pods -n<namespace> | grep ng-acc-configserver-deployment
3. Scale up ng-acc-configserver-deployment
kubectl scale --replicas=1 deployment ng-acc-configserver-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-configserver-deployment
ng-acc-repository-deployment
1. Scale down ng-acc-repository-deployment
kubectl scale --replicas=0 deployment ng-acc-repository-deployment -n<namespace>
2. Back up the corrupted ACC repository and recreate an empty one owned by 1010:1010:
cd <nfs-dir>/acc/cs
mv repository repository.bck
mkdir repository
chown -R 1010:1010 repository
3. Scale up ng-acc-repository-deployment
kubectl scale --replicas=1 deployment ng-acc-repository-deployment -n<namespace>
4. Verify that the pod is up:
kubectl get pods -n<namespace> | grep ng-acc-repository-deployment
Check whether the agents are listed in the Agent Download Dialog.
Optional: If tenants had already been created, also perform the steps below to recover the tenant-specific information after the ACC database has been deleted.
Steps 1 to 4 describe how to obtain the values needed for the REST API call in step 5.
1. Find the value of the ACC management token
- In Kubernetes, DXI namespace, click the Config & Storage / Secrets menu item.
- Click the item ng-acc-configserver-secret.
- Click the eye icon next to the "token" secret.
- The ACC management token is displayed like this:
token: <token>
Remember this value as ACC_MANAGEMENT_TOKEN for use in step 5.
2. Find the hostname of the tenant's EM container
- In Kubernetes, DXI namespace, click the Discovery and Load Balancing / Services menu item.
- Use the filter icon to search for "apm-em-10", where 10 is the tenant service ID of the tenant.
- The item to look for looks like "<host>-10-958963", where 10 is the tenant service ID and the six-digit number is randomly assigned during tenant creation. Click it.
- Copy the value in the Name field.
This is the hostname of the tenant's EM container.
Remember this value as EM_HOSTNAME for use in step 5.
3. Find the EM-ACC integration token
- Continuing from step 2, click the pod listed in the Pods section of the service.
- In the pod detail view, the Containers / Environment variables section shows an environment variable ACC_TOKEN with a value like this:
ACC_TOKEN: <token>
This is the EM-ACC integration token.
Remember this value as EM_ACC_INTEGRATION_TOKEN for use in step 5.
4. Find TENANT_ID and TENANT_NAME for use in step 5
The easiest way to get the tenant ID and tenant name is the dximanager UI, which appears when you log in with the "masteradmin" tenant and "masteradmin" account.
Click the Tenant icon on the left side to list all tenants. The text in the "Tenant ID" column (here "test001") is TENANT_NAME. Hovering the mouse over the entry shows a tooltip with TENANT_ID (here 10).
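Steps 1 to 3 use the Kubernetes dashboard; the same values can be read with kubectl. A minimal sketch under several assumptions: the helper names are hypothetical, `KUBECTL` is an assumed override (for example `oc`), the EM service follows the apm-em-<id>-<six digits> naming seen in step 2, and ACC_TOKEN is set as a literal value in the pod spec rather than through valueFrom:

```shell
# Step 1 (hypothetical helper): read the "token" key of ng-acc-configserver-secret.
# Secret data is stored base64-encoded, so it is decoded here.
acc_management_token() {
  local ns=$1 k=${KUBECTL:-kubectl}
  "$k" get secret ng-acc-configserver-secret -n "$ns" \
    -o jsonpath='{.data.token}' | base64 -d
}

# Step 2 (hypothetical helper): filter `kubectl get services -o name` output on
# stdin down to the EM service of one tenant service ID (e.g. apm-em-10-958963).
em_service_name() {
  sed 's|^service/||' | grep -E "apm-em-$1-[0-9]{6}\$"
}

# Step 3 (hypothetical helper): print the ACC_TOKEN value from the EM pod spec.
em_acc_token() {
  local ns=$1 pod=$2 k=${KUBECTL:-kubectl}
  "$k" get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.containers[*].env[?(@.name=="ACC_TOKEN")].value}'
}
```

For example, `ACC_MANAGEMENT_TOKEN=$(acc_management_token <namespace>)` and `EM_HOSTNAME=$(kubectl get services -n <namespace> -o name | em_service_name 10)`. The EM pod name for `em_acc_token` is still taken from the service's Pods section. TENANT_ID and TENANT_NAME still come from the dximanager UI.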
5. Register a tenant in ACC
- In Kubernetes, DXI namespace, click on Workloads / Deployments
- Use the filter icon to search for "ng-acc-configserver-deployment" and click the displayed item.
- Click the item in the New Replica Set section, then click the item in the Pods section.
- Click the Exec text/icon in the header of the displayed pod. The prompt should now show that the shell is in the "APMCommandCenterServer" directory.
- Prepare a command for execution in a plain text editor:
curl -v -X POST -H 'Authorization: Bearer ACC_MANAGEMENT_TOKEN' -H 'Content-Type: application/json' -d '
{
"internalId" : TENANT_ID,
"externalId" : "TENANT_NAME",
"emUrl" : "http://EM_HOSTNAME:8081/",
"integrationUserToken" : "EM_ACC_INTEGRATION_TOKEN"
}
' http://localhost:8088/apm/appmap/acc/apm/acc/tenant
Replace the placeholders ACC_MANAGEMENT_TOKEN, TENANT_ID, TENANT_NAME, EM_HOSTNAME and EM_ACC_INTEGRATION_TOKEN with the values noted in steps 1 to 4.
- Copy the command from the editor and paste it into the shell with Shift+Insert.
- Verify that the response status is HTTP/1.1 201 Created.
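Instead of editing placeholders by hand, the request body can be built from shell variables inside the pod shell. A minimal sketch; `tenant_payload` and `register_tenant` are hypothetical helpers, the variables reuse the names from steps 1 to 4, and values are inserted verbatim, so a name or token containing a double quote or backslash would need JSON escaping first:

```shell
# Hypothetical helper: build the tenant registration body from shell variables.
# TENANT_ID is numeric, so it is not quoted; the other values are JSON strings.
tenant_payload() {
  printf '{"internalId": %s, "externalId": "%s", "emUrl": "http://%s:8081/", "integrationUserToken": "%s"}\n' \
    "$TENANT_ID" "$TENANT_NAME" "$EM_HOSTNAME" "$EM_ACC_INTEGRATION_TOKEN"
}

# Hypothetical helper: POST the body (read from stdin via @-) to the ACC tenant
# endpoint and print only the HTTP status code; 201 means the tenant was registered.
register_tenant() {
  tenant_payload | curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H "Authorization: Bearer $ACC_MANAGEMENT_TOKEN" \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    http://localhost:8088/apm/appmap/acc/apm/acc/tenant
}
```

Usage: set the five variables in the pod shell (for example `TENANT_ID=10`, `TENANT_NAME=test001`), check the body with `tenant_payload`, then run `register_tenant` and expect it to print 201.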
6. Validate
- Log in to the tenant. The ATC UI should appear.
- Click the APM Command Center link in the dropdown menu next to "ALL MY UNIVERSES" at the top right.
- The ACC UI should be displayed without the red error ribbon at the top.
- If the ACC bundles have not been re-imported, the Bundles menu shows 0 bundles. To re-import them, open a shell in the ng-acc-configserver pod as in step 5, then prepare and execute the following commands:
rm -rf repository/com temp/*
curl -v -X POST -H 'Authorization: Bearer ACC_MANAGEMENT_TOKEN' -H 'Content-Type: application/json' -d '{}' http://localhost:8088/apm/appmap/acc/apm/acc/bundle/refresh
The second command (curl) may take a few minutes to complete and should return HTTP/1.1 204 No Content.
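The two re-import commands can be combined so that the result is checked automatically. A minimal sketch; `refresh_bundles` is a hypothetical helper that must be run from the APMCommandCenterServer directory of the ng-acc-configserver pod, with ACC_MANAGEMENT_TOKEN set as in step 1:

```shell
# Hypothetical helper: clear the local bundle files, ask ACC to re-import bundles,
# and succeed only if the refresh endpoint answers HTTP 204 No Content.
refresh_bundles() {
  local code
  # Paths are relative to the APMCommandCenterServer directory.
  rm -rf repository/com temp/*
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H "Authorization: Bearer $ACC_MANAGEMENT_TOKEN" \
    -H 'Content-Type: application/json' -d '{}' \
    http://localhost:8088/apm/appmap/acc/apm/acc/bundle/refresh)
  echo "bundle refresh returned HTTP $code"
  [ "$code" = "204" ]
}
```

Usage: `refresh_bundles && echo "bundles re-imported"`. Because the refresh can take a few minutes, the call blocks until ACC answers.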