NAPP: "Failed to obtain the features and access control information from the server."


Article ID: 319417


Products

VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

This article describes how to restore functionality of the NSX Application Platform (NAPP) UI.

Symptoms:
This error is displayed on the NSX Application Platform (NAPP) page under the "System" tab in the NSX-T UI.



The Tanzu Kubernetes Cluster and the Supervisor Cluster both show a "Running" status.

To determine the health of the Supervisor Cluster, please perform the following checks:

  • As an SSO administrator, log in to the vSphere Client where the Supervisor Cluster is deployed
  • Click the "Menu" button, then select "Workload Management"
  • Locate the "Supervisor Clusters" tab
  • "Running" status is indicated by a green check mark with no errors

To determine the health of the Tanzu Kubernetes Cluster hosting the NSX Application Platform, please perform the following checks:

  • Use the steps provided in this VMware Docs page to log into the Tanzu Kubernetes Cluster
  • Once logged into the Supervisor context, run the command "kubectl get tkc -A"
    • The NAPP cluster should show "True" in the Ready column
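
For reference, a healthy cluster looks similar to the following. This is illustrative output only; the namespace and cluster name shown here are placeholders, and the remaining columns (elided with "...") vary by vSphere and TKG version:

napp@jumpbox:~$ kubectl get tkc -A
NAMESPACE       NAME          ...   READY   ...
nsx-workloads   nappcluster   ...   True    ...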



NOTE: This document does not yet contain an exhaustive list of reasons why this error can occur. If the steps in this article do not resolve your issue, please log a ticket with VMware Tanzu Support.

Environment

VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu

NAPP

Cause

The cluster-api pod within the NAPP cluster is stuck in a Pending state because it cannot be scheduled onto the worker node it requires.
 


Describing the pod shows messaging similar to the following:
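
The exact wording varies by Kubernetes version, but the Events section of the pending pod typically contains a FailedScheduling warning along these lines (illustrative example; the pod name is a placeholder):

napp@jumpbox:~$ kubectl describe pod <cluster-api-pod-name> -n nsxi-platform
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/6 nodes are available: 1 Insufficient memory, 5 node(s) didn't match pod affinity rules.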
 


The cluster-api pod depends on another pod via podAffinity. This can be seen in the pod or deployment YAML file:


    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: logger
                operator: In
                values:
                - aggregator
            topologyKey: kubernetes.io/hostname
 

The cluster-api pod will not run unless it is on the same node as the pod that contains the logger=aggregator label.
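
The same rule can also be confirmed on a live cluster. A minimal sketch, assuming the pod is managed by a deployment named cluster-api in the nsxi-platform namespace:

# Print the pod affinity section from the cluster-api deployment's pod template
kubectl get deployment cluster-api -n nsxi-platform -o jsonpath='{.spec.template.spec.affinity}'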

Resolution

To resolve this issue, the cluster-api pod must be scheduled and reach a "Running" status. The steps below should be used as a guide to troubleshoot why the pod is stuck.

To get started, refer to the steps in this document to log directly into the Tanzu Kubernetes Cluster on which the NAPP workload runs:

  • Display all pods, their labels, and their scheduled nodes within the nsxi-platform namespace using the following command: kubectl get pods -n nsxi-platform -o wide --show-labels
  • Using the output from the above command, search for the "logger=aggregator" label. In this case it belongs to the fluentd-0 pod running on node nappcluster-workers-abcde-1234567890-vwxyz:
fluentd-0                                             1/1     Running     0             42h     192.168.0.5    nappcluster-workers-abcde-1234567890-vwxyz   <none>           <none>            allow-traffic-to-dns=true,app.kubernetes.io/component=aggregator,app.kubernetes.io/instance=nsxi-platform,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=fluentd,app=aggregator,controller-revision-hash=fluentd-cd5c878c5,helm.sh/chart=fluentd-3.4.0,logger=aggregator,statefulset.kubernetes.io/pod-name=fluentd-0,vmware/version=4.1.1-0.0-22213788
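
As a shortcut, the label from the affinity rule can be used as a selector to return only the matching pod (illustrative output; names are taken from the example above and columns are elided with "..."):

napp@jumpbox:~$ kubectl get pods -n nsxi-platform -l logger=aggregator -o wide
NAME        READY   STATUS    ...   NODE
fluentd-0   1/1     Running   ...   nappcluster-workers-abcde-1234567890-vwxyz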
  • Running the kubectl top nodes command shows that this node cannot accept additional pods due to high memory usage:
napp@jumpbox:~$ kubectl top nodes
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
nappcluster-control-plane-abcde              371m         18%    4389Mi          55%
nappcluster-workers-abcde-1234567890-bbbbb   3077m        19%    47538Mi         73%
nappcluster-workers-abcde-1234567890-ccccc   2945m        18%    4979Mi          7%
nappcluster-workers-abcde-1234567890-vwxyz   3057m        19%    53929Mi         88% 
nappcluster-workers-abcde-1234567890-ddddd   4962m        31%    35496Mi         55%
nappcluster-workers-abcde-1234567890-fffff   340m         2%     6977Mi          10%
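
Because kubectl top nodes already returns metrics above, pod-level metrics should also be available and can help identify which pods on the affected node are consuming the memory. A sketch, using the node name from the example above:

# List the pods scheduled on the affected node
kubectl get pods -n nsxi-platform -o wide --field-selector spec.nodeName=nappcluster-workers-abcde-1234567890-vwxyz

# Rank the pods in the namespace by current memory usage
kubectl top pods -n nsxi-platform --sort-by=memory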
  • Follow the "cause" and "resolution" steps in this KB article to delete pods that consume high amounts of memory. They will be recreated automatically on other nodes.
  • The cluster-api pod should be scheduled immediately once sufficient memory is available on the node where the fluentd-0 pod is running:
napp@jumpbox:~$ kubectl get pods -n nsxi-platform -o wide
NAME                                                  READY   STATUS      RESTARTS      AGE     IP               NODE                                         NOMINATED NODE   READINESS GATES
cluster-api-7c7d7747b5-jtr6d                          2/2     Running     0             2m     192.168.0.4    nappcluster-workers-abcde-1234567890-vwxyz   <none>           <none>
fluentd-0                                             1/1     Running     0             43h     192.168.0.5    nappcluster-workers-abcde-1234567890-vwxyz   <none>           <none>
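
As a final check, confirm that no pods in the namespace remain in a non-Running phase (a sketch; pods from completed jobs report a "Succeeded" phase and can be ignored):

napp@jumpbox:~$ kubectl get pods -n nsxi-platform --field-selector=status.phase!=Running

Once all pods are scheduled and healthy, the NSX Application Platform page under the "System" tab should load without the error.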



Additional Information