OpenShift deployment, NCP nsx-node-agent pod crashing

Article ID: 385373

Products

VMware NSX

Issue/Introduction

  • The upgrade process hangs while upgrading an OpenShift cluster.
  • The nsx-node-agent pod keeps crashing, and the node network is reported as unavailable:
NAME          NETWORK UNAVAILABLE   NETWORK UNAVAILABLE REASON   NETWORK UNAVAILABLE MESSAGE
<agent_name>  True                  NSXNodeAgentNotReady         nsx-node-agent-xxxx/nsx-kube-proxy not running: back-off 5m0s restarting failed container=nsx-kube-proxy pod=nsx-node-agent-xxxx_nsx-system. nsx-node-agent-xxxx/nsx-ovs not running: back-off 5m0s restarting failed container=nsx-ovs pod=nsx-node-agent-xxxx_nsx-system. nsx-node-agent-xxxx/nsx-node-agent started for less than 455.390464ms.

Cause

  • The operator image in use is NCP 4.1.2.1, but the pods are still deployed with the NCP 4.1.0 image.
  • As a result, starting the OVS modules fails because the process looks for libcrypto.so.3, while the container only provides version 1.1.
  • The operator deployment YAML itself is correct: the NCP_IMAGE environment variable is set to the correct value.
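
The version mismatch described above can be confirmed by comparing the image each pod actually runs against the value of NCP_IMAGE. The following is a minimal sketch, not an official tool: it assumes the pod list has been exported with `oc get pods -n nsx-system -o json`, and the function name, registry path, and sample data are hypothetical placeholders.

```python
import json


def pods_with_unexpected_image(pod_list, expected_image):
    """Map each pod name to the images it runs that differ from expected_image."""
    mismatched = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        images = [c["image"] for c in pod["spec"].get("containers", [])]
        stale = sorted({image for image in images if image != expected_image})
        if stale:
            mismatched[name] = stale
    return mismatched


# Hypothetical sample shaped like the JSON pod list returned by `oc get pods -o json`.
# Registry path and pod names are placeholders, not values from a real cluster.
sample = {
    "items": [
        {
            "metadata": {"name": "nsx-node-agent-xxxx"},
            "spec": {
                "containers": [
                    {"name": "nsx-node-agent", "image": "registry.example.com/nsx-ncp:4.1.0"},
                    {"name": "nsx-kube-proxy", "image": "registry.example.com/nsx-ncp:4.1.0"},
                    {"name": "nsx-ovs", "image": "registry.example.com/nsx-ncp:4.1.0"},
                ]
            },
        },
        {
            "metadata": {"name": "nsx-ncp-yyyy"},
            "spec": {
                "containers": [
                    {"name": "nsx-ncp", "image": "registry.example.com/nsx-ncp:4.1.2.1"},
                ]
            },
        },
    ]
}

# Assumed to be the value of NCP_IMAGE in the operator deployment.
expected = "registry.example.com/nsx-ncp:4.1.2.1"

print(json.dumps(pods_with_unexpected_image(sample, expected), indent=2))
```

Any pod listed in the result is still running an image other than the one the operator is configured to deploy, which matches the symptom in this article.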

Resolution

Workaround: 

  1.  Configure the nsx-secret secret (https://github.com/vmware/nsx-container-plugin-operator/blob/main/deploy/kubernetes/nsx-secret.yaml) with no data.
  2.  The operator should then reconfigure all the NCP, nsx-ncp-bootstrap, and nsx-node-agent pods. NCP will keep using username/password authentication because the client certificate data was not filled in.
  3.  At this stage, NCP should be running 4.1.2 and authenticating as an admin user.
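
Step 1 can be prepared by taking the secret manifest and emptying every value while leaving the keys in place. The sketch below shows one way to do that, assuming the manifest has already been loaded into a Python dictionary. The helper name and the key names in the sample are hypothetical; the real keys and namespace should come from the nsx-secret.yaml file linked above.

```python
import copy
import json


def blank_secret_data(secret):
    """Return a copy of a Secret manifest with every data/stringData value emptied."""
    cleared = copy.deepcopy(secret)
    for field in ("data", "stringData"):
        if cleared.get(field):
            cleared[field] = {key: "" for key in cleared[field]}
    return cleared


# Hypothetical manifest: the key names and base64 values are placeholders only.
manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "nsx-secret"},
    "data": {"tls.crt": "PGJhc2U2ND4=", "tls.key": "PGJhc2U2ND4="},
}

# JSON is printed here because `oc apply -f` accepts JSON as well as YAML.
print(json.dumps(blank_secret_data(manifest), indent=2))
```

The helper works on a copy, so the original manifest stays available if the certificate data needs to be restored later.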