VMware Enterprise PKS upgrade failed with error: Stopping services '[nsx-kube-proxy]' errored

Article ID: 345542

Products

VMware

Issue/Introduction

Symptoms:
  • When you upgrade VMware Enterprise PKS, the upgrade fails with an error similar to:
    Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored

  • In the BOSH task for the upgrade, you see entries similar to:
    Task 88758 | 14:45:49 | Updating instance master: master/859bc553-1a49-4a98-aeb1-e6d51cb6667e (2) (00:03:50)
    Task 88758 | 14:49:39 | Updating instance worker: worker/68f419aa-d3fe-4578-84ed-49510950bd3e (0) (canary) (00:00:50)
                          L Error: Action Failed get_task: Task 1c51928e-13cd-44cb-720e-842c87b3d114 result: Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored
    Task 88758 | 14:50:29 | Error: Action Failed get_task: Task 1c51928e-13cd-44cb-720e-842c87b3d114 result: Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored

  • In /var/vcap/sys/log/kube-proxy/nsx-kube-proxy.stdout.log on the worker node, you see entries similar to:
    2019-02-21T12:31:40.317Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="ERROR" errorCode="NCP02002"] nsx_ujo.k8s.proxy.proxy Failed to execute command: <nsx_ujo.k8s.proxy.proxy.OVSFlowBuildCommand object at 0x7fcbdca53d90>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/nsx_ujo/k8s/proxy/proxy.py", line 342, in command_execution_loop
        cmd.execute()
      File "/usr/local/lib/python2.7/dist-packages/nsx_ujo/k8s/proxy/proxy.py", line 333, in execute
        utils.call_popen_with_privsep(shlex.split(cmd))
      File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 207, in _wrap
        return self.channel.remote_call(name, args, kwargs)
      File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 202, in remote_call
        raise exc_type(*result[2])
    RuntimeError: Fatal error executing ovs-ofctl -O OpenFlow13 del-flows nsx-container cookie=7/-1
    1 2019-02-21T12:31:40.317Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="INFO"] nsx_ujo.k8s.proxy.proxy Creating OVS flow rebuild commands
    1 2019-02-21T12:31:40.320Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="WARNING"] oslo.privsep.daemon privsep log: ovs-ofctl: nsx-container is not a bridge or a socket
    1 2019-02-21T12:31:40.320Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="ERROR" errorCode="NCP02003"] nsx_ujo.k8s.proxy.proxy OVS flow check failed: Fatal error executing ovs-ofctl -O OpenFlow13 dump-flows nsx-container

  • In the /var/vcap/jobs/nsx-kube-proxy logs, you see that nsx-kube-proxy is exiting:
    1 2019-04-22T22:19:50.234Z d0c11156-0968-44ce-9232-ac2e02f7b3e5 NSX 27711 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="INFO"] nsx_ujo.k8s.proxy.proxy nsx_kube_proxy exiting... 
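
The log symptoms above can be checked with a short script instead of reading the file by hand. The following is a minimal sketch, not a supported tool: the function names, the signature labels, and the rule used in matches_known_issue are assumptions made for illustration, and the patterns are taken only from the log excerpts shown in this article.

```python
import re
import sys

# Patterns copied from the log excerpts in this article.
# The dictionary keys are illustrative labels, not NCP terminology.
SIGNATURES = {
    "flow_command_failed": re.compile(r'errorCode="NCP02002"'),
    "flow_check_failed": re.compile(r'errorCode="NCP02003"'),
    "bridge_missing": re.compile(
        r"ovs-ofctl: nsx-container is not a bridge or a socket"
    ),
    "proxy_exiting": re.compile(r"nsx_kube_proxy exiting"),
}


def scan_log(lines):
    """Count how many lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_known_issue(counts):
    """Heuristic (an assumption, not an official rule): the bridge warning
    appears together with at least one of the two NCP error codes."""
    has_error_code = (
        counts["flow_command_failed"] > 0 or counts["flow_check_failed"] > 0
    )
    return counts["bridge_missing"] > 0 and has_error_code


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_nsx_kube_proxy_log.py <path-to-log-file>
    with open(sys.argv[1], errors="replace") as handle:
        result = scan_log(handle)
    for name, count in result.items():
        print(f"{name}: {count}")
    print("matches known issue:", matches_known_issue(result))
```

Run it against a copy of nsx-kube-proxy.stdout.log from the affected worker node. A match only indicates that the log resembles the excerpts above; it does not confirm the root cause.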



Environment

VMware PKS 1.x

Cause

The issue occurs because the nsx-kube-proxy service cannot be stopped on the worker node during the upgrade.

Resolution

This is a known issue with NSX Container Plugin (NCP) 2.5.0 and earlier. It will be resolved in a future release.


Workaround:

To work around the issue, restart the nsx-kube-proxy process on the affected worker node, and then retry the upgrade.
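
To find the worker node on which to restart the process, you can read the failing instance from the BOSH task output. The following is a minimal sketch under stated assumptions: the function name failed_instances is hypothetical, and the parsing relies only on the output format shown in the Symptoms section (an "Updating instance" line followed by an indented "L Error:" line). Output from other BOSH versions may be formatted differently.

```python
import re

# Matches lines such as:
#   Task 88758 | 14:49:39 | Updating instance worker: worker/<uuid> (0) (canary) (00:00:50)
INSTANCE_RE = re.compile(
    r"Updating instance \S+: (?P<instance>[\w-]+/[0-9a-f-]+) \((?P<index>\d+)\)"
)
# Matches the indented continuation line that reports the failure.
ERROR_RE = re.compile(r"^\s*L Error: (?P<message>.*)$")


def failed_instances(task_output):
    """Return (instance, error message) pairs for each instance update
    that is directly followed by an 'L Error:' line."""
    failures = []
    last_instance = None
    for line in task_output.splitlines():
        instance_match = INSTANCE_RE.search(line)
        if instance_match:
            last_instance = instance_match.group("instance")
            continue
        error_match = ERROR_RE.match(line)
        if error_match and last_instance is not None:
            failures.append((last_instance, error_match.group("message")))
            last_instance = None
    return failures
```

Pass the saved task output as a string, for example failed_instances(open("task.log").read()), where task.log is a hypothetical file name. Each returned pair names an instance such as worker/68f419aa-d3fe-4578-84ed-49510950bd3e and the error reported for it.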