When you upgrade VMware Enterprise PKS, the upgrade fails with an error similar to:
Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored
In the BOSH task for the upgrade, you see entries similar to:
Task 88758 | 14:45:49 | Updating instance master: master/859bc553-1a49-4a98-aeb1-e6d51cb6667e (2) (00:03:50)
Task 88758 | 14:49:39 | Updating instance worker: worker/68f419aa-d3fe-4578-84ed-49510950bd3e (0) (canary) (00:00:50)
L Error: Action Failed get_task: Task 1c51928e-13cd-44cb-720e-842c87b3d114 result: Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored
Task 88758 | 14:50:29 | Error: Action Failed get_task: Task 1c51928e-13cd-44cb-720e-842c87b3d114 result: Stopping Monitored Services: Stopping services '[nsx-kube-proxy]' errored
In the /var/vcap/sys/log/kube-proxy/nsx-kube-proxy.stdout.log file on the worker node, you see entries similar to:
2019-02-21T12:31:40.317Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="ERROR" errorCode="NCP02002"] nsx_ujo.k8s.proxy.proxy Failed to execute command: <nsx_ujo.k8s.proxy.proxy.OVSFlowBuildCommand object at 0x7fcbdca53d90>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nsx_ujo/k8s/proxy/proxy.py", line 342, in command_execution_loop
cmd.execute()
File "/usr/local/lib/python2.7/dist-packages/nsx_ujo/k8s/proxy/proxy.py", line 333, in execute
utils.call_popen_with_privsep(shlex.split(cmd))
File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 207, in _wrap
return self.channel.remote_call(name, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 202, in remote_call
raise exc_type(*result[2])
RuntimeError: Fatal error executing ovs-ofctl -O OpenFlow13 del-flows nsx-container cookie=7/-1
1 2019-02-21T12:31:40.317Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="INFO"] nsx_ujo.k8s.proxy.proxy Creating OVS flow rebuild commands
1 2019-02-21T12:31:40.320Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="WARNING"] oslo.privsep.daemon privsep log: ovs-ofctl: nsx-container is not a bridge or a socket
1 2019-02-21T12:31:40.320Z 6a2de506-a28e-4f82-81b3-ea3e1ca76458 NSX 13197 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="ERROR" errorCode="NCP02003"] nsx_ujo.k8s.proxy.proxy OVS flow check failed: Fatal error executing ovs-ofctl -O OpenFlow13 dump-flows nsx-container
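To check whether a worker node's log shows the same symptoms, the entries above can be matched with a small script. The following is a minimal sketch, not a supported tool: the function names and the signature labels are hypothetical, and the patterns are taken only from the error codes and the ovs-ofctl message in the excerpt above.

```python
import re

# Patterns copied from the log excerpt in this article. The dictionary keys
# are illustrative labels, not official names.
SIGNATURES = {
    "NCP02002": re.compile(r'errorCode="NCP02002"'),
    "NCP02003": re.compile(r'errorCode="NCP02003"'),
    "missing_bridge": re.compile(r"ovs-ofctl: \S+ is not a bridge or a socket"),
}


def scan_log_lines(lines):
    """Return how many lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log_file(path):
    """Scan a log file on disk, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_log_lines(handle)
```

For example, calling scan_log_file("/var/vcap/sys/log/kube-proxy/nsx-kube-proxy.stdout.log") on the worker node returns a count per signature. Non-zero counts suggest, but do not prove, that the node is affected by this issue.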
In the /var/vcap/jobs/nsx-kube-proxy logs, you see that kube-proxy is exiting:
1 2019-04-22T22:19:50.234Z d0c11156-0968-44ce-9232-ac2e02f7b3e5 NSX 27711 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_kube_proxy" level="INFO"] nsx_ujo.k8s.proxy.proxy nsx_kube_proxy exiting...
The issue occurs because nsx-kube-proxy is not able to stop on the worker node.
This is a known issue with NSX Container Plugin (NCP) 2.5.0 and earlier. It will be resolved in a future release.
To work around the issue, restart the nsx-kube-proxy process on the worker node, and then retry the upgrade.
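The restart step could be scripted along the following lines. This is a sketch under assumptions that this article does not confirm: it assumes nsx-kube-proxy is supervised by monit on the BOSH-managed worker VM (the "Stopping Monitored Services" error suggests this), that the monit binary is on the PATH, and that the script runs as root on the worker node. The function names are hypothetical, and the dry run is the default so that nothing is executed unintentionally.

```python
import subprocess


def restart_command(service="nsx-kube-proxy", monit_bin="monit"):
    """Build the argv for restarting a monit-supervised service.

    Assumption: monit supervises the service on this VM.
    """
    return [monit_bin, "restart", service]


def restart_service(dry_run=True, **kwargs):
    """Return the command line; run it only when dry_run is False."""
    cmd = restart_command(**kwargs)
    if not dry_run:
        # Raises CalledProcessError if monit exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)
```

Calling restart_service() only returns the command string for review. Calling restart_service(dry_run=False) executes it. After the process is running again, retry the upgrade.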