Unable to upgrade VMware PKS from 1.0.2 to 1.1



Article ID: 345615




VMware Cloud PKS


  • You are unable to upgrade VMware PKS from 1.0.2 to 1.1

  • When you apply changes from Ops Manager, worker nodes go into an unresponsive state and the upgrade clusters errand fails.

  • The logs show that when the agents bootstrapped, they could not find the eth0 interface, which broke communication between the agent and the BOSH Director. In pks-nsx-t release 0.5 (PKS 1.0.2), the jobs remove the IPs from eth0 and add them to the bridge, but the BOSH agent expects eth0 to be configured with an IP. Because of a race condition, the BOSH agent can start before this IP is present on eth0. Communication then fails with the error below:

2018-06-28_17:19:28.24511 [interfaceConfigurationCreator] 2018/06/28 17:19:28 DEBUG - Using static networking
2018-06-28_17:19:28.24533 [File System] 2018/06/28 17:19:28 DEBUG - Stat '/etc/network/interfaces'
2018-06-28_17:19:28.24590 [File System] 2018/06/28 17:19:28 DEBUG - Skipping writing /etc/network/interfaces because contents are identical
2018-06-28_17:19:28.24667 [main] 2018/06/28 17:19:28 ERROR - App setup Running bootstrap: Setting up networking: Validating static network configuration: Validating network interface 'eth0' IP addresses, no interface configured with that name
2018-06-28_17:19:28.24668 [main] 2018/06/28 17:19:28 ERROR - Agent exited with error: Running bootstrap: Setting up networking: Validating static network configuration: Validating network interface 'eth0' IP addresses, no interface configured with that name
2018-06-28_17:19:28.26523 [main] 2018/06/28 17:19:28 DEBUG - Starting agent
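To check whether a worker hit this specific failure, you can scan its agent log for the validation error quoted above. The following is a minimal sketch, not a VMware tool. The regular expression is copied from the log excerpt, so other agent versions may word the message differently. The default log path `/var/vcap/bosh/log/current` is an assumption about where the BOSH agent writes its log on the VM.

```python
import re

# Pattern copied from the bosh-agent error in the log excerpt above.
# Assumption: other agent versions may phrase this message differently.
ETH_ERROR = re.compile(
    r"Validating network interface '(?P<iface>[^']+)' IP addresses, "
    r"no interface configured with that name"
)


def find_missing_interfaces(log_lines):
    """Return interface names the agent reported as unconfigured, in order, without duplicates."""
    missing = []
    for line in log_lines:
        match = ETH_ERROR.search(line)
        if match and match.group("iface") not in missing:
            missing.append(match.group("iface"))
    return missing


def scan_log(path="/var/vcap/bosh/log/current"):
    """Scan an agent log file; the default path is an assumed location, not confirmed by this article."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_missing_interfaces(handle)
```

On an affected worker, you would expect `scan_log()` to return `['eth0']`. An empty list means that this particular bootstrap error does not appear in the log.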


VMware Pivotal Container Service 1.x


This issue is caused by a bug in the pks-nsx-t release. The bug was fixed in pks-nsx-t release 0.8, but the PKS 1.0.2 tile uses pks-nsx-t release 0.5.


Starting with pks-nsx-t release 0.8, the scripts in the pks-nsx-t release assign the IPs to both eth0 and the bridge, and configure routing to use the bridge.
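The difference between the two releases can be shown with a simplified model of the agent's check. This is not bosh-agent or pks-nsx-t source code. It assumes an interface counts as "configured" only if it holds at least one IP. The bridge name `bridge0` and the address `10.0.0.5` are hypothetical, and the routing change made by release 0.8 is not modeled.

```python
# Simplified model (assumption, not bosh-agent source): an interface is
# "configured" only when it holds at least one IP address.
def validate_static_network(interfaces, expected_iface, expected_ip):
    """Return an error string if expected_iface lacks expected_ip, else None."""
    configured = {name: ips for name, ips in interfaces.items() if ips}
    if expected_iface not in configured:
        return (f"Validating network interface '{expected_iface}' IP addresses, "
                "no interface configured with that name")
    if expected_ip not in configured[expected_iface]:
        return f"interface '{expected_iface}' does not hold IP {expected_ip}"
    return None


def pks_nsx_t_0_5(ip):
    # Release 0.5 behavior as described: the IP is moved off eth0 onto the bridge.
    # "bridge0" is a hypothetical bridge name.
    return {"eth0": set(), "bridge0": {ip}}


def pks_nsx_t_0_8(ip):
    # Release 0.8 behavior as described: the IP is assigned to both eth0 and the bridge.
    return {"eth0": {ip}, "bridge0": {ip}}
```

If the agent runs its check after the release 0.5 jobs have moved the IP, `validate_static_network(pks_nsx_t_0_5("10.0.0.5"), "eth0", "10.0.0.5")` reproduces the "no interface configured with that name" error. The release 0.8 layout passes because eth0 keeps its IP.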


To work around this issue, use bosh cck to restart the worker VM that has the unresponsive agent, then apply changes again from Ops Manager.
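Before running `bosh cck`, it can help to list which worker instances report an unresponsive agent. The sketch below reads the JSON that the BOSH CLI can print for `bosh -d <deployment> instances --json`. It assumes a layout with a top-level `Tables` list whose `Rows` carry `instance` and `process_state` keys. Treat that layout as an assumption and check it against your CLI version. The instance names in the example are hypothetical.

```python
import json


def unresponsive_instances(instances_json, state="unresponsive agent"):
    """Return instance names whose process state equals `state`.

    Assumes the JSON layout of `bosh instances --json`: a "Tables" list,
    each table holding "Rows" dicts with "instance" and "process_state" keys.
    """
    doc = json.loads(instances_json) if isinstance(instances_json, str) else instances_json
    names = []
    for table in doc.get("Tables") or []:
        for row in table.get("Rows") or []:
            if row.get("process_state") == state:
                names.append(row.get("instance"))
    return names


# Hypothetical output for a cluster deployment with one stuck worker.
example = {
    "Tables": [{
        "Rows": [
            {"instance": "master/1a2b", "process_state": "running"},
            {"instance": "worker/3c4d", "process_state": "unresponsive agent"},
        ]
    }]
}
```

Here `unresponsive_instances(example)` would return `['worker/3c4d']`. Those are the VMs to restart through `bosh -d <deployment> cck` before applying changes from Ops Manager again.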