Unable to upgrade VMware PKS from 1.0.2 to 1.1

Article ID: 345615


Products

VMware

Issue/Introduction

Symptoms:
  • You are unable to upgrade VMware PKS from 1.0.2 to 1.1.

  • When you apply changes from Ops Manager, a worker node goes into an unresponsive state and the upgrade clusters errand task fails.

  • The logs show that when the agent bootstrapped, it could not find the eth0 interface, which broke communication between the agent and the director. In pks-nsx-t release 0.5 (PKS 1.0.2), the jobs remove the IPs from eth0 and add them to the bridge, but the BOSH agent expects eth0 to be configured with an IP. Because of a race condition, when the BOSH agent starts and cannot find this IP, communication fails with the error below.

2018-06-28_17:19:28.24511 [interfaceConfigurationCreator] 2018/06/28 17:19:28 DEBUG - Using static networking
2018-06-28_17:19:28.24533 [File System] 2018/06/28 17:19:28 DEBUG - Stat '/etc/network/interfaces'
2018-06-28_17:19:28.24590 [File System] 2018/06/28 17:19:28 DEBUG - Skipping writing /etc/network/interfaces because contents are identical
2018-06-28_17:19:28.24667 [main] 2018/06/28 17:19:28 ERROR - App setup Running bootstrap: Setting up networking: Validating static network configuration: Validating network interface 'eth0' IP addresses, no interface configured with that name
2018-06-28_17:19:28.24668 [main] 2018/06/28 17:19:28 ERROR - Agent exited with error: Running bootstrap: Setting up networking: Validating static network configuration: Validating network interface 'eth0' IP addresses, no interface configured with that name
2018-06-28_17:19:28.26523 [main] 2018/06/28 17:19:28 DEBUG - Starting agent
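
To confirm that a worker is hitting this specific failure, the agent log can be searched for the error text shown above. The following is a minimal Python sketch, not part of the product: the function name find_eth0_bootstrap_errors is hypothetical, and it assumes you have already collected the agent log contents from the affected worker as a string.

```python
# Error text copied from the agent log excerpt in this article.
ETH0_ERROR = (
    "Validating network interface 'eth0' IP addresses, "
    "no interface configured with that name"
)


def find_eth0_bootstrap_errors(log_text):
    """Return the ERROR log entries that contain the eth0 validation failure.

    log_text is the agent log content as a single string (assumption: the
    caller has already read it from wherever the log was collected).
    """
    return [
        line.strip()
        for line in log_text.splitlines()
        if " ERROR - " in line and ETH0_ERROR in line
    ]
```

A non-empty result suggests the agent failed during bootstrap because eth0 had no IP configured, which matches the symptom described in this article.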


Environment

VMware Pivotal Container Service 1.x

Cause

This issue occurs due to a bug in the pks-nsx-t release. The bug was fixed in pks-nsx-t release 0.8, but the PKS 1.0.2 tile uses pks-nsx-t release 0.5.

Resolution


Starting with pks-nsx-t release 0.8, the scripts in the pks-nsx-t release assign IPs to both eth0 and the bridge and configure routing to use the bridge.

Workaround:

To work around this issue, use bosh cck to restart the worker whose agent is unresponsive, then apply changes from Ops Manager.
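
The workaround can be sketched as a small Python wrapper around the BOSH CLI. This is only an illustration under stated assumptions: it assumes the BOSH CLI v2 is installed and already logged in to the director, and the environment alias and deployment name passed in are placeholders you must replace with your own values. The helper names bosh_command, recovery_commands, and run_recovery are hypothetical.

```python
import subprocess


def bosh_command(environment, deployment, *args):
    """Build a BOSH CLI v2 command line for one deployment.

    environment and deployment are placeholders (assumptions); use the
    director alias and the PKS cluster deployment name from your own setup.
    """
    return ["bosh", "-e", environment, "-d", deployment, *args]


def recovery_commands(environment, deployment):
    """Commands for the workaround: inspect the VMs, then run cloud check."""
    return [
        # List the VMs so the worker with the unresponsive agent can be identified.
        bosh_command(environment, deployment, "vms"),
        # cck (cloud-check) prompts interactively for how to resolve each problem.
        bosh_command(environment, deployment, "cck"),
    ]


def run_recovery(environment, deployment):
    """Run each command in order, stopping at the first failure."""
    for command in recovery_commands(environment, deployment):
        subprocess.run(command, check=True)
```

Because bosh cck asks how to resolve each problem it finds, run_recovery is meant to be started from an interactive terminal. After the worker is restarted and its agent responds again, apply changes from Ops Manager as described above.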