Note: The procedure described here uses Tanzu Kubernetes Grid Integrated Edition (TKGI) 1.8.0 and, as an example, reverts the stemcell from 621.76 to 621.75. The same instructions apply to reverting stemcell versions in any TKGI version; adjust the versions to match your environment. Make sure the stemcell version you revert to is the last stemcell version that worked in your foundation and that it is supported, as described in this KB article.
In certain scenarios, an incompatibility between TKGI and a stemcell version causes the openvswitch (OVS) package compilation to fail. Because the product does not prevent it, a user can accidentally upgrade to an incompatible stemcell version, for example through an automation pipeline that picks up the new stemcell or through user error. The incompatible stemcell has no impact on running clusters, but the foundation loses the ability to create new clusters.
When the incompatible version is used, OVS compilation fails with errors such as the following.
Note: This is one example of an openvswitch compilation failure; some of the error messages may differ in your foundation.
Task 212 | 22:45:51 | Compiling packages: openvswitch/755dded7e07d7502bf72bd204d3b22b52b9fc28b (00:01:46)
  L Error: Action Failed get_task: Task 20766cb2-269b-47db-4127-e0ba46fa4687 result: Compiling package openvswitch: Running packaging script: Running packaging script: Command exited with 2; Truncated stdout:
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in c-idl-header vtep/vtep-idl.ovsidl > vtep/vtep-idl.h.tmp && mv vtep/vtep-idl.h.tmp vtep/vtep-idl.h
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in annotate ./ovn/ovn-sb.ovsschema ./ovn/lib/ovn-sb-idl.ann > ovn/lib/ovn-sb-idl.ovsidl.tmp && \
mv ovn/lib/ovn-sb-idl.ovsidl.tmp ovn/lib/ovn-sb-idl.ovsidl
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in c-idl-source ovn/lib/ovn-sb-idl.ovsidl > ovn/lib/ovn-sb-idl.c.tmp && mv ovn/lib/ovn-sb-idl.c.tmp ovn/lib/ovn-sb-idl.c
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in c-idl-header ovn/lib/ovn-sb-idl.ovsidl > ovn/lib/ovn-sb-idl.h.tmp && mv ovn/lib/ovn-sb-idl.h.tmp ovn/lib/ovn-sb-idl.h
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in annotate ./ovn/ovn-nb.ovsschema ./ovn/lib/ovn-nb-idl.ann > ovn/lib/ovn-nb-idl.ovsidl.tmp && \
mv ovn/lib/ovn-nb-idl.ovsidl.tmp ovn/lib/ovn-nb-idl.ovsidl
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in c-idl-source ovn/lib/ovn-nb-idl.ovsidl > ovn/lib/ovn-nb-idl.c.tmp && mv ovn/lib/ovn-nb-idl.c.tmp ovn/lib/ovn-nb-idl.c
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ovsdb-idlc.in c-idl-header ovn/lib/ovn-nb-idl.ovsidl > ovn/lib/ovn-nb-idl.h.tmp && mv ovn/lib/ovn-nb-idl.h.tmp ovn/lib/ovn-nb-idl.h
make all-recursive
make[1]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734'
Making all in datapath
make[2]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath'
Making all in linux
make[3]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath/linux'
------------------------Truncated for brevity----------------------------
configure: WARNING: cannot find libcap-ng.
--user option will not be supported on Linux.
(you may use --disable-libcapng to suppress this warning).
configure: WARNING: Missing Python six library.
+ make
/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath/linux/geneve.c: In function 'geneve_get_v6_dst':
/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath/linux/geneve.c:966:15: error: 'const struct ipv6_stub' has no member named 'ipv6_dst_lookup'
  if (ipv6_stub->ipv6_dst_lookup(geneve->net, gs6->sock->sk, &dst, fl6)) {
               ^
make[5]: *** [/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath/linux/geneve.o] Error 1
make[4]: *** [_module_/var/vcap/data/compile/openvswitch/openvswitch-2.12.1.16098734/datapath/linux] Error 2
make[3]: *** [default] Error 2
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
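When you need to check several failed tasks, it can help to test task output for this failure from a script. The sketch below reads the output of `bosh task <id>` on stdin and prints `none`, `ipv6_dst_lookup`, or `other`. The helper name `ovs_failure_kind` is hypothetical, and the strings it matches are assumptions taken from the example log above; because the messages can differ in your foundation, treat `other` as a prompt to read the log yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify BOSH task output read from stdin.
#   none            -> no failed openvswitch package compilation found
#   ipv6_dst_lookup -> matches the geneve.c error shown in the example log
#   other           -> openvswitch compilation failed with a different error
ovs_failure_kind() {
  local log
  log="$(cat)"
  # "Compiling package openvswitch:" appears in the error line, unlike the
  # normal progress line "Compiling packages: openvswitch/<fingerprint>".
  if ! grep -q -F 'Compiling package openvswitch:' <<<"$log"; then
    echo "none"
  elif grep -q -F "has no member named 'ipv6_dst_lookup'" <<<"$log"; then
    echo "ipv6_dst_lookup"
  else
    echo "other"
  fi
}
```

For example, `bosh task 212 | ovs_failure_kind` classifies the task from this article.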
Product Version: 1.8+
OS: Ubuntu
The upgrade fails while compiling the OVS package, so there is no impact on the existing K8s clusters: neither the K8s control plane nor the data plane is affected.
However, no new clusters can be created in this foundation. To recover from this scenario, follow the steps below:
1. Verify that both stemcell versions (621.76 and 621.75) are in use:
bosh stemcells
Using environment '172.###.###.2' as client 'ops_manager'

Name                                      Version   OS             CPI                   CID
bosh-vsphere-esxi-ubuntu-xenial-go_agent  621.76*   ubuntu-xenial  b6763c44e8280ab9537b  sc-7b508cbd-dbbf-4b67-bfe8-6426c641ae0d
~                                         621.75*   ubuntu-xenial  b6763c44e8280ab9537b  sc-24d96bec-79b3-414e-b6cd-aa10b3e3aa60
~                                         456.114*  ubuntu-xenial  b6763c44e8280ab9537b  sc-5ac098d6-87ca-4999-b4fb-1c4ae0d74b05
bosh-vsphere-esxi-windows2019-go_agent    2019.23   windows2019    b6763c44e8280ab9537b  sc-052ee366-6514-408a-b069-8d4d8e0279c5

(*) Currently deployed
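To run this check from a script instead of reading the table by eye, the sketch below filters `bosh stemcells` output for one OS and prints the versions marked with `*`. It assumes the whitespace-separated table layout of the sample output, where Version is the second column and OS the third; `deployed_versions` is a hypothetical helper, not a BOSH CLI command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the BOSH CLI): print the versions of one
# stemcell OS that `bosh stemcells` marks as currently deployed with "*".
# Reads the table on stdin; assumes Version is column 2 and OS is column 3.
deployed_versions() {
  local os="$1"
  awk -v os="$os" '$3 == os && $2 ~ /\*$/ { sub(/\*$/, "", $2); print $2 }'
}
```

With the sample output in this step, `bosh stemcells | deployed_versions ubuntu-xenial` prints 621.76, 621.75, and 456.114, one per line.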
2. Retrieve an access token to interact with the Ops Manager API:
ubuntu@opsman:~$ uaac target https://opsman.domain.com/uaa --skip-ssl-validation
Unknown key: Max-Age = 86400
Target: https://opsman.domain.com/uaa
ubuntu@opsman:~$ uaac token owner get
Client ID: opsman
Client secret:
Unknown key: Max-Age = 86400
User name: admin
Password: ********
Unknown key: Max-Age = 172800
Successfully fetched token via owner password grant.
Target: https://opsman.domain.com/uaa
Context: admin, from client opsman
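The commands in this step fetch a token but do not print it. `uaac context` displays the current context, which normally includes an `access_token:` line. The sketch below extracts that value; the helper name `extract_access_token` is hypothetical, and the `access_token: <value>` line format is an assumption you should confirm against the output of your uaac version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `uaac context` output on stdin and print the
# value of the first "access_token:" line (leading indentation is ignored).
extract_access_token() {
  awk '$1 == "access_token:" { print $2; exit }'
}
```

Usage: `export access_token="$(uaac context | extract_access_token)"`. An empty result means the expected line was not found.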
3. If the stemcell you are reverting to (621.75 in this example) is not already present in Ops Manager, upload it:
# Replace <path_to_stemcell_tgz> with the path to the .tgz file of the stemcell you are reverting to
export access_token="<token_from_step_above>"
curl "https://opsman.domain.com/api/v0/stemcells" \
  -X POST \
  -H "Authorization: Bearer $access_token" \
  -F 'stemcell[file]=@<path_to_stemcell_tgz>' \
  -F 'stemcell[floating]=false'
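The upload call can also be wrapped so that it fails early when the token or the stemcell file is missing, rather than sending an incomplete request. This is a sketch: `upload_stemcell` and the `OPSMAN_URL` variable are hypothetical names, and the default URL is the example hostname used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the stemcell upload call from step 3.
OPSMAN_URL="${OPSMAN_URL:-https://opsman.domain.com}"   # assumption: your Ops Manager URL

upload_stemcell() {
  local stemcell_file="$1"
  if [ -z "${access_token:-}" ]; then
    echo "access_token is not set" >&2
    return 1
  fi
  if [ ! -f "$stemcell_file" ]; then
    echo "stemcell file not found: $stemcell_file" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
  curl --fail "${OPSMAN_URL}/api/v0/stemcells" \
    -X POST \
    -H "Authorization: Bearer ${access_token}" \
    -F "stemcell[file]=@${stemcell_file}" \
    -F 'stemcell[floating]=false'
}
```

Usage: `upload_stemcell <path_to_stemcell_tgz>`. If Ops Manager uses a self-signed certificate (step 2 targets UAA with `--skip-ssl-validation`), curl may also need `--cacert` or `-k`.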
4. Back up and decrypt installation.yml, replace all occurrences of 621.76 with 621.75, and then re-encrypt it:
# Back up the installation.yml
ubuntu@opsman:/var/tempest/workspaces/default$ cp /var/tempest/workspaces/default/installation.yml /var/tempest/workspaces/default/installation.yml.backup
# Decrypt the installation.yml
ubuntu@opsman:/var/tempest/workspaces/default/metadata$ cd /var/tempest/workspaces/default
ubuntu@opsman:/var/tempest/workspaces/default$ sudo -u tempest-web SECRET_KEY_BASE="s" RAILS_ENV=production /home/tempest-web/tempest/web/scripts/decrypt /var/tempest/workspaces/default/installation.yml /tmp/installation.yml
# Replace all occurrences of 621.76 with 621.75
ubuntu@opsman:/var/tempest/workspaces/default$ sudo vim /tmp/installation.yml
# Encrypt the installation.yml
ubuntu@opsman:/var/tempest/workspaces/default$ sudo -u tempest-web SECRET_KEY_BASE="s" RAILS_ENV=production /home/tempest-web/tempest/web/scripts/encrypt /tmp/installation.yml /var/tempest/workspaces/default/installation.yml
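As an alternative to the manual edit in vim, the replacement can be scripted with `sed`. The sketch below is a hypothetical helper, `replace_stemcell_version`. Like the manual step, it replaces every occurrence, so any unrelated string that happens to contain 621.76 would also change. It escapes the dots so that only the literal version matches, keeps a `.pre-revert` copy of the decrypted file, and fails if any occurrence of the old version remains.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every literal occurrence of OLD with NEW in FILE.
# Assumes NEW contains no "/" or "&" (true for stemcell version strings).
replace_stemcell_version() {
  local file="$1" old="$2" new="$3"
  local old_re count
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi
  # Escape the dots so "621.76" matches literally, not "621" + any char + "76".
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  count="$(grep -o -F "$old" "$file" | wc -l | tr -d ' ')"
  # -i.pre-revert edits in place and keeps the unmodified file as FILE.pre-revert.
  sed -i.pre-revert "s/${old_re}/${new}/g" "$file" || return 1
  if grep -q -F "$old" "$file"; then
    echo "occurrences of $old remain in $file" >&2
    return 1
  fi
  echo "replaced $count occurrence(s) of $old with $new in $file"
}
```

Run it as a user who can write /tmp/installation.yml (the step above uses `sudo vim` for the same reason), for example `replace_stemcell_version /tmp/installation.yml 621.76 621.75`. Then run `diff /tmp/installation.yml.pre-revert /tmp/installation.yml` to review the changes before re-encrypting.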
5. In the Stemcell Library, the stemcell should now show the older version, 621.75, as highlighted in the screenshot below:
6. On the Ops Manager dashboard, click Review Changes. The stemcell version change (621.76 to 621.75) should now be listed.
7. Finally, click Apply Changes with the Upgrade all clusters errand enabled. After Apply Changes completes, you can delete stemcell 621.76 from BOSH if no other tile uses it.
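Before deleting 621.76, you can confirm that BOSH no longer marks it as deployed (no `*` in `bosh stemcells`), which covers every deployment, not only TKGI. In the sketch below, `stemcell_in_use` and `delete_stemcell_if_unused` are hypothetical helpers. The check matches on the version column only and assumes the same table layout as in step 1; the deletion itself uses the BOSH CLI `delete-stemcell NAME/VERSION` command, which still asks for confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if VERSION is still marked "*" (deployed) in
# `bosh stemcells` table output read from stdin, exit 1 otherwise.
stemcell_in_use() {
  local version="$1"
  awk -v v="$version" '$2 == v "*" { found = 1 } END { exit !found }'
}

# Hypothetical helper: delete NAME/VERSION only if BOSH reports it as unused.
delete_stemcell_if_unused() {
  local name="$1" version="$2"
  if bosh stemcells | stemcell_in_use "$version"; then
    echo "stemcell $version is still deployed; not deleting" >&2
    return 1
  fi
  bosh delete-stemcell "${name}/${version}"
}
```

Usage: `delete_stemcell_if_unused bosh-vsphere-esxi-ubuntu-xenial-go_agent 621.76`.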