How to revert a stemcell in Tanzu Kubernetes Grid Integrated Edition to prevent openvswitch compilation issues

Article ID: 298731


VMware Tanzu Kubernetes Grid Integrated Edition


Note: The procedure described here uses Tanzu Kubernetes Grid Integrated Edition (TKGI) 1.8.0 and reverts the stemcell from 621.76 to 621.75 as an example. The same steps can be used to revert stemcell versions on any TKGI version; adjust them to your use case. Make sure the stemcell version you revert to is the last working stemcell version in your foundation and is listed as supported on Tanzu Network.

In certain scenarios, an incompatibility between TKGI and a stemcell version causes the openvswitch (OVS) compilation to fail. The product does not prevent an upgrade to an incompatible stemcell version, so this can happen accidentally, for example when an automation pipeline picks up a new stemcell or through user error. The wrong stemcell version has no impact on running clusters, but the foundation loses the ability to create new clusters.

When using the wrong version, the OVS compilation fails with the errors below.

Note: This is one example of an openvswitch compilation failure. Some of the error messages may differ in your foundation.

Task 212 | 22:45:51 | Compiling packages: openvswitch/755dded7e07d7502bf72bd204d3b22b52b9fc28b (00:01:46)
                    L Error: Action Failed get_task: Task 20766cb2-269b-47db-4127-e0ba46fa4687 result: Compiling package openvswitch: Running packaging script: Running packaging script: Command exited with 2; Truncated stdout: PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ c-idl-header vtep/vtep-idl.ovsidl > vtep/vtep-idl.h.tmp && mv vtep/vtep-idl.h.tmp vtep/vtep-idl.h
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ annotate ./ovn/ovn-sb.ovsschema ./ovn/lib/ovn-sb-idl.ann > ovn/lib/ovn-sb-idl.ovsidl.tmp && \
mv ovn/lib/ovn-sb-idl.ovsidl.tmp ovn/lib/ovn-sb-idl.ovsidl
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ c-idl-source ovn/lib/ovn-sb-idl.ovsidl > ovn/lib/ovn-sb-idl.c.tmp && mv ovn/lib/ovn-sb-idl.c.tmp ovn/lib/ovn-sb-idl.c
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ c-idl-header ovn/lib/ovn-sb-idl.ovsidl > ovn/lib/ovn-sb-idl.h.tmp && mv ovn/lib/ovn-sb-idl.h.tmp ovn/lib/ovn-sb-idl.h
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ annotate ./ovn/ovn-nb.ovsschema ./ovn/lib/ovn-nb-idl.ann > ovn/lib/ovn-nb-idl.ovsidl.tmp && \
mv ovn/lib/ovn-nb-idl.ovsidl.tmp ovn/lib/ovn-nb-idl.ovsidl
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ c-idl-source ovn/lib/ovn-nb-idl.ovsidl > ovn/lib/ovn-nb-idl.c.tmp && mv ovn/lib/ovn-nb-idl.c.tmp ovn/lib/ovn-nb-idl.c
PYTHONPATH=./python":"$PYTHONPATH PYTHONDONTWRITEBYTECODE=yes /var/vcap/packages/nsx-python27/bin/python2 ./ovsdb/ c-idl-header ovn/lib/ovn-nb-idl.ovsidl > ovn/lib/ovn-nb-idl.h.tmp && mv ovn/lib/ovn-nb-idl.h.tmp ovn/lib/ovn-nb-idl.h
make  all-recursive
make[1]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-'
Making all in datapath
make[2]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-'
Making all in linux
make[3]: Entering directory '/var/vcap/data/compile/openvswitch/openvswitch-'
------------------------Truncated for brevity----------------------------
configure: WARNING: cannot find libcap-ng.
--user option will not be supported on Linux.
(you may use --disable-libcapng to suppress this warning).
configure: WARNING: Missing Python six library.
+ make
/var/vcap/data/compile/openvswitch/openvswitch- In function 'geneve_get_v6_dst':
/var/vcap/data/compile/openvswitch/openvswitch- error: 'const struct ipv6_stub' has no member named 'ipv6_dst_lookup'
  if (ipv6_stub->ipv6_dst_lookup(geneve->net, gs6->sock->sk, &dst, fl6)) {
make[5]: *** [/var/vcap/data/compile/openvswitch/openvswitch-] Error 1
make[4]: *** [_module_/var/vcap/data/compile/openvswitch/openvswitch-] Error 2
make[3]: *** [default] Error 2
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2 


Product Version: 1.8
OS: Ubuntu


Because the upgrade fails while compiling the OVS package, the existing K8s clusters are not affected: neither the K8s control plane nor the data plane is impacted.

However, no new clusters can be created in this foundation. To recover from this scenario, follow the steps below:

1. Verify that both stemcell versions (621.76 and 621.75 in this example) are listed as currently deployed:
bosh stemcells
Using environment '' as client 'ops_manager'
Name                                      Version   OS             CPI                   CID
bosh-vsphere-esxi-ubuntu-xenial-go_agent  621.76*   ubuntu-xenial  b6763c44e8280ab9537b  sc-7b508cbd-dbbf-4b67-bfe8-6426c641ae0d
~                                         621.75*   ubuntu-xenial  b6763c44e8280ab9537b  sc-24d96bec-79b3-414e-b6cd-aa10b3e3aa60
~                                         456.114*  ubuntu-xenial  b6763c44e8280ab9537b  sc-5ac098d6-87ca-4999-b4fb-1c4ae0d74b05
bosh-vsphere-esxi-windows2019-go_agent    2019.23   windows2019    b6763c44e8280ab9537b  sc-052ee366-6514-408a-b069-8d4d8e0279c5
(*) Currently deployed
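
On a foundation with many stemcells, the deployed versions can be filtered out of this table. The following is a minimal sketch, assuming the plain-text layout shown above (version in the second column, OS in the third, and `*` appended to deployed versions). The function name `deployed_stemcell_versions` is hypothetical and not part of the BOSH CLI.

```shell
# Print "<os> <version>" for every row that BOSH marks as deployed ("*").
# Reads the plain-text table printed by `bosh stemcells` on stdin.
# Assumption: version is column 2 and OS is column 3, as in the output above.
deployed_stemcell_versions() {
  awk '$2 ~ /\*$/ { sub(/\*$/, "", $2); print $3, $2 }'
}

# Usage (hypothetical): bosh stemcells | deployed_stemcell_versions
```

If both the incompatible and the last working version appear in the result, the foundation is in the state this article describes.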

2. Retrieve the access token to interact with Ops Manager API:
ubuntu@opsman:~$ uaac target https://opsman.corp.local/uaa --skip-ssl-validation
Unknown key: Max-Age = 86400
Target: https://opsman.corp.local/uaa
ubuntu@opsman:~$ uaac token owner get
Client ID:  opsman
Client secret:
Unknown key: Max-Age = 86400
User name:  admin
Password:  ********
Unknown key: Max-Age = 172800
Successfully fetched token via owner password grant.
Target: https://opsman.corp.local/uaa
Context: admin, from client opsman
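
The fetched token is stored in the current uaac context, which `uaac context` prints. As a sketch, assuming that output contains a line of the form `access_token: <value>`, the hypothetical helper below extracts the value so it can be exported for the next step instead of being copied by hand.

```shell
# Print the value of the first "access_token:" line read from stdin.
# Assumption: `uaac context` prints the token as "access_token: <value>".
extract_access_token() {
  awk '$1 == "access_token:" { print $2; exit }'
}

# Usage (hypothetical):
#   export access_token="$(uaac context | extract_access_token)"
```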

3. Upload the older stemcell (621.75 in this example) in case it is not already present in Ops Manager:

export access_token="<token_from_step_above>"
curl "https://opsman.corp.local/api/v0/stemcells" -X POST -H "Authorization: Bearer $access_token" -F 'stemcell[file]=@<path_to_621.75_stemcell_tgz>' -F 'stemcell[floating]=false'

4. Decrypt installation.yml, replace all occurrences of 621.76 with 621.75, and encrypt it again:

# Backup the installation.yml
ubuntu@opsman:/var/tempest/workspaces/default$ cp /var/tempest/workspaces/default/installation.yml /var/tempest/workspaces/default/installation.yml.backup

# Decrypt the installation.yml
ubuntu@opsman:/var/tempest/workspaces/default/metadata$ cd /var/tempest/workspaces/default
ubuntu@opsman:/var/tempest/workspaces/default$ sudo -u tempest-web SECRET_KEY_BASE="s" RAILS_ENV=production /home/tempest-web/tempest/web/scripts/decrypt /var/tempest/workspaces/default/installation.yml /tmp/installation.yml
# Replace all occurrences of 621.76 with 621.75
/var/tempest/workspaces/default$ sudo vim /tmp/installation.yml
# Encrypt the installation.yml
/var/tempest/workspaces/default$ sudo -u tempest-web SECRET_KEY_BASE="s" RAILS_ENV=production /home/tempest-web/tempest/web/scripts/encrypt /tmp/installation.yml /var/tempest/workspaces/default/installation.yml
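
As an alternative to editing the decrypted file by hand, the replacement can be scripted between the decrypt and encrypt commands. This is a minimal sketch, assuming GNU sed (for `-i`) and write access to the decrypted file; the function name `replace_stemcell_version` is hypothetical. It performs a plain text substitution, so review the matching lines first if the old version string could appear in an unrelated context.

```shell
# Replace every occurrence of one stemcell version with another in a
# decrypted installation.yml. Arguments: <file> <old_version> <new_version>.
replace_stemcell_version() {
  file=$1
  old=$2
  new=$3
  # Escape dots so "621.76" is matched literally, not as "621<any char>76".
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  echo "lines containing $old before: $(grep -c -- "$old_re" "$file")"
  sed -i "s/$old_re/$new/g" "$file"
  echo "lines containing $old after:  $(grep -c -- "$old_re" "$file")"
}

# Usage (hypothetical), from a shell that can write to the file:
#   replace_stemcell_version /tmp/installation.yml 621.76 621.75
```

The second count should be 0 before the file is encrypted again.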

5. In the Ops Manager Stemcell Library, the stemcell should now be reverted to the older version (621.75 in this example).

6. On the Ops Manager dashboard, click Review Changes. You should see the version change now being reflected.

7. Lastly, click Apply Changes with the Upgrade all clusters errand enabled. Once Apply Changes completes, stemcell 621.76 can be deleted from BOSH if it is not used by any other tile.
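
Before deleting the stemcell, it is worth confirming that BOSH no longer marks it as deployed. The sketch below assumes the `bosh stemcells` table layout from step 1 (version in the second column, `*` appended to deployed versions); the function name `stemcell_unused` is hypothetical.

```shell
# Exit 0 when the given version carries no "*" (deployed) marker in the
# `bosh stemcells` table read from stdin, exit 1 when it is still deployed.
# Assumption: the version is column 2, as in the step 1 output.
stemcell_unused() {
  awk -v v="$1" 'BEGIN { used = 0 } $2 == (v "*") { used = 1 } END { exit used }'
}

# Usage (hypothetical):
#   bosh stemcells | stemcell_unused 621.76 \
#     && bosh delete-stemcell bosh-vsphere-esxi-ubuntu-xenial-go_agent/621.76
```

The check only looks at the version string, so on a foundation where the same version number exists for more than one stemcell OS, inspect the table manually instead.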