When the Istio installation is upgraded from v1.18.5 to v1.22.2 through TSM or TMC, the upgrade is initiated but fails after a while, and a rollback is started.
Tanzu on vSphere clusters
TSM Tanzu Service Mesh
TMC Tanzu Mission Control
A jump upgrade from 1.18.x to 1.22.2 completes several operations during the process, and a separate task is executed for each stage.
Two issues were discovered during the upgrade operations:
1. A pod disruption budget (PDB) configured on telemetry prevented the telemetry pods from restarting.
2. During the second phase, all proxies configured in the cluster (in the enabled namespaces) have to be restarted. A validating webhook denied this operation, so the proxy restart failed. This led to the rollback operation shown in the UI as "Rolling back mesh dependencies".
Other causes, such as a specific configuration or stuck objects, are possible, but none were found during the analysis.
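To check whether a PDB is the blocker in a given cluster, the PDB list can be inspected for entries that currently allow zero disruptions. The following is a minimal sketch, not the procedure used in the original analysis. It assumes kubectl is installed and pointed at the affected cluster, and the function names are hypothetical. Validating webhooks can be listed separately with kubectl get validatingwebhookconfigurations.

```python
import json
import subprocess


def blocking_pdbs(pdb_list):
    """Return (namespace, name) for each PDB that allows zero disruptions.

    pdb_list is the parsed JSON of `kubectl get pdb --all-namespaces -o json`.
    A PDB with no reported status is treated as blocking, to be conservative.
    """
    blocked = []
    for item in pdb_list.get("items", []):
        allowed = item.get("status", {}).get("disruptionsAllowed", 0)
        if allowed == 0:
            meta = item["metadata"]
            blocked.append((meta["namespace"], meta["name"]))
    return blocked


def fetch_pdbs():
    """Fetch all PDBs from the cluster that kubectl currently targets."""
    out = subprocess.run(
        ["kubectl", "get", "pdb", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling blocking_pdbs(fetch_pdbs()) returns the namespace and name of every PDB that would reject a pod eviction at that moment. Whether the telemetry PDB appears in that list depends on the cluster.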
To resolve this problem, we did the following:
1. PDB issue: save a copy of the PDB, then delete the PDB until the upgrade procedure completes.
2. Validating webhook (Gatekeeper): disable the validating webhook during the upgrade process to allow the restart of the proxy pods.
Applying these two changes allowed us to complete the upgrade.
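The save, delete and restore steps can be sketched as a small helper around kubectl. This is an illustrative sketch under stated assumptions, not the exact commands used during the upgrade. The function names are hypothetical, the PDB name and namespace must be taken from the cluster, and the webhook name below is the default used by a standard Gatekeeper installation and may differ in your environment.

```python
import json
import subprocess
from pathlib import Path

# Assumption: default name in a standard Gatekeeper install; verify in your cluster.
GATEKEEPER_WEBHOOK = "gatekeeper-validating-webhook-configuration"

# Metadata fields populated by the API server; removed so the backup re-applies cleanly.
SERVER_FIELDS = ("resourceVersion", "uid", "creationTimestamp",
                 "generation", "managedFields", "selfLink")


def strip_server_fields(manifest):
    """Return a copy of a manifest without status and server-populated metadata."""
    clean = {k: v for k, v in manifest.items() if k != "status"}
    clean["metadata"] = {k: v for k, v in manifest.get("metadata", {}).items()
                         if k not in SERVER_FIELDS}
    return clean


def kubectl_args(verb, kind, name, namespace=None):
    """Build a kubectl argument list; namespace is omitted for cluster-scoped kinds."""
    args = ["kubectl", verb, kind, name]
    if namespace:
        args += ["-n", namespace]
    return args


def backup_and_delete(kind, name, backup_dir, namespace=None):
    """Save the object as JSON in backup_dir, then delete it from the cluster."""
    raw = subprocess.run(
        kubectl_args("get", kind, name, namespace) + ["-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    path = Path(backup_dir) / f"{kind}-{name}.json"
    path.write_text(json.dumps(strip_server_fields(json.loads(raw)), indent=2))
    subprocess.run(kubectl_args("delete", kind, name, namespace), check=True)
    return path


def restore(path):
    """Re-create a previously saved object once the upgrade has completed."""
    subprocess.run(["kubectl", "apply", "-f", str(path)], check=True)
```

Before the upgrade, backup_and_delete("pdb", name, backup_dir, namespace=ns) would be called for the telemetry PDB and backup_and_delete("validatingwebhookconfiguration", GATEKEEPER_WEBHOOK, backup_dir) for the webhook. After the upgrade completes, restore() is called on each returned path. The backup is written before the delete runs, so a failed kubectl get leaves the cluster untouched.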