Unable to upgrade or create new clusters on TKGi/TKGs with NSX-T v3.2.x
Article ID: 297311
Products
VMware Tanzu Kubernetes Grid Integrated Edition
Issue/Introduction
Symptoms on the Tanzu side:
New clusters cannot be created on TKGi/TKGs.
Clusters cannot be upgraded on TKGi (the upgrade runs an errand that deploys a new cluster, and that deployment fails).
Affected versions:
TKGi: 1.16.x, 1.15.x and 1.14.x
TKGs: vSphere 7.0u3x releases
Symptoms on the NSX side:
Existing network objects keep working, and no alarms or other warnings are raised.
New objects are deployed but not realized.
New logical switches have no connectivity to their T1 router.
The issue is seen after any upgrade to NSX 3.2.x.
In /var/log/proton/nsxapi.log, the following entry appears:
"Failed to re-subscribe [tag:worker_framework].*Last retry failed 20/20 ENDING!"
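The log check above can be sketched as a small Python script. This is a minimal sketch, not an official diagnostic tool: the function names `find_resubscribe_failures` and `scan_log` are hypothetical, and the regular expression is the entry quoted above with the square brackets escaped so that they match literally.

```python
import re

# The entry quoted in this article. The brackets are escaped so that
# "[tag:worker_framework]" is matched literally, not as a character class.
RESUBSCRIBE_FAILURE = re.compile(
    r"Failed to re-subscribe \[tag:worker_framework\].*Last retry failed 20/20 ENDING!"
)


def find_resubscribe_failures(lines):
    """Return the lines that contain the re-subscribe failure entry."""
    return [line.rstrip("\n") for line in lines if RESUBSCRIBE_FAILURE.search(line)]


def scan_log(path="/var/log/proton/nsxapi.log"):
    """Scan one log file and return the matching lines (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_resubscribe_failures(handle)
```

Calling `scan_log()` on an NSX Manager node returns a list of matching lines. An empty list means the entry was not found in that file, which does not by itself rule out the issue, for example if the log has rotated.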
Trigger for the problem: In earlier releases, the nsx$SegmentPortInternal table carried the worker_framework stream tag. Release 3.2.1 removed that tag from the table, but the table's old schema definition is still present in Corfu after the upgrade. When the tables are subscribed to Corfu, the worker_framework tag is passed in, and Corfu fetches and subscribes every table that has that stream tag. To decide this, Corfu checks the schema definition it currently holds (the old one is still present) rather than the actual table's schema.
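The mismatch described above can be illustrated with a toy Python model. This is only an assumption-laden sketch of the idea, not Corfu's real API or data structures: the dictionaries, the function names, the table `hypothetical$OtherTable`, and the choice to raise an error on a mismatch are all invented for illustration.

```python
# Toy model only. Nothing here is Corfu's real API or storage layout.

# Schema definitions retained in the store after the upgrade (stale record).
STALE_REGISTRY = {
    "nsx$SegmentPortInternal": {"worker_framework"},  # old definition still lists the tag
    "hypothetical$OtherTable": {"worker_framework"},  # invented table for contrast
}

# Tags the tables actually carry in 3.2.1.
CURRENT_TABLE_TAGS = {
    "nsx$SegmentPortInternal": set(),                 # tag removed in 3.2.1
    "hypothetical$OtherTable": {"worker_framework"},
}


def tables_for_tag(tag, registry):
    """Select tables by tag using whatever schema definitions the registry holds."""
    return sorted(name for name, tags in registry.items() if tag in tags)


def subscribe(tag, registry, current_tags):
    """Subscribe to every table the registry lists for the tag.

    Raises RuntimeError when a selected table no longer carries the tag,
    which is how this toy model represents the failed subscription.
    """
    selected = tables_for_tag(tag, registry)
    mismatched = [name for name in selected if tag not in current_tags[name]]
    if mismatched:
        raise RuntimeError(f"Failed to re-subscribe [tag:{tag}]: {mismatched}")
    return selected
```

In this model, `subscribe("worker_framework", STALE_REGISTRY, CURRENT_TABLE_TAGS)` raises because the stale definition still selects nsx$SegmentPortInternal, while passing `CURRENT_TABLE_TAGS` as the registry succeeds. The point is only that the tag lookup and the live table disagree.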
Environment
Product Version: Other
Resolution
Currently, only a workaround is available. A rolling reboot of the NSX-T Manager cluster fixes the problem for some time, but the problem can come back.