NAPP deployment fails on TKGs/vSphere with Tanzu/vSphere IaaS. The TKR images are not in a Compatible state at first; they become ready and compatible after roughly 20 minutes.
NSX
NSX NAPP
vSphere with Tanzu/vSphere IaaS/TKGs
This is caused by a known issue for which a fix is awaiting release. When the cluster is set up, the Supervisor takes longer than expected to reconcile the TKRs used for the NAPP nodes. NAPP hits its retry timeout before that reconciliation completes: it does not see the TKRs as ready and stops retrying, and the TKRs only become ready afterwards.
The fix is on the vSphere with Tanzu/vSphere IaaS side and should decrease the time it takes for the TKRs to be presented as ready.
As an example, the tkg-controller logs below show a gap of about 20 minutes in cvmi_controller activity, between 18:08:27 and 18:28:26:
2024-07-08T18:08:26.218874758Z stderr F E0708 18:08:26.218808 1 cvmi_controller.go:219] "error returned from reconcile of cvmi controller" err="error creating/updating TKR 'v1.27.6---vmware.1-fips.1-tkg.1': Operation cannot be fulfilled on tanzukubernetesreleases.run.tanzu.vmware.com \"v1.27.6---vmware.1-fips.1-tkg.1\": the object has been modified; please apply your changes to the latest version and try again" logger="svc-tkg-domain-c2079-tkg-controller.clustervirtualmachineimage-controller" cvmi.name="vmi-141c12bf89fd01291" req="/vmi-141c12bf89fd01291"
2024-07-08T18:08:26.315146497Z stderr F I0708 18:08:26.315072 1 cvmi_controller.go:620] "added osimage ref for legacy" logger="svc-tkg-domain-c2079-tkg-controller.clustervirtualmachineimage-controller" tkr="vmi-e6fda053fe55621ee"
2024-07-08T18:08:27.222597877Z stderr F I0708 18:08:27.222514 1 cvmi_controller.go:620] "added osimage ref for legacy" logger="svc-tkg-domain-c2079-tkg-controller.clustervirtualmachineimage-controller" tkr="vmi-141c12bf89fd01291"
2024-07-08T18:28:26.204302969Z stderr F I0708 18:28:26.204215 1 cvmi_controller.go:231] "reconciling unified tkr" logger="svc-tkg-domain-c2079-tkg-controller.clustervirtualmachineimage-controller" cvmi.name="vmi-05656e6a46161e9c9"
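To check whether the same pattern appears in other tkg-controller logs, a small script can compare consecutive timestamps and report long pauses. The following is a minimal sketch, not a supported tool: the function names and the 10-minute threshold are assumptions chosen for illustration, and the sample lines are abbreviated copies of the excerpt above.

```python
from datetime import datetime, timedelta


def parse_timestamp(line):
    """Parse the leading RFC 3339 timestamp of a container log line."""
    token = line.split(maxsplit=1)[0]
    date_part, _, fraction = token.rstrip("Z").partition(".")
    base = datetime.strptime(date_part, "%Y-%m-%dT%H:%M:%S")
    # The logs carry nanoseconds; datetime only keeps microseconds.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return base + timedelta(microseconds=micros)


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous, current, gap) for consecutive lines at least `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        if not line.strip():
            continue
        current = parse_timestamp(line)
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Abbreviated versions of the four log lines shown above.
SAMPLE_LINES = [
    "2024-07-08T18:08:26.218874758Z stderr F E0708 18:08:26.218808 1 cvmi_controller.go:219] ...",
    "2024-07-08T18:08:26.315146497Z stderr F I0708 18:08:26.315072 1 cvmi_controller.go:620] ...",
    "2024-07-08T18:08:27.222597877Z stderr F I0708 18:08:27.222514 1 cvmi_controller.go:620] ...",
    "2024-07-08T18:28:26.204302969Z stderr F I0708 18:28:26.204215 1 cvmi_controller.go:231] ...",
]

for start, end, gap in find_gaps(SAMPLE_LINES):
    print(f"gap of {gap} between {start.time()} and {end.time()}")
```

Run against the sample, this reports a single gap of just under 20 minutes between the third and fourth entries, which matches the window in which the TKRs were not yet reconciled.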
A fix is underway.
The workaround is to retry the deployment after about an hour, at which point it should succeed. Once the Supervisor has fully reconciled the TKRs, deployment should work as expected.
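If the retry is scripted rather than done by hand, the wait can be expressed as a bounded polling loop that gives up after a fixed time. The sketch below is illustrative only: `wait_until_ready` and its parameters are hypothetical names, and the `is_ready` callable is a placeholder for whatever check is used in the environment to confirm that the TKRs are reported as ready and compatible. The one-hour timeout mirrors the waiting period suggested above.

```python
import time


def wait_until_ready(is_ready, timeout_s=3600.0, interval_s=60.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_ready` until it returns True or `timeout_s` seconds elapse.

    `is_ready` is a caller-supplied function (hypothetical here) that
    returns True once the TKRs are reconciled. Returns True on success
    and False if the timeout is reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `clock` and `sleep` as parameters keeps the loop easy to exercise without actually waiting; in normal use the defaults are sufficient, and the NAPP deployment is retried only after the function returns True.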