PKS cluster creation fails with the error "Unable to find network".
In NSX-T, the logical switch for the cluster is stuck in the in-progress state.
Any new logical switch created also gets stuck in the in-progress state.
The BOSH task for cluster creation reports an error similar to:
Task 7678 | 00:29:40 | Preparing deployment: Preparing deployment
Task 7678 | 00:29:55 | Creating missing vms: master/ed902804-7d78-4e04-a8b0-27c3ece35d43 (2)
Task 7678 | 00:29:55 | Creating missing vms: worker/fcb2e858-9744-4834-a7ab-53505e1e0527 (0)
Task 7678 | 00:29:55 | Creating missing vms: worker/a3005421-1df9-4e13-b127-3d86baf35107 (1) (00:00:31)
L Error: Unknown CPI error 'Unknown' with message 'Unable to find network 'pks-3ba416d0-424b-4fba-88c5-ad142e16062b'. Verify that the portgroup exists.' in 'create_vm' CPI method (CPI request ID: 'cpi-422265')
Task 7678 | 00:30:26 | Creating missing vms: worker/fcb2e858-9744-4834-a7ab-53505e1e0527 (0) (00:00:31)
L Error: Unknown CPI error 'Unknown' with message 'Unable to find network 'pks-3ba416d0-424b-4fba-88c5-ad142e16062b'. Verify that the portgroup exists.' in 'create_vm' CPI method (CPI request ID: 'cpi-958580')
Task 7678 | 00:30:27 | Creating missing vms: master/e8c47d85-6472-4e09-8040-f2dfeee783ab (0) (00:00:32)
L Error: Unknown CPI error 'Unknown' with message 'Unable to find network 'pks-3ba416d0-424b-4fba-88c5-ad142e16062b'. Verify that the portgroup exists.' in 'create_vm' CPI method (CPI request ID: 'cpi-140376')
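When the task output is long, it can help to pull out the network names that the CPI could not find so they can be compared with the logical switches in NSX-T. The following is a minimal Python sketch, not part of BOSH or PKS tooling. The function name missing_networks is hypothetical, and the pattern assumes the error text is worded exactly as in the task output above.

```python
import re

# Assumption: the CPI error is worded exactly as in the task output above.
NETWORK_ERROR = re.compile(r"Unable to find network '([^']+)'")


def missing_networks(task_output):
    """Return the set of network names reported as missing in BOSH task output."""
    return set(NETWORK_ERROR.findall(task_output))
```

Passing the saved task output as a single string returns each missing network name once, even if several VMs failed on the same network.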
In /var/log/nsxaVim.log on the ESXi host, you see entries similar to:
2019-08-20T02:32:11Z nsxaVim: [17072153]: ERROR Failed to connect to hostd: [{'fault': 'NoPermission', 'faultMessage': [], 'msg': 'Permission to perform this operation was denied.'}]
2019-08-20T02:32:11Z nsxaVim: [17072153]: ERROR [resync] Failed to connect to hostd: [{'fault': 'NoPermission', 'faultMessage': [], 'msg': 'Permission to perform this operation was denied.'}]
2019-08-20T02:32:11Z nsxaVim: [17072153]: INFO resync.py replied error: [failed to connect to hostd: {'fault': 'NoPermission', 'faultMessage': [], 'msg': 'Permission to perform this operation was denied.'}]
2019-08-20T02:32:11Z nsxaVim: [17072153]: INFO [resync] resync.py replied error: [failed to connect to hostd: {'fault': 'NoPermission', 'faultMessage': [], 'msg': 'Permission to perform this operation was denied.'}]
2019-08-20T02:32:33Z nsxaVim: [2102488]: INFO Result msg:[b"error: not ready. reason: error: failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'fault': 'NoPermission', 'faultMessage': []}"]
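To check a copy of the log for this signature, a short Python sketch such as the following may be enough. It is illustrative only: the function name hostd_permission_errors is hypothetical, and the match assumes the entries keep the "failed to connect to hostd" and 'NoPermission' wording shown in the entries above.

```python
import re

# Assumption: affected entries contain both this phrase (in any letter case)
# and the 'NoPermission' fault name, as in the log entries above.
HOSTD_FAILURE = re.compile(r"failed to connect to hostd", re.IGNORECASE)


def hostd_permission_errors(log_text):
    """Return the log lines that show hostd rejecting nsxaVim with NoPermission."""
    return [
        line
        for line in log_text.splitlines()
        if HOSTD_FAILURE.search(line) and "NoPermission" in line
    ]
```

A non-empty result for the contents of nsxaVim.log suggests that the host matches the symptom described here. An empty result does not rule out other causes of the same BOSH error.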
This issue occurs when Lockdown mode is enabled on the ESXi hosts.
This is a known issue in VMware NSX-T 2.4.x and will be resolved in a future release.
To work around the issue, disable Lockdown mode on the ESXi hosts.
While a host is in Lockdown mode, logical switch creation in NSX-T does not succeed because of the permission error.