Deploy vCenter and SDDC Manager phase takes a long time or fails during VCF bring-up when NFS is configured as principal storage with storage traffic separation


Article ID: 441549


Updated On:

Products

VMware Cloud Foundation

Issue/Introduction

During VMware Cloud Foundation bring-up, the Deploy vCenter and SDDC Manager phase takes an unexpectedly long time or fails.
This issue typically occurs under the following conditions:

  • NFS v3 is selected as the principal storage type.
  • A vSphere Distributed Switch (VDS) profile that separates storage traffic (e.g., "Storage Traffic Separation") is selected.

The "Configure Base Install Image Repository in SDDC Manager" task may fail with the following error in the SDDC Manager UI:

Failed to copy /nfs/vmware/vcf/nfs-mount/base-install-images/ to base install mount point

The /var/log/vmware/vcf/domainmanager/domainmanager.log file contains entries similar to:

{Timestamp} ERROR [vcf_dm,0000000000000000,0000] [c.v.v.v.f.a.ConfigureBaseImageRepoAction,dm-exec-2291]  Failed to copy folder from /nfs/vmware/vcf/nfs-mount/bundle/aa88f811-700a-5384-b86e-c40191985348/aa88f811-700a-5384-b86e-c40191985348 to path /nfs/vmware/vcf/nfs-mount/base-install-images/vsp_folder with exception 
com.vmware.vcf.secure.ssh.errors.VcfSshException: Failed to upload file /nfs/vmware/vcf/nfs-mount/base-install-images/vsp_folder/vmsp-platform-9.1.0.0.25370367.tar
        at com.vmware.vcf.secure.ssh.SshExecuter.upload(SshExecuter.java:293)
        at com.vmware.vcf.secure.ssh.SshExecuter.uploadFolder(SshExecuter.java:353)
        at com.vmware.vcf.vimanager.fsm.actions.ConfigureBaseImageRepoAction.lambda$copyFolder$6(ConfigureBaseImageRepoAction.java:793)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: com.vmware.vcf.secure.ssh.common.SshClientException: sddcm.vcf.internal: Failed to upload file to /nfs/vmware/vcf/nfs-mount/base-install-images/vsp_folder/vmsp-platform-9.1.0.0.25370367.tar via ssh
        at com.vmware.vcf.secure.ssh.common.SshClientImpl.upload(SshClientImpl.java:300)
        at com.vmware.vcf.secure.ssh.SshExecuter.upload(SshExecuter.java:291)
        ... 6 common frames omitted
Caused by: org.apache.sshd.common.SshException: IoWriteFutureImpl[SftpChannelSubsystem[id=10, recipient=4]-ClientSessionImpl[vcf@{SDDC Manager FQDN}/{SDDC Manager IP}:22][sftp][SSH_MSG_CHANNEL_DATA]]: Failed to get operation result within specified timeout: 30000 msec
        at org.apache.sshd.common.future.AbstractSshFuture.lambda$verifyResult$1(AbstractSshFuture.java:114)
        at org.apache.sshd.common.future.AbstractSshFuture.formatExceptionMessage(AbstractSshFuture.java:206)
        at org.apache.sshd.common.future.AbstractSshFuture.verifyResult(AbstractSshFuture.java:114)
        at org.apache.sshd.common.io.AbstractIoWriteFuture.verify(AbstractIoWriteFuture.java:41)
        at org.apache.sshd.common.io.AbstractIoWriteFuture.verify(AbstractIoWriteFuture.java:32)
        at org.apache.sshd.common.future.VerifiableFuture.verify(VerifiableFuture.java:110)
        at org.apache.sshd.common.future.VerifiableFuture.verify(VerifiableFuture.java:96)
        at org.apache.sshd.sftp.client.SftpMessage.waitUntilSent(SftpMessage.java:85)
        at org.apache.sshd.sftp.client.impl.SftpOutputStreamAsync.internalFlush(SftpOutputStreamAsync.java:358)
        at org.apache.sshd.sftp.client.impl.SftpOutputStreamAsync.internalTransfer(SftpOutputStreamAsync.java:285)
        at org.apache.sshd.sftp.client.impl.SftpOutputStreamAsync.transferFrom(SftpOutputStreamAsync.java:184)
        at org.apache.sshd.sftp.client.impl.AbstractSftpClient.put(AbstractSftpClient.java:1257)
        at org.apache.sshd.sftp.client.SftpClient.put(SftpClient.java:972)
        at com.vmware.vcf.secure.ssh.common.SshClientImpl.upload(SshClientImpl.java:297)
        ... 7 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out after 30000 msec
        at org.apache.sshd.common.future.AbstractSshFuture.verifyResult(AbstractSshFuture.java:113)
        ... 18 common frames omitted
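
To check a saved copy of domainmanager.log for this signature, a short script along the following lines may help. It is a minimal sketch: the function name `find_upload_timeouts` and the two regular expressions are illustrative assumptions based only on the excerpt above, and the exact message text may differ between releases.

```python
import re

# Patterns derived from the log excerpt above (assumed stable, not guaranteed).
UPLOAD_FAILURE = re.compile(r"VcfSshException: Failed to upload file (?P<path>\S+)")
SFTP_TIMEOUT = re.compile(r"TimeoutException: Timed out after (?P<msec>\d+) msec")


def find_upload_timeouts(log_text):
    """Return (file_path, timeout_msec) pairs for uploads followed by a timeout."""
    results = []
    current_path = None
    for line in log_text.splitlines():
        upload = UPLOAD_FAILURE.search(line)
        if upload:
            # Remember the file named in the most recent upload failure.
            current_path = upload.group("path")
            continue
        timeout = SFTP_TIMEOUT.search(line)
        if timeout and current_path is not None:
            results.append((current_path, int(timeout.group("msec"))))
            current_path = None
    return results


# Two lines copied from the excerpt above serve as sample input.
sample = (
    "com.vmware.vcf.secure.ssh.errors.VcfSshException: Failed to upload file "
    "/nfs/vmware/vcf/nfs-mount/base-install-images/vsp_folder/"
    "vmsp-platform-9.1.0.0.25370367.tar\n"
    "Caused by: java.util.concurrent.TimeoutException: Timed out after 30000 msec\n"
)
print(find_upload_timeouts(sample))
```

A non-empty result suggests the failure matches the SFTP upload timeout described here rather than an unrelated copy error.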

Running esxtop on the ESXi host shows high kernel average latency (KAVG), for example greater than 1000 ms, for the NFS datastore.

During the deployment, the ESXi Host Client shows that a traffic shaping policy with a strict bandwidth limit is active on the temporary vSwitch created for NFS traffic:

  1. Log in to the ESXi Host Client.
  2. Navigate to Networking > Virtual Switches.
  3. Click EDIT on the temporary vSwitch created for NFS traffic.
  4. Under Traffic shaping, confirm that Status is Enabled and that the bandwidth is limited to 100 Mbit/s.

 

 

Cause

During the vCenter and SDDC Manager deployment, a traffic shaping policy is mistakenly enabled on the temporary vSwitch created for NFS traffic.
This policy restricts the bandwidth to 100 Mbit/s, causing large file transfers (such as .tar bundles) to take an excessively long time or to time out.
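
The effect of the limit can be estimated with simple arithmetic. The sketch below computes the ideal transfer time and ignores protocol overhead. The file sizes and the 10 Gbit/s comparison rate are hypothetical, because this article does not state the actual bundle size or uplink speed.

```python
def transfer_seconds(size_bytes, link_mbit_per_s):
    """Ideal time in seconds to move size_bytes over a link of the given rate."""
    bits = size_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


# Hypothetical file sizes in decimal gigabytes (not taken from the article).
for size_gb in (1, 5, 10):
    size_bytes = size_gb * 10**9
    shaped = transfer_seconds(size_bytes, 100)       # 100 Mbit/s shaping limit
    unshaped = transfer_seconds(size_bytes, 10_000)  # assumed 10 Gbit/s uplink
    print(f"{size_gb} GB: {shaped:.0f} s shaped, {unshaped:.1f} s unshaped")
```

Under these assumptions, a 10 GB file needs at least 800 seconds at 100 Mbit/s, compared with about 8 seconds at 10 Gbit/s.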

Resolution

To work around this issue, disable the traffic shaping policy on the affected vSwitch and retry the task.

  1. Log in to the ESXi Host Client of the deployment host.
  2. Navigate to Networking > Virtual Switches.
  3. Select the temporary vSwitch created for the NFS traffic and click EDIT.
  4. Navigate to the Traffic shaping section.
  5. Set Status to Disabled and save the change.
  6. Retry the deployment from the last failed task.
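
If the Host Client is unavailable, the same change can likely be made from the ESXi shell. This article does not document a command-line procedure, so the following is only a sketch. It assumes that the temporary vSwitch is a standard vSwitch and that the `esxcli network vswitch standard policy shaping` namespace is available on the host. The helper `shaping_commands` and the name `vSwitch1` are placeholders. The script only prints the commands, so they can be reviewed before being run on the host.

```python
import shlex


def shaping_commands(vswitch_name):
    """Build esxcli commands to check, disable, and re-check traffic shaping."""
    base = ["esxcli", "network", "vswitch", "standard", "policy", "shaping"]
    return [
        base + ["get", "--vswitch-name", vswitch_name],
        base + ["set", "--vswitch-name", vswitch_name, "--enabled", "false"],
        base + ["get", "--vswitch-name", vswitch_name],
    ]


# "vSwitch1" is a placeholder; use the temporary vSwitch name shown on the host.
for argv in shaping_commands("vSwitch1"):
    print(shlex.join(argv))
```

After disabling the policy by either method, retry the deployment from the last failed task as described in step 6.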