Unable to expand the disk in a vSAN cluster.

Article ID: 396960

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • Expanding a disk from vCenter fails with the error: The disk extend operation failed: 11 (Resource temporarily unavailable)

Environment

VMware vSAN 6.x
VMware vSAN 7.x
VMware vSAN 8.x

Cause

vSAN objects are in the "reduced-availability-with-no-rebuild" state due to an MTU misconfiguration on the physical switch.

Cause validation:

  • In the /var/run/log/hostd.log file, entries similar to the following are seen (a log-search sketch is provided after this list):
    YYYY-MM-DDTHH:MM.SSSZ Er(163) Hostd[2099058]: [Originator@6876 sub=DiskLib opID=m3xtp1vn-4895167-auto-2wx4w-h5:71277248-8b-01-49-fd00 sid=52f5736a user=vpxuser:######.#####\##########] DISKLIB-LIB   : Failed to grow disk '/vmfs/volumes/vsan:################-################/OBJECTUUID/VM_3.vmdk' : Resource temporarily unavailable (720905).
    YYYY-MM-DDTHH:MM.SSSZ Db(167) Hostd[2099058]: [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/vsan:################-################/OBJECTUUID/VMicate_DR_restored.vmx opID=m3xtp1vn-4895167-auto-2wx4w-h5:71277248-8b-01-49-fd00 sid=52f5736a user=vpxuser:######.#####\##########] Extend disk message: The disk extend operation failed: Resource temporarily unavailable
    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2099058]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:################-################/OBJECTUUID/VM.vmx opID=m3xtp1vn-4895167-auto-2wx4w-h5:71277248-8b-01-49-fd00 sid=52f5736a user=vpxuser:######.#####\##########] Reconfigure failed: N3Vim5Fault20GenericVmConfigFault9ExceptionE(Fault cause: vim.fault.GenericVmConfigFault

  • Run the command "esxcli vsan debug object list -u <object UUID>" to validate the object health.

    esxcli vsan debug object list -u <object UUID>
    Object UUID: ########-####-####-####-############
       Version: 20
       Health: reduced-availability-with-no-rebuild
       Owner: #######
       Size: 255.00 GB
       Used: 2.72 GB
       Used 4K Blocks: 2.32 GB
       Policy:
          stripeWidth: 1
          cacheReservation: 0
          proportionalCapacity: [0, 100]
          hostFailuresToTolerate: 1
          forceProvisioning: 0
          spbmProfileId: ########-####-####-####-############
          spbmProfileGenerationNumber: 0
          replicaPreference: Performance
          iopsLimit: 0
          checksumDisabled: 0
          subFailuresToTolerate: 1
          CSN: 1271
          spbmProfileName: BTVvSAN-Stretched
          locality: None

       Configuration:

          Component: ########-####-####-####-############
            Component State: ACTIVE,  Address Space(B): 273804165120 (255.00GB),  Disk UUID: ########-####-####-####-############,  Disk Name: naa.################:2
            Votes: 1,  Capacity Used(B): 2948595712 (2.75GB),  Physical Capacity Used(B): 2919235584 (2.72GB),  Total 4K Blocks Used(B): 2487750656 (2.32GB),  Host Name: #######

       Type: vmnamespace
       Path: /vmfs/volumes/vsan:################-################/VM (Exists)
       Group UUID: ########-####-####-####-############
       Directory Name: VM

    From the above output, it is confirmed that the object is in "reduced-availability-with-no-rebuild" state.

  • Run the command "esxcli vsan debug object health summary get" to validate the health of all objects in the cluster.

    esxcli vsan debug object health summary get
    Health Status                                              Number Of Objects
    ---------------------------------------------------------  -----------------
    remoteAccessible                                                           0
    inaccessible                                                               0
    reduced-availability-with-no-rebuild                                   10391
    reduced-availability-with-no-rebuild-delay-timer                           0
    reducedavailabilitywithpolicypending                                       0
    reducedavailabilitywithpolicypendingfailed                                 0
    reduced-availability-with-active-rebuild                                   0
    reducedavailabilitywithpausedrebuild                                       0
    data-move                                                                  0
    nonavailability-related-reconfig                                           0
    nonavailabilityrelatedincompliancewithpolicypending                        0
    nonavailabilityrelatedincompliancewithpolicypendingfailed                  0
    nonavailability-related-incompliance                                       0
    nonavailabilityrelatedincompliancewithpausedrebuild                        0
    healthy                                                                    0

    From the above output, it is confirmed that all vSAN objects in the cluster are in the "reduced-availability-with-no-rebuild" state.
  • vSAN Skyline Health reports an error for "vSAN: MTU check (ping with large packet size)" (a CLI alternative is sketched after this list).

  • Click the "View Current Result" option to identify the host which is unable to communicate with the data nodes in the cluster.

  • Run the command "esxcli vsan network list" to identify the VMK used for vSAN traffic.

    esxcli vsan network list
    Interface
    VmkNic Name: vmk1
    IP Protocol: IP
    Interface UUID: ########-####-####-####-############
    Agent Group Multicast Address: ####.#.#.#
    Agent Group IPv6 Multicast Address: ####::#:#:#
    Agent Group Multicast Port: #########
    Master Group Multicast Address: ####.#.#.#
    Master Group IPv6 Multicast Address: ####::#:#:#
    Master Group Multicast Port: ######
    Host Unicast Channel Bound Port: #####
    Data-in-Transit Encryption Key Exchange Port: 0
    Multicast TTL: 5
    Traffic Type: vsan

    In the above example, it is confirmed that vmk1 is used for vSAN traffic.

  • Run the command "esxcfg-vswitch -l" to identify the vSwitch used for vSAN traffic and check the MTU configured on it.

    esxcfg-vswitch -l

    DVS Name                   Num Ports    Used Ports    Configured Ports    MTU
    Switch name                2520         10            512                 9000

    DVPort ID                                                 In Use                 Client
    512                                                         1                    vmnic1
    513                                                         1                    vmnic0
    514                                                         0
    515                                                         0
    0                                                           1                    vmk0
    128                                                         1                    vmk1
    256                                                         1                    vmk2


    In the above example, it is confirmed that vmnic0 and vmnic1 are used as uplinks for vSAN communication, and the vSwitch (DVS) is configured with an MTU of 9000.

  • Run the command "esxcfg-vmknic -l" to verify the MTU set on the VMkernel adapter (vmk).

    esxcfg-vmknic -l
    Interface  Port Group/DVPort  IP Family  ...  MTU   TSO MSS  Enabled  Type    NetStack
    vmk1       128                IPv4       ...  9000  65535    true     STATIC  DefaultTCPIPStack

    In the above example, it is confirmed that vmk1 is configured with MTU 9000.

  • Run the command "esxcfg-nics -l" to confirm the MTU configured on the physical nics (vmnics).

    esxcfg-nics -l
    Name     PCI            Driver   Link  Speed      Duplex  MAC Address         MTU   Description
    vmnic0   ####:##:##.#   vmxnet   Up    10000Mbps  Full    ##:##:##:##:##:##   9000
    vmnic1   ####:##:##.#   vmxnet   Up    10000Mbps  Full    ##:##:##:##:##:##   9000

    In the above example, it is confirmed that vmnics are configured with MTU 9000.

    Repeat the above procedure for all hosts in the cluster and ensure the MTU is consistent across the network (a ping sweep sketch covering all hosts is provided after this list).

  • Ping the faulty host from a working host using a 9000 MTU (an 8972-byte payload, which accounts for the 28 bytes of IP and ICMP headers).

    vmkping -I vmkX -d -s 8972 <IP address of faulty node>
    PING ##.##.###.## ( ##.##.###.##): 8972 data bytes

    ---  ##.##.###.## ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

  • Ping the faulty host from a working host using a 1500 MTU (a 1472-byte payload).
    vmkping -I vmkX -d -s 1472 <IP address of faulty node>

    PING ##.##.###.## (##.##.###.##): 1472 data bytes
    1480 bytes from ##.##.###.##: icmp_seq=0 ttl=64 time=0.118 ms
    1480 bytes from ##.##.###.##: icmp_seq=1 ttl=64 time=0.116 ms
    1480 bytes from ##.##.###.##: icmp_seq=2 ttl=64 time=0.106 ms

    --- ##.##.###.## ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.106/0.113/0.118 ms

    Based on the results from the steps above, it is confirmed that the healthy host is unable to communicate with the faulty host over vSAN traffic when the MTU is set to 9000.
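
Note: As referenced in the hostd.log step above, the relevant entries can be located from the ESXi shell with a simple search. This is a minimal sketch; the search strings are taken from the example log lines shown earlier.

    # Search hostd.log for the disk-grow failure reported during the expand attempt
    grep -i "Failed to grow disk" /var/run/log/hostd.log
    # Search for the extend-failure message surfaced to vCenter
    grep -i "disk extend operation failed" /var/run/log/hostd.log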
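
Note: As referenced in the Skyline Health step above, the same health checks can also be queried from the ESXi shell. This is a sketch only; take the exact test name from the list output, as it can vary between releases.

    # List all vSAN health checks and their current status
    esxcli vsan health cluster list
    # Query a single check by the name shown in the list output
    esxcli vsan health cluster get -t "vSAN: MTU check (ping with large packet size)"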
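
Note: To repeat the MTU ping tests against every host in the cluster, a loop similar to the following can be run from the ESXi shell of a working host. The interface vmk1 and the IP addresses are placeholders; substitute the vSAN VMkernel interface and the vSAN IP addresses of the cluster nodes.

    # Placeholder IPs; replace with the vSAN VMkernel IPs of all cluster nodes
    for ip in 192.168.10.11 192.168.10.12 192.168.10.13; do
        echo "=== $ip ==="
        vmkping -I vmk1 -d -s 8972 -c 3 $ip    # jumbo frame (9000 MTU) test
        vmkping -I vmk1 -d -s 1472 -c 3 $ip    # standard frame (1500 MTU) test
    done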

Resolution

Engage the switch/network vendor to verify the MTU settings on all external network components in the vSAN data path and ensure they are configured for an MTU of 9000 end to end.
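
Once the switch configuration has been corrected, the fix can be verified from any host in the cluster using the same commands from the validation steps above (vmk1 and the target IP are placeholders).

    # The jumbo-frame ping should now succeed with 0% packet loss
    vmkping -I vmk1 -d -s 8972 <IP address of the previously faulty node>
    # All objects should return to the "healthy" state once resynchronization completes
    esxcli vsan debug object health summary get

After the objects report healthy, retry the disk expand operation from vCenter.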