Walkthrough of how HCX behaves when a virtual machine with MON enabled experiences storage issues

Article ID: 422516

Updated On:

Products

VMware HCX

Issue/Introduction

  • The datastore hosting the virtual machine is experiencing storage issues. The following warning appears on the ESXi host in /var/log/vmkernel.log:
    <timestamp> Wa(180) vmkwarning: cpu79:2097377)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.################################" state in doubt; requested fast path state update...

 

  • VMware Tools stops reporting to the hypervisor. The following entries appear on the ESXi host in the VM's log (vmware.log) under the VM directory /vmfs/volumes/datastore/vm_name/:
    <timestamp> In(05) vmx - GuestRpcSendTimedOut: message to toolbox timed out.
    <timestamp> In(05) vmx - Tools: [AppStatus] Last heartbeat value 1265806 (last received 4s ago)

 

  • Once MON is enabled, the HCX Manager constantly monitors the VMs so that it can update the NSX references accordingly. In this case, the HCX Manager sees the VM as unreachable and determines that the change was not caused by a vMotion. The following entries appear on the HCX Manager in /common/logs/admin/app.log:
    <timestamp> UTC [NetworkStretchService_SvcThread-30742, j: f74076f2, , TxId: ########-####-####-####-############] INFO  c.v.v.h.n.HandleVMUpdateJob- Running in state BEGIN for VM vm-####### ...
    <timestamp> UTC [NetworkStretchService_SvcThread-30742, j: f74076f2, vm: vm-######, PR, TxId: ########-####-####-####-############] INFO  c.v.v.h.n.HandleVMUpdateJob- Not updating IP address for vm-###### in deletionSetDampeningMap as vmotion is not detected

 

  • HCX then determines that the VM's IP address is not reachable, that VMware Tools is not responding, or that the IP address cannot be identified. The following entry appears on the HCX Manager in /common/logs/admin/app.log:
    <timestamp> [NetworkStretchService_SvcThread-30742, j: ########, vm: vm-######, vnic: 0, portKey: ########-####-####-####-############, PR, TxId: ########-####-####-####-############] INFO  c.v.h.n.util.NetworkExtensionUtils- setTaskInVMStretchContext - vcInstanceId ########-####-####-####-############ vmMoref vm-###### taskWithIntent {"intent":{"vmId":"vm-######","resourceId":"########-####-####-####-############","switchoverType":"switchoverNow","routerEndpointId":"#################-########-####-####-####-############","networkExtensionId":""},"status":"RUNNING","taskId":"########-####-####-####-############","message":"Configuring VM to use Remote Router as relevant Ip address is not present on VM or not detected by VMtools.","progress":60}

     

  • Because the IP address is not reachable, the HCX Manager disables MON on the affected virtual machine:
    <timestamp> UTC [NetworkStretchService_SvcThread-30737, j: 102aab09, vm: vm-#####, nicIndex: 0, PR, TxId: #########-####-####-####-########] INFO  c.v.h.a.n.NsxTransformersAdapter- For segment port default:#########-####-####-####-########, setting extra config key com.vmware.nsx.port.extraConfig.remoteRtr -> 'null'. Old value was '<IP-Address> <MAC-Address> <MAC-Address> <MAC-Address> LE'. Admin state - old: UP, new: DOWN. <----- The "LE" flag is removed when the port is restarted, which means that MON is disabled
    <timestamp> UTC [NetworkStretchService_SvcThread-30737, j: 102aab09, vm: vm-#####, nicIndex: 0, PR, TxId: #########-####-####-####-########] INFO  c.v.h.a.n.NsxTransformersAdapter- For segment port default:#########-####-####-####-########, setting extra config key com.vmware.nsx.port.extraConfig.remoteRtr -> '<IP-Address> <MAC-Address> <MAC-Address> <MAC-Address>'. Old value was 'null'. Admin state - old: DOWN, new: UP.
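The sequence above spans three log sources (vmkernel log, the VM's vmx log, and the HCX Manager app.log). As a rough aid for triage, the following Python sketch tags log lines with the stage they indicate. The substrings are copied from the excerpts above; the stage names and the function names `classify_line` and `summarize` are hypothetical and exist only for this illustration.

```python
# Signature substrings taken from the log excerpts in this article,
# mapped to illustrative stage names (the names are assumptions).
SIGNATURES = [
    ("storage_path_in_doubt", "state in doubt; requested fast path state update"),
    ("tools_rpc_timeout", "GuestRpcSendTimedOut: message to toolbox timed out"),
    ("hcx_no_vmotion_detected", "as vmotion is not detected"),
    ("hcx_switch_to_remote_router", "Configuring VM to use Remote Router"),
    ("nsx_remote_rtr_update", "com.vmware.nsx.port.extraConfig.remoteRtr"),
]


def classify_line(line):
    """Return the stage name for a log line, or None if no signature matches."""
    for stage, needle in SIGNATURES:
        if needle in line:
            return stage
    return None


def summarize(lines):
    """Count matching lines per stage, in the order each stage is first seen."""
    counts = {}
    for line in lines:
        stage = classify_line(line)
        if stage is not None:
            counts[stage] = counts.get(stage, 0) + 1
    return counts
```

Feeding the collected lines from all three logs into `summarize` gives a quick view of which stages of the sequence are present in a support bundle. It does not correlate timestamps or VM identifiers, so it is a starting point rather than a diagnosis.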

Cause

As part of the workflow, the HCX Manager continuously monitors virtual machines that have MON enabled, using VMware Tools RPC calls. If VMware Tools does not respond with the VM's IP address, the HCX Manager treats the VM as unreachable and removes the MON configuration from that virtual machine.
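The rule described above can be pictured with a small model. This is only an illustration of the behavior as stated in this article, not HCX code. The function name `desired_egress` and its parameters are hypothetical, and treating an IP as "relevant" when it falls inside the extended network's subnet is an assumption made for the example.

```python
import ipaddress


def desired_egress(reported_ips, stretched_cidr):
    """Model of the decision described in this article (not HCX's implementation).

    reported_ips:   IP addresses reported by VMware Tools (empty if Tools is silent).
    stretched_cidr: subnet of the extended network (assumed meaning of "relevant").
    Returns "local" when MON can stay enabled, otherwise "remote".
    """
    network = ipaddress.ip_network(stretched_cidr)
    for ip in reported_ips:
        if ipaddress.ip_address(ip) in network:
            return "local"   # a relevant IP is visible, so MON stays enabled
    return "remote"          # no relevant IP detected, so traffic goes to the remote router
```

With no IPs reported, as happens while VMware Tools is unresponsive, the model returns "remote", which corresponds to the "Configuring VM to use Remote Router" message in app.log.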

Resolution

  • Once the storage issue is fixed, the guest OS is back online, and VMware Tools starts responding again, the following message is seen:
    <timestamp> In(05) vmx - TOOLS Received tools.set.versiontype rpc call, version = 13317, type = 4

 

  • The HCX Manager detects that VMware Tools is reporting the VM's IP address again and re-enables MON:
    <timestamp> UTC [NetworkStretchService_SvcThread-30738, j: cda5d83f, vm: vm-#####, vnic: 0, portKey: #########-####-####-####-########, PR, TxId: #########-####-####-####-########] INFO  c.v.v.h.n.AttachVMVnicToPRNetworkJob- vNic: {"vmMoref":"vm-#####","nicIndex":0} has one or more of its IPs: ["<IP-Address>"] relevant to the stretch record: #########-####-####-####-########. Egress to be set to local
    ...
    <timestamp> UTC [NetworkStretchService_SvcThread-30745, j: 77e942f2, vm: vm-#####, nicIndex: 0, PR, TxId: #########-####-####-####-########] INFO  c.v.h.a.n.NsxTransformersAdapter- For segment port default:#########-####-####-####-########, setting extra config key com.vmware.nsx.port.extraConfig.remoteRtr -> 'null'. Old value was '<IP-Address> <MAC-Address> <MAC-Address> <MAC-Address>'. Admin state - old: UP, new: DOWN.
    <timestamp> UTC [NetworkStretchService_SvcThread-30745, j: 77e942f2, vm: vm-#####, nicIndex: 0, PR, TxId: #########-####-####-####-########] INFO  c.v.h.a.n.NsxTransformersAdapter- For segment port default:#########-####-####-####-########, setting extra config key com.vmware.nsx.port.extraConfig.remoteRtr -> '<IP-Address> <MAC-Address> <MAC-Address> <MAC-Address> LE'. Old value was 'null'. Admin state - old: DOWN, new: UP.  <--- LE Flag is re-added
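To follow the MON state of a segment port over time, the NsxTransformersAdapter entries can be reduced to the presence or absence of the trailing "LE" flag. The sketch below assumes the message format shown in the excerpts above. The regular expression and the function names `has_le_flag` and `mon_transitions` are hypothetical helpers, not part of HCX.

```python
import re

# Matches the "new value" and "old value" in NsxTransformersAdapter lines
# as they appear in the excerpts in this article.
REMOTE_RTR = re.compile(
    r"extraConfig\.remoteRtr -> '(?P<new>[^']*)'\. Old value was '(?P<old>[^']*)'"
)


def has_le_flag(value):
    """True if the remoteRtr value ends with the LE token."""
    tokens = value.split()
    return bool(tokens) and tokens[-1] == "LE"


def mon_transitions(lines):
    """Return 'MON enabled' / 'MON disabled' for each final remoteRtr value written."""
    states = []
    for line in lines:
        match = REMOTE_RTR.search(line)
        if match is None:
            continue
        new_value = match.group("new")
        if new_value == "null":
            # Intermediate step: the value is cleared while the port is set DOWN.
            continue
        states.append("MON enabled" if has_le_flag(new_value) else "MON disabled")
    return states
```

Run against app.log lines for a single VM, the returned list shows each point at which the port was rewritten without the flag (MON disabled) or with it (MON enabled).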

Additional Information

For more information about the storage warning, see the article: "state in doubt; requested fast path state update" error in vmkernel.log