HCX - Service Mesh diagnostics test returns 2 failed probes

Article ID: 323350

Updated On:

Products

VMware HCX

Issue/Introduction

This article describes a known issue with the HCX Service Mesh diagnostic test.

Symptoms:
The HCX Service Mesh (SM) diagnostic test run from the UI returns 2 failed probes. To run the test, go to:
HCX Manager UI >> Interconnect >> Service Mesh >> MORE >> Run Diagnostics

Note: In the HCX Manager Service Mesh wizard, all diagnostic tests show as passed (green check mark), yet failed probes are still reported.

The following errors appear in the HCX app-engine logs during the diagnostic test:
INFO  c.v.v.h.s.i.EndpointDiagnosticsJob- Diagnostics Response from hcmProbe: 

{"type":"REACHABILITY_PING","source":"<source ip address>","destination":"<destination ip address>","sourcePort":0,"destPort":0,"protocol":"ARP","data":{"maxLatency":"","minLatency":"","avgLatency":"","probeCount":0,"probesLost":0,"probes":null},"destType":"STATIC-ROUTE-GATEWAY","status":"FAILURE","timestamp":1706733769,"error":{"output":"/usr/sbin/arping -c 2 -w 1 -I eth0 <destination ip address>: exit status 1","message":""},"commandOutput":"/usr/sbin/arping -c 2 -w 1 -I eth0 <destination ip address>\nARPING <destination ip address> from <source ip address> eth0\nUnicast reply from <destination ip address> [00:00:0C:9F:F0:01]  6.271ms\nSent 2 probes (1 broadcast(s))\nReceived 1 response(s)\n"}]
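When reviewing many such log entries, it can help to flag the ones that match this known issue: an ARP probe marked FAILURE even though arping reported at least one reply. The following is a minimal sketch, not an HCX tool. The function name is hypothetical, the sample is trimmed from the entry above, and the IP addresses are placeholders.

```python
import json
import re

# Matches the summary line printed by arping, e.g. "Received 1 response(s)".
RECEIVED_RE = re.compile(r"Received (\d+) response\(s\)")


def is_false_positive_arp_failure(probe: dict) -> bool:
    """Return True when an ARP probe is marked FAILURE although arping got a reply."""
    if probe.get("protocol") != "ARP" or probe.get("status") != "FAILURE":
        return False
    # commandOutput may be missing or null, so fall back to an empty string.
    match = RECEIVED_RE.search(probe.get("commandOutput") or "")
    return bool(match) and int(match.group(1)) > 0


# Trimmed sample modelled on the log entry above (placeholder addresses).
sample = json.loads("""
{"type": "REACHABILITY_PING", "protocol": "ARP",
 "destType": "STATIC-ROUTE-GATEWAY", "status": "FAILURE",
 "commandOutput": "ARPING 192.0.2.1 from 192.0.2.10 eth0\\nUnicast reply from 192.0.2.1 [00:00:0C:9F:F0:01]  6.271ms\\nSent 2 probes (1 broadcast(s))\\nReceived 1 response(s)\\n"}
""")

print(is_false_positive_arp_failure(sample))  # prints: True
```

A FAILURE entry for which the function returns False (no reply recorded) does not match this pattern and should be investigated as a possible real reachability problem.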


Cause

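In the current implementation of the diagnostic test, the timeout configured for the arping command is aggressive and may not allow enough time to discover hosts. As a result, the test may report an ARP probe failure even though the destination is reachable. In the log excerpt above, arping runs with -c 2 -w 1 (two probes, one-second deadline); the gateway replied to one probe, yet the command exited with status 1 and the probe was recorded as FAILURE.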

Note: arping is a tool for probing hosts on a given network. It operates at the data link layer: it sends an Address Resolution Protocol (ARP) request to the destination host and waits for the ARP reply.
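To see the effect of the deadline, the same arping invocation can be repeated by hand with a longer timeout. The sketch below is illustrative only. It reuses the -c, -w and -I flags and the /usr/sbin/arping path seen in the log; the helper names, the 5-second default, and running it from a Linux host on the same layer-2 segment as the gateway are all assumptions, not part of HCX.

```python
import subprocess
from typing import List


def build_arping_command(destination: str, interface: str = "eth0",
                         count: int = 2, deadline_s: int = 5) -> List[str]:
    """Build an arping command line using the flags seen in the HCX log.

    deadline_s defaults to 5 seconds, an arbitrary value chosen for
    illustration (the diagnostic test in the log uses 1 second).
    """
    return ["/usr/sbin/arping", "-c", str(count), "-w", str(deadline_s),
            "-I", interface, destination]


def run_arping(destination: str, **kwargs) -> subprocess.CompletedProcess:
    """Run arping and return the completed process for manual inspection.

    Sending ARP probes normally requires root or CAP_NET_RAW.
    """
    return subprocess.run(build_arping_command(destination, **kwargs),
                          capture_output=True, text=True, check=False)


# Example (hypothetical gateway address):
#   result = run_arping("192.0.2.1", deadline_s=5)
#   print(result.returncode, result.stdout)
```

Comparing the "Received N response(s)" line and the exit status against the values in the diagnostic log gives a rough indication of whether the short deadline explains the reported failure.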

Resolution

This issue will be fixed in HCX release 4.9.0 or higher.

Workaround:
None at the moment.

Additional Information

Impact/Risks:
  • The HCX Service Mesh diagnostic probe failure is a false positive alarm and should be considered cosmetic.
  • This issue is only applicable to HCX version 4.8.x.
  • Migration and Network Extension services will remain unaffected.
  • The diagnostics test completes successfully without any functional impact.