NSX-SHA fails to complete "sudo cat /sys/class/net/eth0/operstate" due to timeout

Article ID: 314224

Updated On:

Products

VMware NSX

Issue/Introduction

This KB explains why nsx-sha reports a false-positive "Edge node NIC link is down" alarm and how to resolve it.


Symptoms:

An alarm reported by an edge node appears in the NSX UI, but it is a false positive:

2024-03-11T09:59:34.787Z FATAL pool-64-thread-2 MonitoringServiceImpl 69664 MONITORING [nsx@6876 alarmId="########-####-####-####-############" alarmState="OPEN" comp="nsx-manager" entId="########-####-####-####-############" errorCode="MP701099" eventFeatureName="edge_health" eventSev="CRITICAL" eventState="On" eventType="edge_nic_link_status_down" level="FATAL" nodeId="########-####-####-####-############" subcomp="monitoring"] Edge node NIC eth0 link is down.

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

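This happens because nsx-sha cannot complete the command "sudo cat /sys/class/net/eth0/operstate" within its timeout (4 seconds in the log entries below). The command does eventually return "up", but only after the request has already been marked as timed out, so the link is reported as down.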

The "Req timeout" and "Timeout req may hanging" messages appear in /var/log/syslog on the edge node:

2024-03-11T09:48:23.148Z SWN-PS-3Z-PB-T0-BASE06-vmedge01.ps.krw.pb NSX 425692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="fork-monitor"] Req timeout, waiting for 36.31458378955722 seconds: {'cmd': ['sudo', 'cat', '/sys/class/net/eth0/operstate'], 'input': None, 'shell': False, 'timeout': 4, 'resp_queue': <queue.Queue object at 0x71c384076e50>, 'env': None, 'type': 0, 'timestamp': 13663939.608154405, 'seq': 1518, 'timed_out': 13663975.922738194, 'timed_log': 13663975.922738194}

2024-03-11T09:49:23.534Z SWN-PS-3Z-PB-T0-BASE06-vmedge01.ps.krw.pb NSX 425692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="fork-monitor"] Timeout req may hanging, waiting for 96.7007926274091 seconds: {'cmd': ['sudo', 'cat', '/sys/class/net/eth0/operstate'], 'input': None, 'shell': False, 'timeout': 4, 'resp_queue': <queue.Queue object at 0x71c384076e50>, 'env': None, 'type': 0, 'timestamp': 13663939.608154405, 'seq': 1518, 'timed_out': 13663975.922738194, 'timed_log': 13663975.922738194}

2024-03-11T09:55:25.735Z SWN-PS-3Z-PB-T0-BASE06-vmedge01.ps.krw.pb NSX 425692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="fork-monitor"] Timeout req may hanging, waiting for 458.90203033946455 seconds: {'cmd': ['sudo', 'cat', '/sys/class/net/eth0/operstate'], 'input': None, 'shell': False, 'timeout': 4, 'resp_queue': <queue.Queue object at 0x71c384076e50>, 'env': None, 'type': 0, 'timestamp': 13663939.608154405, 'seq': 1518, 'timed_out': 13663975.922738194, 'timed_log': 13664338.149291515}

2024-03-11T09:55:27.639Z SWN-PS-3Z-PB-T0-BASE06-vmedge01.ps.krw.pb NSX 425692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="fork-monitor"] Received resp for a timeout req, waiting for 460.8056724201888 seconds: {'cmd': ['sudo', 'cat', '/sys/class/net/eth0/operstate'], 'input': None, 'shell': False, 'timeout': 4, 'resp_queue': <queue.Queue object at 0x71c384076e50>, 'env': None, 'type': 0, 'timestamp': 13663939.608154405, 'seq': 1518, 'timed_out': 13663975.922738194, 'timed_log': 13664398.510184744}, {'seq': 1518, 'type': 0, 'executor': 0, 'timestamp': 13663939.608435009, 'execute_time': 460.79502287879586, 'output': b'up\n', 'error': 'Request timeout when waiting for response'}

2024-03-11T09:55:27.640Z SWN-PS-3Z-PB-T0-BASE06-vmedge01.ps.krw.pb NSX 425692 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING"] Failed to run command: {'cmd': ['sudo', 'cat', '/sys/class/net/eth0/operstate'], 'input': None, 'shell': False, 'timeout': 4, 'resp_queue': <queue.Queue object at 0x71c384076e50>, 'env': None, 'type': 0, 'timestamp': 13663939.608154405, 'seq': 1518, 'timed_out': 13663975.922738194, 'timed_log': 13664398.510184744} with error Request timeout when waiting for response
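The following is a minimal Python sketch for finding these warnings in a copy of the syslog. It is not part of NSX; the function name find_sha_timeouts is hypothetical, and the message prefixes it matches are taken only from the sample entries above, so other variants may exist that it does not catch.

```python
import ast
import re

# Message prefixes copied from the fork-monitor warnings shown above
# (assumption: the log format matches these sample entries).
_PATTERN = re.compile(
    r"(?P<msg>Req timeout|Timeout req may hanging|Received resp for a timeout req)"
    r", waiting for (?P<secs>[0-9.]+) seconds: \{'cmd': (?P<cmd>\[[^\]]*\])"
)


def find_sha_timeouts(lines):
    """Return (timestamp, message, waited_seconds, command) for each matching line."""
    hits = []
    for line in lines:
        # Only consider entries written by the nsx-sha subcomponent.
        if 'subcomp="nsx-sha"' not in line:
            continue
        match = _PATTERN.search(line)
        if match is None:
            continue
        # The timestamp is the first space-separated field of the entry.
        timestamp = line.split(" ", 1)[0]
        # The command is logged as a Python list literal; join it for display.
        command = " ".join(ast.literal_eval(match.group("cmd")))
        hits.append(
            (timestamp, match.group("msg"), float(match.group("secs")), command)
        )
    return hits
```

Passing an open file handle for /var/log/syslog (or a list of lines from a support bundle) returns one tuple per warning, which makes it easy to see how long each request waited compared with the 4-second timeout.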

Resolution

This issue can be resolved by restarting the nsx-sha service on the affected edge node:

# service nsx-sha restart

Additional Information

Impact/Risks:

The alarm is visible in the NSX UI, but there is no impact.
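To confirm that the alarm is a false positive, the link state can be read directly from the same sysfs file that nsx-sha queries. The sketch below is an illustration only, assuming Python 3 is available where it is run; the function name read_operstate and the sysfs_root parameter are hypothetical. It measures only the file read itself, not the sudo and process start-up overhead that nsx-sha also incurs.

```python
import time
from pathlib import Path


def read_operstate(interface, sysfs_root="/sys/class/net"):
    """Return (state, elapsed_seconds) for the interface's operstate file."""
    start = time.monotonic()
    # The kernel exposes the link state as a single word such as "up" or "down".
    state = (Path(sysfs_root) / interface / "operstate").read_text().strip()
    elapsed = time.monotonic() - start
    return state, elapsed
```

If read_operstate("eth0") returns "up" while the alarm is open, the link itself is healthy and the alarm matches the timeout behaviour described in this article.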