Proton services flapping on the Manager nodes

Article ID: 306199

Products

VMware NSX

Issue/Introduction

Management Plane (MP) services are slow, or the management appliance runs out of memory.

Logs and Outputs:
  • The top output shows high memory usage by the appl-proxy process:

top - 03:09:50 up 3 days, 6:29, 5 users, load average: 23.03, 24.47, 22.65

Tasks: 272 total, 2 running, 162 sleeping, 0 stopped, 0 zombie

%Cpu(s): 73.8 us, 8.7 sy, 0.0 ni, 16.2 id, 0.1 wa, 0.0 hi, 1.2 si, 0.0 st

KiB Mem : 49446560 total, 895412 free, 47962452 used, 588696 buff/cache

KiB Swap: 0 total, 0 free, 0 used. 842096 avail Mem

 

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

14907 uproton 20 0 18.041g 0.011t 0 S 592.4 23.3 509:03.53 java

 2127 nsx 10 -10 17.646g 0.010t 12164 S 44.9 22.4 24:08.50 java

 5151 appl-pr+ 20 0 7412932 6.567g 3272 S 9.0 13.9 269:06.32 appl-proxy <<<<<----------
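
The RES column in the top output above mixes units (plain KiB alongside g and t suffixes), which makes the rows hard to compare at a glance. The following is a minimal Python sketch for normalizing those fields, assuming top's usual binary scaling suffixes. The helper names `to_kib` and `mem_percent` are illustrative and not part of any NSX tooling.

```python
# Hypothetical helpers for reading top's memory columns; not part of NSX.
# Assumption: top's scaling suffixes are none = KiB, m = MiB, g = GiB, t = TiB.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def to_kib(field: str) -> float:
    """Convert a top memory field such as '6.567g' or '7412932' to KiB."""
    field = field.strip().lower()
    if field and field[-1] in UNIT_KIB:
        return float(field[:-1]) * UNIT_KIB[field[-1]]
    return float(field)


def mem_percent(res_field: str, total_kib: int) -> float:
    """Share of total memory held by one process, in percent."""
    return 100.0 * to_kib(res_field) / total_kib


if __name__ == "__main__":
    total = 49446560  # "KiB Mem : 49446560 total" from the output above
    rows = [("java (uproton)", "0.011t"), ("java (nsx)", "0.010t"),
            ("appl-proxy", "6.567g")]
    for name, res in rows:
        print(f"{name}: {mem_percent(res, total):.1f}% of memory")
```

For the appl-proxy row this works out to about 13.9% of total memory, in line with the %MEM column. The t-suffixed rows are only approximate because top rounds them to three decimals.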

 

  • /var/log/proton/proton-tomcat-wrapper.log shows that the JVM process crashed because it failed to allocate memory:

INFO | jvm 7 | 2019/09/18 00:07:33 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000003b21fbb0000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)

INFO | jvm 7 | 2019/09/18 00:07:33 | #

INFO | jvm 7 | 2019/09/18 00:07:33 | # There is insufficient memory for the Java Runtime Environment to continue.

INFO | jvm 7 | 2019/09/18 00:07:33 | # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.

INFO | jvm 7 | 2019/09/18 00:07:33 | # An error report file with more information is saved as:

INFO | jvm 7 | 2019/09/18 00:07:33 | # /tmp/hs_err_pid26275.log
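
To check how often this crash signature occurs, the wrapper log can be scanned for the allocation failure and the error report path. The following is a minimal Python sketch, assuming the log lines keep the format shown above. The function name `scan_wrapper_log` is hypothetical, and the patterns are derived only from this excerpt.

```python
import re

# Patterns derived from the log excerpt above; they are assumptions about
# the line format, not an official log schema.
ALLOC_FAIL = re.compile(
    r"os::commit_memory\(.*\) failed; error='Cannot allocate memory' \(errno=\d+\)"
)
HS_ERR = re.compile(r"#\s+(/\S*hs_err_pid\d+\.log)")


def scan_wrapper_log(lines):
    """Return (allocation_failure_count, list_of_error_report_paths)."""
    failures = 0
    reports = []
    for line in lines:
        if ALLOC_FAIL.search(line):
            failures += 1
        match = HS_ERR.search(line)
        if match:
            reports.append(match.group(1))
    return failures, reports


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "INFO | jvm 7 | 2019/09/18 00:07:33 | Java HotSpot(TM) 64-Bit Server VM "
    "warning: INFO: os::commit_memory(0x000003b21fbb0000, 65536, 1) failed; "
    "error='Cannot allocate memory' (errno=12)",
    "INFO | jvm 7 | 2019/09/18 00:07:33 | # /tmp/hs_err_pid26275.log",
]

if __name__ == "__main__":
    print(scan_wrapper_log(SAMPLE))
```

On a Manager node, the same function could be fed an open file handle for /var/log/proton/proton-tomcat-wrapper.log instead of `SAMPLE`.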

 

  • The Manager shows many connections to Transport Nodes (TNs) in the TIME_WAIT state:

root@nsx-mgr-0:~# netstat -anlp |grep 1234 | grep #.#.#.128

tcp        0      0 #.#.#.12:1234         #.#.#.128:29246     TIME_WAIT   -

tcp        0      0 #.#.#.12:1234         #.#.#.128:29302     TIME_WAIT   -

tcp        0      0 #.#.#.12:1234        #.#.#.128:29328     TIME_WAIT   -
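
To see which Transport Nodes account for most of the lingering sockets, the netstat output can be tallied per remote address. The following is a minimal Python sketch, assuming the column layout shown above (protocol, queues, local address, remote address, state). The function name `count_time_wait` is illustrative, and port 1234 is taken from the grep in the command above.

```python
from collections import Counter


def count_time_wait(netstat_lines, local_port):
    """Count TIME_WAIT sockets per remote address for one local port."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state ...
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "TIME_WAIT":
            continue
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        # Drop the ephemeral port and keep only the remote address.
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Lines copied from the output above (addresses are redacted in the source).
SAMPLE = [
    "tcp        0      0 #.#.#.12:1234         #.#.#.128:29246     TIME_WAIT   -",
    "tcp        0      0 #.#.#.12:1234         #.#.#.128:29302     TIME_WAIT   -",
    "tcp        0      0 #.#.#.12:1234        #.#.#.128:29328     TIME_WAIT   -",
]

if __name__ == "__main__":
    for remote, n in count_time_wait(SAMPLE, 1234).most_common():
        print(remote, n)
```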
 

  • The TN syslog shows the following WARN messages:

<180>1 2019-09-18T05:49:50.626Z prom-05056a03f4.nsbucqesystem.test NSX 2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2209524" level="WARN"] RpcConnection[10 Connected to tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream xxxxxxxx-cea5-4da4-xxxx-xxxxxxxxxxxx without service name

<180>1 2019-09-18T05:49:50.637Z prom-05056a03f4.nsbucqesystem.test NSX 2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2209524" level="WARN"] RpcConnection[10 Connected to tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream xxxxxxxx-91ba-4a8b-bb93-xxxxxxxxxxx without service name
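
To gauge how many such frames a Transport Node is dropping, the syslog can be filtered for this message. The following is a minimal Python sketch, assuming the message text matches the excerpt above. The function name `dropped_streams` is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the WARN lines above; an assumption about the
# message format, not an official schema.
DROPPED = re.compile(
    r'level="WARN"\].*Dropping a frame received from an unknown stream (\S+) '
    r"without service name"
)


def dropped_streams(lines):
    """Count dropped-frame warnings per stream ID."""
    counts = Counter()
    for line in lines:
        match = DROPPED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two syslog lines from the excerpt above, used as sample input.
SAMPLE = [
    '<180>1 2019-09-18T05:49:50.626Z prom-05056a03f4.nsbucqesystem.test NSX '
    '2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" '
    'tid="2209524" level="WARN"] RpcConnection[10 Connected to '
    'tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream '
    'xxxxxxxx-cea5-4da4-xxxx-xxxxxxxxxxxx without service name',
    '<180>1 2019-09-18T05:49:50.637Z prom-05056a03f4.nsbucqesystem.test NSX '
    '2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" '
    'tid="2209524" level="WARN"] RpcConnection[10 Connected to '
    'tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream '
    'xxxxxxxx-91ba-4a8b-bb93-xxxxxxxxxxx without service name',
]

if __name__ == "__main__":
    counts = dropped_streams(SAMPLE)
    print(f"{sum(counts.values())} dropped frames across {len(counts)} streams")
```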

 

Environment

VMware NSX-T Data Center

Cause

The client (TN) fails to close the connection if it receives the ACK from the Manager after the 60-second timeout has elapsed.

Resolution

Workaround:
Restart the appl-proxy service on the affected Manager node.