Proton services flapping on the Manager nodes running version 2.5

Article ID: 306199


Products

VMware NSX

Issue/Introduction

Symptoms:
Management Plane (MP) services are slow, or the management appliance runs out of memory.

Logs and Outputs:
The top output shows high memory usage by the appl-proxy process:

top - 03:09:50 up 3 days, 6:29, 5 users, load average: 23.03, 24.47, 22.65
Tasks: 272 total, 2 running, 162 sleeping, 0 stopped, 0 zombie
%Cpu(s): 73.8 us, 8.7 sy, 0.0 ni, 16.2 id, 0.1 wa, 0.0 hi, 1.2 si, 0.0 st
KiB Mem : 49446560 total, 895412 free, 47962452 used, 588696 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 842096 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14907 uproton   20   0 18.041g 0.011t      0 S 592.4 23.3 509:03.53 java
 2127 nsx       10 -10 17.646g 0.010t  12164 S  44.9 22.4  24:08.50 java
 5151 appl-pr+  20   0 7412932 6.567g   3272 S   9.0 13.9 269:06.32 appl-proxy <<<<<----------
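
When comparing several Manager nodes, it can help to extract the memory-heavy processes from a saved top capture automatically. The following is a minimal Python sketch, not a supported tool: the function name high_memory_processes and the 10% threshold are assumptions chosen for illustration, and it assumes the default 12-column top layout shown above.

```python
SAMPLE_TOP = """\
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14907 uproton 20 0 18.041g 0.011t 0 S 592.4 23.3 509:03.53 java
 2127 nsx 10 -10 17.646g 0.010t 12164 S 44.9 22.4 24:08.50 java
 5151 appl-pr+ 20 0 7412932 6.567g 3272 S 9.0 13.9 269:06.32 appl-proxy
"""


def high_memory_processes(top_output, threshold=10.0):
    """Return (command, pid, mem_percent) for rows with %MEM >= threshold.

    threshold is an illustrative value, not a documented limit.
    """
    hits = []
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows have at least 12 columns and start with a numeric PID;
        # this skips the summary lines and the column header.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            mem_percent = float(fields[9])  # %MEM column
        except ValueError:
            continue
        if mem_percent >= threshold:
            hits.append((fields[11], int(fields[0]), mem_percent))
    # Largest memory consumer first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for command, pid, mem in high_memory_processes(SAMPLE_TOP):
        print(f"{command} (pid {pid}) uses {mem}% of memory")
```

On a healthy node the appl-proxy row would normally not appear in this list, so its presence alongside the two java processes is the signal to look for.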

 

  • /var/log/proton/proton-tomcat-wrapper.log shows that the JVM process crashed because it failed to allocate memory:

INFO | jvm 7 | 2019/09/18 00:07:33 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000003b21fbb0000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
INFO | jvm 7 | 2019/09/18 00:07:33 | #
INFO | jvm 7 | 2019/09/18 00:07:33 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO | jvm 7 | 2019/09/18 00:07:33 | # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
INFO | jvm 7 | 2019/09/18 00:07:33 | # An error report file with more information is saved as:
INFO | jvm 7 | 2019/09/18 00:07:33 | # /tmp/hs_err_pid26275.log
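
To confirm how often the JVM hit this condition, the wrapper log can be searched for the os::commit_memory failure line. The sketch below is one possible way to do that in Python; the function name find_commit_failures and the regular expression are assumptions based only on the log format shown above, and other log layouts may need a different pattern.

```python
import re

# Matches the wrapper-log timestamp and the HotSpot commit_memory warning.
COMMIT_FAILURE = re.compile(
    r"\|\s*(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|"
    r".*os::commit_memory\(0x[0-9a-fA-F]+, (?P<size>\d+), \d+\) failed;"
    r".*\(errno=(?P<errno>\d+)\)"
)

SAMPLE_LOG = [
    "INFO | jvm 7 | 2019/09/18 00:07:33 | Java HotSpot(TM) 64-Bit Server VM "
    "warning: INFO: os::commit_memory(0x000003b21fbb0000, 65536, 1) failed; "
    "error='Cannot allocate memory' (errno=12)",
    "INFO | jvm 7 | 2019/09/18 00:07:33 | #",
    "INFO | jvm 7 | 2019/09/18 00:07:33 | # There is insufficient memory "
    "for the Java Runtime Environment to continue.",
]


def find_commit_failures(lines):
    """Return (timestamp, requested_bytes, errno) for each failed commit."""
    failures = []
    for line in lines:
        match = COMMIT_FAILURE.search(line)
        if match:
            failures.append(
                (match["timestamp"], int(match["size"]), int(match["errno"]))
            )
    return failures


if __name__ == "__main__":
    # To scan a real log, pass an open file object instead of SAMPLE_LOG.
    for timestamp, size, errno in find_commit_failures(SAMPLE_LOG):
        print(f"{timestamp}: failed to commit {size} bytes (errno={errno})")
```

errno 12 is ENOMEM on Linux, which matches the "Cannot allocate memory" text in the log.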

 

  • The Manager shows many connections to transport nodes (TNs) in the TIME_WAIT state:

root@nsx-mgr-0:~# netstat -anlp |grep 1234 | grep #.#.#.128
tcp        0      0 #.#.#.12:1234         #.#.#.128:29246     TIME_WAIT   -
tcp        0      0 #.#.#.12:1234         #.#.#.128:29302     TIME_WAIT   -
tcp        0      0 #.#.#.12:1234         #.#.#.128:29328     TIME_WAIT   -
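
To see which transport nodes account for the TIME_WAIT connections, the netstat output can be grouped by remote address. The following Python sketch assumes the standard netstat -an column layout shown above; the function name time_wait_by_peer is hypothetical, and the sample reuses the redacted addresses from this article.

```python
from collections import Counter

SAMPLE_NETSTAT = """\
tcp        0      0 #.#.#.12:1234         #.#.#.128:29246     TIME_WAIT   -
tcp        0      0 #.#.#.12:1234         #.#.#.128:29302     TIME_WAIT   -
tcp        0      0 #.#.#.12:1234         #.#.#.128:29328     TIME_WAIT   -
"""


def time_wait_by_peer(netstat_output, local_port="1234"):
    """Count TIME_WAIT connections on local_port, keyed by remote host."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local address, remote address, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "TIME_WAIT":
            continue
        # The port is the part after the last colon of the local address.
        if local.rpartition(":")[2] != local_port:
            continue
        counts[remote.rpartition(":")[0]] += 1
    return counts


if __name__ == "__main__":
    for peer, count in time_wait_by_peer(SAMPLE_NETSTAT).most_common():
        print(f"{peer}: {count} connections in TIME_WAIT")
```

A peer whose count keeps growing between captures is a likely candidate for the TN described in the Cause section.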
 

  • The TN syslog shows the following WARN messages:

<180>1 2019-09-18T05:49:50.626Z prom-05056a03f4.nsbucqesystem.test NSX 2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2209524" level="WARN"] RpcConnection[10 Connected to tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream 9ed66956-cea5-4da4-abdd-####### without service name

<180>1 2019-09-18T05:49:50.637Z prom-05056a03f4.nsbucqesystem.test NSX 2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2209524" level="WARN"] RpcConnection[10 Connected to tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream 50dbe277-91ba-4a8b-bb93-######### without service name
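
To gauge how frequently a TN logs this warning, the syslog can be filtered for the "Dropping a frame" message. This is a minimal Python sketch under the assumption that the message text matches the two lines above exactly; the function name dropped_frame_streams is hypothetical.

```python
import re

# Matches the WARN-level opsagent message and captures the stream ID.
DROPPED_FRAME = re.compile(
    r'level="WARN"\].*Dropping a frame received from an unknown stream '
    r"(?P<stream>\S+)"
)

SAMPLE_SYSLOG = [
    '<180>1 2019-09-18T05:49:50.626Z prom-05056a03f4.nsbucqesystem.test NSX '
    '2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" '
    'tid="2209524" level="WARN"] RpcConnection[10 Connected to '
    "tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream "
    "9ed66956-cea5-4da4-abdd-####### without service name",
    '<180>1 2019-09-18T05:49:50.637Z prom-05056a03f4.nsbucqesystem.test NSX '
    '2209503 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" '
    'tid="2209524" level="WARN"] RpcConnection[10 Connected to '
    "tcp://127.0.0.1:4096] Dropping a frame received from an unknown stream "
    "50dbe277-91ba-4a8b-bb93-######### without service name",
]


def dropped_frame_streams(lines):
    """Return the stream IDs named in 'Dropping a frame' WARN messages."""
    streams = []
    for line in lines:
        match = DROPPED_FRAME.search(line)
        if match:
            streams.append(match["stream"])
    return streams


if __name__ == "__main__":
    streams = dropped_frame_streams(SAMPLE_SYSLOG)
    print(f"{len(streams)} dropped frames across {len(set(streams))} streams")
```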

 

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 2.x

Cause

The client (the TN) fails to close the connection when the ACK from the Manager reaches it after the 60-second timeout has expired.

Resolution

Currently there is no resolution.

Workaround:
Restart the appl-proxy service on the affected Manager node.