Memory leak in dcsms process when 'show ip bgp' & 'show ip bgp neighbors' CLI commands are executed

Article ID: 312648

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • The dcsms process reaches very high memory utilization, and normal processing is impacted.
  • The Edge runs out of memory (OOM), which causes a reboot.
  • HA switchovers occur repeatedly when the 'show ip bgp' and 'show ip bgp neighbors' commands are executed in a loop.
  • Normal routing functionality is impacted when dcsms reaches very high memory utilization.
  • In the NSX Manager vsm.log, you see messages similar to:
 
2019-03-06 14:46:35.141 UTC  INFO SimpleAsyncTaskExecutor-1 StatusAndStatsUtil:822 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] Propogating vse event EDGE_MEMORY_USAGE_JUMP_UP Module: vShield Edge Appliance Severity Critical

2019-03-06 13:42:32.199 UTC  INFO http-nio-127.0.0.1-7441-exec-436 EdgeUtils:472 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] populateSystemEvent parameters : sourceName edge-xxx, morefIdOfObjectOnVc vm-xxxx, moduleName NSX Edge Health Check, eventCode EDGE_VM_HEALTHCHECK_NO_PULSE, severity Major, messageParams [vm-xxx] eventMetaData {edgeId=edge-xxx, edgeVmName=ESG-TEST-01, error=Configuration failed on NSX Edge VM vm-xxx. Kindly refer Edge and NSX Manager logs for more details., edgeVmVcUUId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, edgeVmId=vm-xxx}
2019-03-06 13:42:32.202 UTC  INFO http-nio-127.0.0.1-7441-exec-436 EventServiceImpl:119 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] [SystemEvent] Time:'Wed Mar 06 13:42:32.199 UTC 2019', Severity:'Medium', Event Source:'edge-xxx', Code:'30033', Event Message:'NSX Edge VM (vmId : vm-xxx) not responding to health check.', Module:'NSX Edge Health Check', Universal Object:'false'
 
  • In the NSX Edge logs, you see entries similar to:
 
2019-03-06T18:12:28+00:00 ESG-TEST-01 syslog-ng[868]: [default]:  [syslog.err] I/O error occurred while writing; fd='17', error='Network is unreachable (101)'
2019-03-06T18:12:46+00:00 ESG-TEST-01 syslog-ng[807]: [default]:  [syslog.err] Connection failed; fd='21', server='AF_INET(198.162.246.80:514)', local='AF_INET(0.0.0.0:0)', error='Network is unreachable (101)'
2019-03-06T18:12:56+00:00 ESG-TEST-01 syslog-ng[807]: [default]:  [syslog.err] Connection failed; fd='37', server='AF_INET(198.162.246.80:514)', local='AF_INET(0.0.0.0:0)', error='Network is unreachable (101)'
 
  • Followed by a reboot once memory reaches the maximum threshold:
 
2019-03-06T18:12:22+00:00 ESG-TEST-01 kernel[]: [default]:  [kern.warning] monit invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
...
2019-03-06T18:12:24+00:00 ESG-TEST-01 MsgMgr[1241]: [default]:  [daemon.info] payload len:350 data:{"systemEvents":[{"severity":"Critical","metaData":{"message":"23665 60960 176576 VseEventProcess  1237 47200 176748 vmtoolsd  1241  7732 330896 msgmgr  1128  2944  96720 monit   868  2832 199864 syslog-ng "},"timestamp":1551895943,"moduleName":"vShield Edge Appliance","eventCode":30180,"message":"OOM happened, system rebooting in 3 seconds..."}]}
2019-03-06T18:12:26+00:00 ESG-TEST-01 shutdown[23781]: [default]:  [user.notice] shutting down for system reboot
...
2019-03-06T18:12:46+00:00 ESG-TEST-01 routing[1029]: [default]:  [daemon.info] All SMS configuration is complete.

2019-03-06T18:12:23+00:00 ESG-TEST-01 kernel[]: [default]:  [kern.err] Killed process 1083 (dcsms) total-vm:1321552kB, anon-rss:778608kB, file-rss:224kB
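To check a collected Edge log for this signature, the kernel "Killed process" line shown above can be matched with a regular expression. The following is a minimal sketch in Python, not a supported tool. The function names are hypothetical, and the pattern assumes the line format is exactly as in the excerpt above.

```python
import re

# Pattern for the kernel "Killed process" line as it appears in the log excerpt.
# Assumption: other Edge builds may format this line differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return a dict describing the killed process, or None if the line does not match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_kb": int(m.group("total_vm")),
        "anon_rss_kb": int(m.group("anon_rss")),
        "file_rss_kb": int(m.group("file_rss")),
    }


def find_dcsms_oom_kills(lines):
    """Yield parsed records for lines where the killed process is dcsms."""
    for line in lines:
        record = parse_oom_kill(line)
        if record is not None and record["name"] == "dcsms":
            yield record
```

Passing the lines of a saved Edge log to find_dcsms_oom_kills yields one record per dcsms kill, including the memory sizes the kernel reported at that time.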


Environment

VMware NSX for vSphere 6.4.x

Cause

In NSX-V versions before 6.4.4, executing the central CLI commands 'show ip bgp' and 'show ip bgp neighbors' in a loop (for example, from an automated script) causes a memory leak in the dcsms process. The memory leak is observed only when those CLI commands are executed. If the commands run continuously in the background, the Edge eventually runs out of memory (OOM).
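Because the leak grows only while the commands are being run, periodic memory readings for dcsms would be expected to rise steadily during such a loop. The sketch below shows one simple way to flag that pattern. It assumes the readings (in kB) have already been collected by some other means, and the function name and default thresholds are hypothetical values chosen for illustration.

```python
def is_steadily_growing(samples_kb, min_samples=5, min_growth_kb=10240):
    """Return True if the samples never decrease and total growth meets the threshold.

    samples_kb: memory readings in kB, oldest first (how they are collected is
    outside the scope of this sketch).
    """
    if len(samples_kb) < min_samples:
        return False
    # Each reading must be at least as large as the one before it.
    non_decreasing = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    total_growth = samples_kb[-1] - samples_kb[0]
    return non_decreasing and total_growth >= min_growth_kb
```

A strict never-decreasing check is deliberately conservative: normal fluctuation fails it, so only sustained growth of the kind described above is flagged.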

Resolution

This issue is resolved in NSX 6.4.6.

Workaround:
No workaround available.