Note: Ops Manager VM status metrics are intended for observation and troubleshooting, not for scaling the environment.
For information about how to identify signs of memory contention, refer to Key Performance Indicators.
To analyze memory consumption in more detail, follow these steps.
1. Run this command to sum the resident memory (RSS, column 6 of the ps aux output, reported in KB) of all running processes and print the result in GB:
ps aux | awk 'BEGIN {sum=0} {sum +=$6} END {print sum/1024/1024}'
2. To get the used memory for all Diego Cells, run this command:
bosh -d cf-<ID> ssh diego_cell -c 'ps aux | awk '\''BEGIN {sum=0} {sum +=$6} END {print sum/1024/1024}'\''' | grep -vE 'Unauthorized use is strictly prohibited|is subject to logging and monitoring|Connection to.*closed'
3. The output of these commands is the sum of the memory used by all running processes. Check whether this value differs from the used memory reported by these commands:
free -g
cat /proc/meminfo
If the two values differ, some memory might be in use as cache. In vSphere environments, ballooning might also be in place. To check the balloon size on all Diego Cells, run this command:
bosh -d cf-<ID> ssh diego_cell -c 'vmware-toolbox-cmd stat balloon' | grep -vE 'Unauthorized use is strictly prohibited|is subject to logging and monitoring|Connection to.*closed'
4. To verify which process is utilizing the most memory, run this command:
ps aux --sort=-%mem | head
5. For very detailed information about processes and threads, run the following commands to verify the CPU and memory usage of the threads:
ps aux
ps axjfww
ps -o pid,user,%mem,command ax | sort -b -k3 -r
pmap <PID>
pmap <PID> | tail -n 1
You can also list how much memory is used by multiple processes using their PIDs with pmap as follows:
sudo pmap <PID1> <PID2> | grep total
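The comparison described in step 3 can be sketched as a small script. This is a minimal illustration, not an official tool: the function names sum_rss_gib and used_gib_from_meminfo are hypothetical, and "used" is assumed here to mean MemTotal minus MemAvailable from /proc/meminfo. Note that RSS counts shared pages once per process, so the process sum can overstate real usage.

```shell
#!/bin/sh
# Hypothetical helper: sum the RSS column (field 6 of `ps aux`, in KiB)
# read from stdin, skipping the header line, and print the total in GiB.
sum_rss_gib() {
  awk 'NR > 1 { sum += $6 } END { printf "%.2f\n", sum / 1024 / 1024 }'
}

# Hypothetical helper: read /proc/meminfo-style text from stdin and print
# MemTotal - MemAvailable in GiB (assumed definition of "used" memory).
used_gib_from_meminfo() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { printf "%.2f\n", (total - avail) / 1024 / 1024 }'
}

# Print both values side by side when run on a Linux host.
if [ -r /proc/meminfo ] && command -v ps >/dev/null 2>&1; then
  echo "process RSS sum (GiB): $(ps aux | sum_rss_gib)"
  echo "kernel used     (GiB): $(used_gib_from_meminfo < /proc/meminfo)"
fi
```

A large gap where the kernel value is much higher than the process sum suggests memory held outside ordinary processes, which is where the balloon check in step 3 becomes relevant.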
The main purpose of Diego Cells is to run the applications pushed with Cloud Foundry (CF).
Without these app processes, overall memory consumption should stay close to the minimum. However, if the hypervisor needs to reclaim memory due to contention, the IaaS might reclaim it by using the ballooning method.
Ballooning uses a driver that, when triggered by ESXi host memory utilization above 90%, starts requesting memory from the system, up to 65% of memory capacity. When needed, the balloon "pops", releasing the memory and allowing the hypervisor to reclaim it.
This process is triggered only when a memory contention event is present. It can leave signs of memory usage even if no real process is using the memory.
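To spot Diego Cells where the balloon is currently inflated, the output of the balloon command can be filtered. The sketch below assumes that vmware-toolbox-cmd stat balloon prints a value such as "0 MB" and that bosh ssh may prefix each line with the instance name. Both are assumptions about the output format, and any_balloon_inflated is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read balloon output lines from stdin and print those
# whose last two fields are "<number> MB" with a number greater than zero.
# Exit status is 0 if at least one inflated balloon was found, 1 otherwise.
any_balloon_inflated() {
  awk 'NF >= 2 && $NF == "MB" && $(NF - 1) + 0 > 0 { print; found = 1 }
       END { exit !found }'
}
```

For example, piping the output of the bosh command from step 3 into any_balloon_inflated would list only the cells reporting a non-zero balloon size.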