In /opt/vmware/vcloud-director/logs/vmware-vcd-watchdog.log it can be seen that the service is being restarted:

<YYYY-MM-DD> 09:16:29 | INFO | vmware-vcd-cell running
<YYYY-MM-DD> 09:21:30 | ALERT | vmware-vcd-cell is dead but /var/run/vmware-vcd-cell.pid exists, attempting to restart it
<YYYY-MM-DD> 09:21:40 | INFO | Started vmware-vcd-cell (pid=478962)
<YYYY-MM-DD> 09:21:40 | WARN | Server status returned HTTP/1.1 404
<YYYY-MM-DD> 09:22:40 | WARN | Server status returned HTTP/1.1 503
<YYYY-MM-DD> 09:23:40 | WARN | Server status returned HTTP/1.1 503
<YYYY-MM-DD> 09:24:40 | WARN | Server status returned HTTP/1.1 503
<YYYY-MM-DD> 09:26:41 | INFO | vmware-vcd-cell running
<YYYY-MM-DD> 09:31:41 | INFO | vmware-vcd-cell running

If the dmesg command is run on the appliance and the output checked, it can be seen that the out-of-memory killer was activated and started killing processes.

If the journalctl command is run on the appliance and the output checked, it can be seen that a kernel panic RIP (Register Instruction Pointer) occurred and the Out Of Memory killer (oom-kill) was invoked.

VMware Cloud Director 10.6.x
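The watchdog log symptom described above can be checked with a short script instead of by eye. The following is a minimal sketch, not a supported tool: it assumes every line follows the "date time | LEVEL | message" shape seen in the excerpt, and the function names find_restarts and summarize_log are hypothetical.

```python
import re

# Assumed line shape, based on the watchdog excerpt: "<date> HH:MM:SS | LEVEL | message"
LINE_RE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\|\s+(?P<level>\w+)\s+\|\s+(?P<msg>.*)$"
)


def find_restarts(lines):
    """Return (date, time, message) for each ALERT line reporting a dead cell."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ALERT" and "is dead" in match.group("msg"):
            events.append((match.group("date"), match.group("time"), match.group("msg")))
    return events


def summarize_log(path="/opt/vmware/vcloud-director/logs/vmware-vcd-watchdog.log"):
    """Read the watchdog log from the appliance and list the restart events."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_restarts(handle)
```

Running summarize_log() on the appliance returns one entry per watchdog-triggered restart. Several entries close together suggest the cell is being killed repeatedly rather than failing once.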
Memory was being consumed at too high a rate for the appliance to handle. As a result, the kernel terminated processes to prevent a total system crash when RAM became critically low.
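To confirm that the kernel terminated processes because of memory pressure, the dmesg or journalctl output can be scanned for out-of-memory killer messages. The sketch below assumes the message shapes commonly printed by Linux kernels ("... invoked oom-killer" and "Out of memory: Killed process <pid> (<name>)"). The exact wording varies between kernel versions, and find_oom_events and read_dmesg are hypothetical helper names.

```python
import re
import subprocess

# Common kernel message shapes; wording can differ between kernel versions.
OOM_INVOKED = re.compile(r"(?P<proc>\S+) invoked oom-killer")
OOM_KILLED = re.compile(r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_events(text):
    """Extract which processes triggered the OOM killer and which were killed."""
    invoked = [m.group("proc") for m in OOM_INVOKED.finditer(text)]
    killed = [(int(m.group("pid")), m.group("name")) for m in OOM_KILLED.finditer(text)]
    return {"invoked_by": invoked, "killed": killed}


def read_dmesg():
    """Return the kernel ring buffer as text (usually requires root privileges)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

On the appliance, find_oom_events(read_dmesg()) lists the affected processes. An empty result means no matching messages are in the ring buffer, which can also happen if the buffer has rotated since the event.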
The sizing of the Cloud Director server group needs to be increased, or the number of incoming requests needs to be limited.
Review the current sizing of the Cloud Director appliances in the server group and increase it to large or extra large (VVS), as outlined in the VMware Cloud Director Appliance Sizing Guidelines.
The procedure for resizing is documented here: Recommended Procedure for resizing VMware Cloud Director Appliances
If the appliances are already right-sized, then the requests coming into Cloud Director need to be limited. This has to be done outside of VMware Cloud Director, at the load balancer level.
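How request limiting is configured depends on the load balancer product in use and is not covered here. As a conceptual illustration only, the sketch below shows token-bucket rate limiting, one common approach to capping request rates. The class name TokenBucket and its parameters are hypothetical and do not correspond to any VMware or load balancer API.

```python
import time


class TokenBucket:
    """Allow bursts up to `burst` requests, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock          # injectable so the behavior can be tested
        self.last = clock()

    def allow(self):
        """Return True if a request may pass, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate_per_sec=2 and burst=3, for example, three requests pass immediately, further requests are rejected until tokens refill, and the sustained rate settles at two requests per second.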