"ENTER MAINTENANCE DRYRUN CHECK" Precheck fails with "Error: Error during enter MAINTENANCE check due to InsufficientResourcesFault"


Article ID: 318853


Updated On:

Products

VMware Cloud Foundation

Issue/Introduction

Symptoms:
  • The ENTER MAINTENANCE DRYRUN CHECK precheck fails with "Error: Error during enter MAINTENANCE check due to InsufficientResourcesFault".
  • In lcm.log, you see entries similar to the following:
2020-07-07T00:03:20.472+0000 ERROR [0000000000000000,0000,precheckId=180134f2-9c3f-414c-b536-4d3540700f70,resourceType=NSX,resourceId=af69be13-9695-4235-8259-19ae819dee24] [c.v.e.s.l.c.v.vsphere.VsphereUtils,pool-2-thread-449] Error during enter MAINTENANCE check due to InsufficientResourcesFault { "_msg": "Insufficient resources.", "_faultMsg": [ { "key": "com.vmware.cdrs.maintenancemode.clusterLoadViolated", "arg": [ { "key": "threshold", "value": 80 }, { "key": "clustload", "value": 92 }, { "key": "resource", "value": "memory" } ], "message": "Host cannot enter maintenance mode since the resulting cluster memory load (92%) exceeds the tolerence threshold (80%)." } ], "stackTrace": [], "suppressedExceptions": [] }
  • You may see a message similar to the following:
Error description: the virtual machine is pinned to a host: Insufficient resources.
Impact: High: Do not perform upgrade without addressing this issue.
Remediation: This category is for errors that were not due to general exceptions. Check for errors in error object, check LCM logs – InsuffcientResourceFault: Check the HA settings in the VC UI. See if the error is due to HA configuration KB: https://kb.vmware.com/s/article/2005073 Or This issue occurs when the CPU resource reservations of the virtual machine exceed the available capacity on the target host. KB: https://kb.vmware.com/s/article/1031306
  • In lcm.log, you see messages similar to the following:
Error during enter MAINTENANCE check due to InsufficientResourcesFault
Host cannot enter maintenance mode since the resulting cluster memory load (81%) exceeds the tolerence threshold (80%)
Host cannot enter maintenance mode since the resulting cluster cpu load (89%) exceeds the tolerence threshold

 
2020-09-15T20:30:29.355+0000 ERROR [0000000000000000,0000,precheckId=2722bf5e-9a63-4459-8560-9599cf7c2145,resourceType=ESX,resourceId=c71c0021-9fb8-11e8-a910-1dd898ec6280] [c.v.e.s.l.c.v.vsphere.VsphereUtils,pool-2-thread-64] Error during enter MAINTENANCE check due to InsufficientResourcesFault { "_msg": "Insufficient resources.", "_faultMsg": [ { "key": "com.vmware.cdrs.maintenancemode.clusterLoadViolated", "arg": [ { "key": "threshold", "value": 80 }, { "key": "clustload", "value": 81 }, { "key": "resource", "value": "memory" } ], "message": "Host cannot enter maintenance mode since the resulting cluster memory load (81%) exceeds the tolerence threshold (80%)." } ], "stackTrace": [], "suppressedExceptions": [] }
2020-09-15T20:30:29.355+0000 WARN [0000000000000000,0000,precheckId=2722bf5e-9a63-4459-8560-9599cf7c2145,resourceType=ESX,resourceId=c71c0021-9fb8-11e8-a910-1dd898ec6280] [c.v.v.v.c.h.i.HttpConfigurationCompilerBase$ConnectionMonitorThreadBase,pool-2-thread-64] Shutting down the connection monitor.
2020-09-15T20:30:29.355+0000 WARN [0000000000000000,0000] [c.v.v.v.c.h.i.HttpConfigurationCompilerBase$ConnectionMonitorThreadBase,VLSI-client-connection-monitor-254] Interrupted, no more connection pool cleanups will be performed.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
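
To see which resource tripped the check and by how much, the JSON fault payload in an lcm.log entry can be read programmatically. The following is a minimal sketch, assuming each entry is a single line shaped like the excerpts above. The function name parse_emm_fault and the shortened SAMPLE_LINE are illustrative only and are not part of any VMware tooling.

```python
import json

# Marker text taken from the log excerpts in this article.
FAULT_MARKER = "due to InsufficientResourcesFault"

# Shortened copy of the first excerpt above, used only for illustration.
SAMPLE_LINE = (
    '2020-07-07T00:03:20.472+0000 ERROR [...] Error during enter MAINTENANCE check '
    'due to InsufficientResourcesFault { "_msg": "Insufficient resources.", '
    '"_faultMsg": [ { "key": "com.vmware.cdrs.maintenancemode.clusterLoadViolated", '
    '"arg": [ { "key": "threshold", "value": 80 }, { "key": "clustload", "value": 92 }, '
    '{ "key": "resource", "value": "memory" } ], "message": "Insufficient resources." } ], '
    '"stackTrace": [], "suppressedExceptions": [] }'
)


def parse_emm_fault(line):
    """Return one dict (resource, load, threshold) per cluster-load violation in a log line."""
    idx = line.find(FAULT_MARKER)
    if idx == -1:
        return []
    start = line.find("{", idx)
    if start == -1:
        return []
    try:
        # raw_decode parses the first JSON value and tolerates trailing text.
        fault, _ = json.JSONDecoder().raw_decode(line[start:])
    except json.JSONDecodeError:
        return []
    violations = []
    for msg in fault.get("_faultMsg", []):
        if not msg.get("key", "").endswith("clusterLoadViolated"):
            continue
        args = {a["key"]: a["value"] for a in msg.get("arg", [])}
        violations.append({
            "resource": args.get("resource"),
            "load": args.get("clustload"),
            "threshold": args.get("threshold"),
        })
    return violations


if __name__ == "__main__":
    print(parse_emm_fault(SAMPLE_LINE))
    # [{'resource': 'memory', 'load': 92, 'threshold': 80}]
```

Running the function over every line of lcm.log that contains the marker gives a quick list of the resources (memory or cpu) that exceed the threshold.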

Environment

VMware Cloud Foundation 4.1
VMware Cloud Foundation 3.10.x
VMware Cloud Foundation 4.2.x
VMware Cloud Foundation 4.0.x

Resolution

Starting with VCF 4.3, flags are available that suppress the dry-run enter maintenance mode (EMM) check failures so that the upgrade can proceed.
  1. Add the following lines to the end of /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties:
lcm.nsxt.suppress.dry.run.emm.check=true
lcm.esx.suppress.dry.run.emm.check.failures=true

  2. Restart the lcm service using the following command:
systemctl restart lcm
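
Step 1 can be scripted so that the lines are appended only when they are missing. The following is a minimal sketch under stated assumptions: the helper name ensure_flags and the ".bak" backup file are illustrative choices, not part of the product, and the sketch does not change a key that already exists with a different value.

```python
import shutil
from pathlib import Path

# Property lines copied from step 1 of this article.
EMM_FLAGS = {
    "lcm.nsxt.suppress.dry.run.emm.check": "true",
    "lcm.esx.suppress.dry.run.emm.check.failures": "true",
}


def ensure_flags(properties_path, flags=EMM_FLAGS):
    """Append any missing key=value lines and return the keys that were added."""
    path = Path(properties_path)
    text = path.read_text()

    # Collect the keys already defined, skipping blank lines and comments.
    present = set()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            present.add(stripped.split("=", 1)[0].strip())

    missing = [key for key in flags if key not in present]
    if not missing:
        return []

    # Keep a copy of the original file (the ".bak" name is an assumption).
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    with path.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")
        for key in missing:
            fh.write(f"{key}={flags[key]}\n")
    return missing


if __name__ == "__main__":
    added = ensure_flags("/opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties")
    print("Added:", added)
```

The lcm service still has to be restarted afterwards, as described in step 2.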


Workaround:
Cluster utilization can be verified in vCenter Server (VC) as follows:
  1. Log in to VC.
  2. Navigate to the cluster under "Hosts and Clusters".
  3. On the "Monitor" tab, select "vSphere DRS".
  4. Verify "CPU Utilization".
  5. Verify "Memory Utilization".

Then do either of the following:
1) Shut down VMs that are not critical for operation during the upgrade. This reduces the memory and CPU requirements on the cluster.
2) Disable HA and HA Admission Control. This removes the memory constraints.

Then retry the precheck.
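
To judge roughly how much load must be removed before retrying, the arithmetic can be sketched as below. This is only a back-of-the-envelope model and an assumption of this example: it treats the resulting load as used capacity divided by the capacity left after one host enters maintenance mode, which is not necessarily how DRS calculates the value. The 80% threshold is taken from the log messages above; the function names and the sample figures are hypothetical.

```python
def resulting_load_pct(used, total_capacity, host_capacity):
    """Estimated cluster load (%) once one host's capacity is taken away."""
    remaining = total_capacity - host_capacity
    if remaining <= 0:
        raise ValueError("no capacity would remain in the cluster")
    return 100.0 * used / remaining


def amount_to_free(used, total_capacity, host_capacity, threshold_pct=80):
    """Estimated usage to shed so the resulting load is at or below the threshold."""
    remaining = total_capacity - host_capacity
    if remaining <= 0:
        raise ValueError("no capacity would remain in the cluster")
    allowed = remaining * threshold_pct / 100.0
    return max(0.0, used - allowed)


if __name__ == "__main__":
    # Hypothetical cluster: 4 hosts x 500 GB memory, 1380 GB in use.
    print(resulting_load_pct(1380, 2000, 500))  # 92.0
    print(amount_to_free(1380, 2000, 500))      # 180.0
```

In this hypothetical case, about 180 GB of memory usage would need to be shed (for example by shutting down non-critical VMs) before the estimated load drops to 80%.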

Additional Information

Impact/Risks:
None