[VMC on AWS] DHCP Pool Exhaustion Troubleshooting

Article ID: 314123

Products

VMware Cloud on AWS

Issue/Introduction

This article provides a resolution and workarounds for customers who have consumed 100% of the DHCP Pool IPs on a given network segment.


Symptoms:

Customer is unable to obtain a DHCP IP lease on a specific network segment.
Customer has had multiple VMs created and destroyed within a 24-hour period. This is most commonly seen with Horizon VDI clones but can also happen with regular workload VMs.
Under the VMC Console > Networking & Security Tab > Tier-1 Gateways > Compute Gateway > DHCP - Local | 1 Servers > View Statistics button, the IP allocation is at 100% for the impacted network segment.


Cause

Leased DHCP IPs are not returned to the pool when the VM that holds a lease fails to release it before being shut down or deleted. These stale leases remain allocated until the lease period expires.

See https://kb.vmware.com/s/article/76275 for best practices to avoid this scenario when using Horizon VDI Instant Clones.
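To illustrate the cause, the following Python sketch models a DHCP Pool as a fixed number of addresses plus a lease table. It is a toy model, not how NSX implements DHCP; the class and function names, the pool size (200), and the churn rate (20 clones per hour) are all hypothetical. It shows that when deleted VMs never release their leases, the pool fills up even though almost no VMs are alive, and stays full until the leases expire.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DhcpPool:
    """Toy DHCP pool (hypothetical model): fixed address count plus a lease table."""
    size: int
    lease_seconds: int
    leases: Dict[str, int] = field(default_factory=dict)  # client -> expiry (s)

    def expire(self, now: int) -> None:
        # Drop leases whose lease period has elapsed.
        self.leases = {c: t for c, t in self.leases.items() if t > now}

    def acquire(self, client: str, now: int) -> bool:
        self.expire(now)
        if len(self.leases) >= self.size:
            return False  # pool exhausted: nothing left to offer
        self.leases[client] = now + self.lease_seconds
        return True

    def release(self, client: str) -> None:
        self.leases.pop(client, None)


def simulate(pool_size: int, lease_seconds: int, clones_per_hour: int,
             hours: int, release_on_delete: bool) -> int:
    """Return how many clones failed to get a lease over the simulated period."""
    pool = DhcpPool(pool_size, lease_seconds)
    failures = 0
    vm_id = 0
    for hour in range(hours):
        now = hour * 3600
        for _ in range(clones_per_hour):
            vm_id += 1
            client = f"vm-{vm_id}"
            if not pool.acquire(client, now):
                failures += 1
                continue
            # The clone is deleted before the next hour; a clean shutdown
            # releases the lease, an unclean one leaves it in the table.
            if release_on_delete:
                pool.release(client)
    return failures


print(simulate(200, 86400, 20, 24, release_on_delete=True))   # 0
print(simulate(200, 86400, 20, 24, release_on_delete=False))  # 280
```

With a 24-hour lease and no releases, the 200-address pool in this model is full after 10 hours and every later clone fails. With a 1-hour lease, `simulate(200, 3600, 20, 24, release_on_delete=False)` returns 0, because stale leases expire before the pool fills.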

Resolution

The steps below reset the entire DHCP Pool for a given network segment. Review the Impact/Risks section under Additional Information before implementing this resolution.
  1. Log in to the VMC Console > Networking & Security Tab > Segments for the affected SDDC. (You can also log in to the NSX Manager standalone UI and follow the same procedure.)
  2. Select the affected network segment > click the three-dot menu > Edit
  3. Change the type of Network Segment from Routed to Disconnected > Click Save
  4. Wait 10-15 seconds for the NSX-T Control Plane sync to take place
  5. Change the type of Network Segment from Disconnected back to Routed > Click Save
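For environments where this reset is needed repeatedly, the sequence of steps 3 to 5 can be sketched as a small orchestration routine. This is a minimal sketch under stated assumptions: `SegmentClient`, `set_segment_type`, and the `"ROUTED"`/`"DISCONNECTED"` values are hypothetical placeholders, not part of any VMware SDK or API. A real implementation would wrap whatever NSX or VMC interface you use to change the segment type.

```python
import time
from typing import Callable, Protocol


class SegmentClient(Protocol):
    """Hypothetical interface; not a class from any VMware SDK."""

    def set_segment_type(self, segment_id: str, segment_type: str) -> None: ...


def reset_dhcp_pool(client: SegmentClient, segment_id: str,
                    sync_wait_seconds: float = 15.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Toggle a segment Routed -> Disconnected -> Routed (steps 3 to 5)."""
    # Step 3: detach the segment from the Compute Gateway.
    client.set_segment_type(segment_id, "DISCONNECTED")
    try:
        # Step 4: wait for the NSX-T control plane sync (10-15 s per this article).
        sleep(sync_wait_seconds)
    finally:
        # Step 5: reattach even if the wait is interrupted, so the segment
        # is not left disconnected from the Tier-1 gateway.
        client.set_segment_type(segment_id, "ROUTED")


class PrintClient:
    """Stand-in client that only prints the requested change."""

    def set_segment_type(self, segment_id: str, segment_type: str) -> None:
        print(segment_id, "->", segment_type)


reset_dhcp_pool(PrintClient(), "hypothetical-segment", sleep=lambda s: None)
```

The `finally` block is a deliberate design choice: because the disconnected state cuts off all routed traffic for the segment, the routine always attempts to restore the Routed type, even if the wait is interrupted.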


Workaround:

If resetting the entire DHCP Pool is not an option, three workarounds are available:

  1. Create a new temporary network segment with a CIDR range large enough to provision all the desired workload VMs. Because the new segment's DHCP Pool has not yet been used, this unblocks the customer from proceeding with workload VM provisioning. Keep in mind that Security Groups/Firewall Groups will need to be modified to include this temporary network segment.
  2. Pause VM provisioning to the network segment and wait for the default lease period of 86400 seconds (24 hours) to elapse so that all DHCP leases expire. Then reconfigure the affected network segment with the shortest practical DHCP lease time so that IPs no longer in use return to the pool more quickly.
  3. Expand the CIDR range of the impacted segment so that more host IPs are available within the range. For example, going from a /24 CIDR to a /22 CIDR would allow an additional 768 IPs to be assigned via DHCP.
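The arithmetic behind workarounds 2 and 3 can be checked with a short Python sketch. The CIDRs used (10.10.0.0/24 and 10.10.0.0/22) are hypothetical examples. The stale-lease estimate is an assumption-based approximation using Little's law (average count = arrival rate x time held): if some number of VMs per hour disappear without releasing their leases, roughly that rate multiplied by the lease time in hours is the number of addresses held by stale leases at any moment.

```python
import ipaddress


def extra_addresses(old_cidr: str, new_cidr: str) -> int:
    """Addresses gained by widening a segment from old_cidr to new_cidr."""
    old = ipaddress.ip_network(old_cidr)
    new = ipaddress.ip_network(new_cidr)
    if not old.subnet_of(new):
        raise ValueError(f"{new_cidr} does not contain {old_cidr}")
    return new.num_addresses - old.num_addresses


def leaked_leases(unreleased_per_hour: float, lease_seconds: int) -> float:
    """Approximate steady-state count of stale leases (Little's law: rate x time held).

    Ignores the pool cap, so it estimates how many addresses the pool
    would need to absorb the leak without running out.
    """
    return unreleased_per_hour * lease_seconds / 3600


assert 24 * 3600 == 86400  # the default lease period in seconds
print(extra_addresses("10.10.0.0/24", "10.10.0.0/22"))  # 768
print(leaked_leases(20, 86400))  # 480.0
print(leaked_leases(20, 3600))   # 20.0
```

Under these assumptions, 20 unreleased leases per hour would tie up about 480 addresses with the default 24-hour lease, more than a /24 can hold, but only about 20 with a 1-hour lease.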


Additional Information

Impact/Risks:

If 100% of the DHCP leases for a given network segment are removed while active VMs are still running with IPs leased from that pool, the same IP may be assigned to multiple workload VMs, resulting in duplicate IPs. Furthermore, implementing the resolution requires temporarily disconnecting the affected network segment from the T1 router. While it is disconnected, workload VMs on the segment lose all connectivity to devices outside their L2 broadcast domain. Segment-local L2 traffic is not impacted.
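Before resetting the pool, it can help to identify which leased IPs still belong to running VMs, since those are the addresses that could be handed out twice. The Python sketch below assumes you have already collected a lease table (IP to MAC) and a list of MAC addresses of powered-on VMs by whatever means you have; this article does not describe an export format, and the function name and sample data are hypothetical.

```python
from typing import Dict, Iterable


def leases_at_risk(lease_table: Dict[str, str],
                   running_vm_macs: Iterable[str]) -> Dict[str, str]:
    """Return {ip: mac} for leases still held by powered-on VMs.

    After a pool reset these IPs look free to the DHCP server and could be
    offered to another VM while the original owner is still using them.
    """
    live = {mac.lower() for mac in running_vm_macs}  # MAC case varies by source
    return {ip: mac for ip, mac in lease_table.items() if mac.lower() in live}


leases = {"192.168.10.11": "00:50:56:AA:00:01",   # hypothetical data
          "192.168.10.12": "00:50:56:AA:00:02"}
running = ["00:50:56:aa:00:01"]
print(leases_at_risk(leases, running))  # {'192.168.10.11': '00:50:56:AA:00:01'}
```

An empty result suggests the reset carries little duplicate-IP risk; a non-empty one lists the VMs whose addresses could be reassigned after the reset.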

Note: If the above risks cannot be accepted, use one of the three workarounds instead.