Execute the following command on the cell to get the current number of NAT sessions per protocol. The output shows whether a large number of NAT sessions is in use and for which protocol (UDP = 17, TCP = 6). A list of friendly names for the protocol numbers can be found here: https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
Get-NetNatSession | Group-Object -Property Protocol -NoElement
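To make the grouped output easier to read, the protocol numbers can be translated into names. The following is a minimal sketch, assuming the `Protocol` property holds the numeric values described above; the `$protocolNames` table is illustrative and only covers the two protocols named in this article.

```powershell
# Illustrative lookup table (an assumption): only the two protocols mentioned above.
$protocolNames = @{ '6' = 'TCP'; '17' = 'UDP' }

Get-NetNatSession |
    Group-Object -Property Protocol -NoElement |
    Sort-Object -Property Count -Descending |
    ForEach-Object {
        # Group-Object exposes the grouped value as a string in .Name
        $friendly = $protocolNames[$_.Name]
        if (-not $friendly) { $friendly = "protocol $($_.Name)" }
        [pscustomobject]@{ Protocol = $friendly; Sessions = $_.Count }
    }
```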
Enable the WinNat operational log (this does not need a restart).
$logName = 'Microsoft-Windows-WinNat/Oper'
$log = New-Object System.Diagnostics.Eventing.Reader.EventLogConfiguration $logName
$log.IsEnabled = $true
$log.SaveChanges()
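Whether the log is actually enabled can be confirmed afterwards. A short sketch, assuming the same log name as in the step above:

```powershell
# Read the log configuration back; IsEnabled should report True after the change.
Get-WinEvent -ListLog 'Microsoft-Windows-WinNat/Oper' |
    Select-Object -Property LogName, IsEnabled, RecordCount
```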
Run `Get-WinEvent -ProviderName "Microsoft-Windows-WinNat" | Format-List` and look for events similar to “NAT instance XXXXXXX failed to allocate a UDP port dynamically because all ports in the instance's external address pool are in use”.
Validate that the cell has enough dynamic ports assigned to UDP and TCP. The default is 16384 for each. The current values can be read with `Get-NetTCPSetting` and `Get-NetUDPSetting`.
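Both checks can be narrowed down to the relevant fields. The sketch below is an assumption-laden example: the wildcard pattern is derived from the message text quoted above and may need adjusting, and the limit of 20 events is arbitrary.

```powershell
# Show only port allocation failures from the WinNat provider.
# The wildcard is an assumption based on the message quoted in this article.
Get-WinEvent -ProviderName 'Microsoft-Windows-WinNat' -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*failed to allocate*' } |
    Select-Object -First 20 -Property TimeCreated, Id, Message |
    Format-List

# Show the configured dynamic port ranges (default: 16384 ports each).
Get-NetTCPSetting |
    Select-Object -Property SettingName, DynamicPortRangeStartPort, DynamicPortRangeNumberOfPorts
Get-NetUDPSetting |
    Select-Object -Property DynamicPortRangeStartPort, DynamicPortRangeNumberOfPorts
```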
Monitor the CPU usage of the `System` process (PID 4) with Sysinternals Process Explorer (https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer). This process includes the WinNat service. If PID 4 appears to be using one full CPU, e.g. 25% on a host with 4 CPUs, look at the threads of this process. A high number of `ntoskrnl.exe+0x74A90` entries indicates that the WinNat service is waiting to get new ports.
If the number of NAT sessions for UDP or TCP is close or equal to the configured dynamic port range, this indicates that the application instances are opening more ports than are available, or are opening and closing them faster than the default UDP session timeout of 300 seconds allows them to be cleaned up. A high number of `ntoskrnl.exe+0x74A90` entries for the PID 4 process further strengthens this assumption.
If the number of NAT sessions for UDP or TCP is below the configured dynamic port range, but the WinNAT log contains “NAT instance XXXXXXX failed to allocate a UDP port dynamically” messages, this indicates that some app instances require, in bursts, more than the default 100 ports available to a container.
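The two cases above can be combined into a rough check for UDP. This is only a sketch: the 90% threshold (`$nearLimit`) is an arbitrary assumption for "close to" the range, the message wildcard is derived from the quoted log text, and the comparison assumes `Protocol` holds the number 17 for UDP as described earlier.

```powershell
$nearLimit = 0.9   # assumed threshold for "close to the configured range"

$udpRange    = (Get-NetUDPSetting | Select-Object -First 1).DynamicPortRangeNumberOfPorts
$udpSessions = @(Get-NetNatSession | Where-Object { $_.Protocol -eq 17 }).Count
$failures    = @(Get-WinEvent -ProviderName 'Microsoft-Windows-WinNat' -ErrorAction SilentlyContinue |
                 Where-Object { $_.Message -like '*failed to allocate*' }).Count

if ($udpSessions -ge $nearLimit * $udpRange) {
    # Case 1: sessions close or equal to the dynamic port range
    "UDP sessions ($udpSessions) are near the dynamic port range ($udpRange)."
}
elseif ($failures -gt 0) {
    # Case 2: range not exhausted, but allocation failures were logged
    "Range not exhausted, but $failures allocation failure event(s) found: possible port bursts."
}
else {
    "No sign of UDP port exhaustion ($udpSessions sessions, range $udpRange)."
}
```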
Modify the PortChunkSize property up to 2000: `New-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\WinNat" -Name "PortChunkSize" -Value 2000 -PropertyType "Dword"`. This might help applications which require a large number of ports in bursts. This requires a reboot to take effect and hence has to be done in the stemcell build process.
Increase the number of dynamic ports: `Set-NetUDPSetting -DynamicPortRangeStartPort 39536 -DynamicPortRangeNumberOfPorts 26000`. This might give the cell enough headroom to clean up old UDP sessions to make room for new ones. This requires a reboot to take effect and hence has to be done in the stemcell build process.
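The start port and the number of ports have to fit below the highest port number, 65535. With the values above, the last port of the range is 39536 + 26000 - 1 = 65535, so the range ends exactly at the limit. A small sketch that checks this and reads the setting back; the variable names are illustrative:

```powershell
$startPort     = 39536
$numberOfPorts = 26000

# Last port in the range: 39536 + 26000 - 1 = 65535
$lastPort = $startPort + $numberOfPorts - 1
if ($lastPort -gt 65535) {
    throw "Dynamic port range would end at $lastPort, beyond the maximum port 65535."
}

# Read the configured UDP range back to compare it with the intended values.
Get-NetUDPSetting |
    Select-Object -Property DynamicPortRangeStartPort, DynamicPortRangeNumberOfPorts
```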
(Currently only on Windows Server 1903) Modify the WinNAT UDP timeout property: `New-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\WinNat" -Name "UdpSessionTimeout" -Value 30 -PropertyType "Dword"`. This decreases the time needed to clean up old UDP sessions and hence makes ports available to new sessions more quickly. This requires a reboot to take effect and hence has to be done in the stemcell build process.
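After the reboot, the registry values can be read back to confirm that the stemcell carries the intended settings. A minimal sketch, assuming the same registry key as in the commands above; a value that was never set shows up empty.

```powershell
# Read both WinNat tuning values from the registry; unset values appear blank.
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\WinNat' |
    Select-Object -Property PortChunkSize, UdpSessionTimeout
```

Note that `New-ItemProperty` reports an error if the value already exists; adding `-Force` overwrites it, which may be useful when the build step runs more than once.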