This article provides generic scripts that can help isolate intermittent latency and packet drop issues in VMware by Broadcom environments. These scripts are designed to capture network flow information, monitor network statistics, and perform packet captures. Please ensure you modify the scripts' content to fit your specific use case and environment before executing them.
Prerequisites:
The following are sample scripts that can be used in various troubleshooting scenarios. Determine which script or scripts are appropriate for your specific use case.
Script 1: To collect the getflows output.
Explanation/Instructions:
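- Creates a new timestamped folder for each capture session.
- Captures DFW getflows information for the IP ##.##.##.##.
- Runs every 8 seconds; each session runs 5400 times (5400 × 8 seconds = 43200 seconds, about 12 hours), and the outer loop repeats the session 14 times, for about 7 days in total.
- Logs are stored in /var/run/log/getflows/.
- Change nic-xxxxx-ethx-vmware-sfw.2 to the DVFilter name of the target VM's vNIC in your environment (summarize-dvfilter lists the filter names on the host).
- Adjust the run time, destination directory, and other parameters as needed.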
#!/bin/bash
mkdir -p /var/run/log/getflows/
# Outer loop: 14 sessions of about 12 hours each (about 7 days in total)
for i in $(seq 1 1 14) ; do
  currDate=$(date +%Y-%m-%d_%H-%M-%S)
  mkdir -p /var/run/log/getflows/$currDate
  outfile=/var/run/log/getflows/$currDate/getflows.txt
  # 5400 * 8 = 43200 seconds, which equates to 12 hours.
  for j in $(seq 1 1 5400) ; do
    echo "======================================" >> "$outfile"
    date >> "$outfile"
    # Keep only the flows for the IP of interest (-F matches the IP literally)
    vsipioctl getflows -f nic-xxxxxxx-ethX-vmware-sfw.2 | grep -F "##.##.##.##" >> "$outfile"
    sleep 8
  done
done
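To change how long a script runs, the loop count is the desired duration divided by the pause per iteration. The helper below is a small sketch of that arithmetic; the function name iterations_for is hypothetical, and it ignores the time each collection command itself takes, so the real run time will be slightly longer than requested.

```shell
#!/bin/sh
# Hypothetical helper: number of loop iterations needed to cover a duration.
#   $1 = desired run time in hours
#   $2 = pause between iterations in seconds (the "sleep" value in the script)
# The time taken by the collection command itself is ignored.
iterations_for() {
  hours=$1
  pause_seconds=$2
  echo $(( hours * 3600 / pause_seconds ))
}

# Examples:
#   iterations_for 12 8      # 5400  (Script 1 inner loop)
#   iterations_for 168 8     # 75600 (Script 2, about 7 days)
#   iterations_for 168 300   # 2016  (Script 3, about 7 days)
```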
Script 2: To collect the netstats output
Instructions:
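- Collects net-stats output, pausing 8 seconds between collections.
- Runs 75600 times; 75600 × 8 seconds = 604800 seconds, so the script runs for about 7 days (a little longer, since each net-stats run also takes time) unless it is stopped earlier.
- Removes older files so that only the newest 500 files are retained; at one file every 8 seconds or more, these cover at least the last 4000 seconds (about 66 minutes).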
- Logs are stored in /var/run/log/netstats/.
#!/bin/bash
# Directory to store the net-stats output
directory="/var/run/log/netstats"
# Number of newest files to keep
x=500
mkdir -p "$directory"
# 75600 * 8 = 604800 seconds, which equates to about 7 days.
for i in $(seq 1 1 75600) ; do
  currDate=$(date +%Y-%m-%d_%H-%M-%S)
  net-stats -i 1 -ticqQWS -A > "$directory/netstats-timed-$currDate"
  # List files newest first and delete everything after the newest x
  ls -t "$directory" | tail -n +$((x+1)) | while read -r file; do
    rm -f "$directory/$file"
  done
  sleep 8
done
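The retention step in Scripts 2 and 3 lists the directory newest first (ls -t) and deletes every entry after the first x. A minimal sketch of the same logic as a standalone function, so it can be tried on a scratch directory before it is used on the host; the name keep_newest is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the retention loop in Scripts 2 and 3:
# keep the newest $2 files in directory $1 and delete the rest.
keep_newest() {
  dir=$1
  keep=$2
  # ls -t lists newest first; tail -n +N prints from line N onward,
  # i.e. every entry older than the newest $keep.
  ls -t "$dir" | tail -n +$((keep + 1)) | while read -r file; do
    rm -f "$dir/$file"
  done
}

# Example: keep_newest /var/run/log/netstats 500
```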
Script 3: To collect the packet captures
Instructions:
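- Captures network traffic for IP ##.##.##.## with pktcap-uw at the PreDVFilter and PostDVFilter capture points (before and after the DFW filter), keeping the first 150 bytes of each packet (--snaplen 150).
- Starts a new capture file each time the current file reaches 500 MB (-C 500).
- Keeps only the 20 newest files in the capture directory.
- Stops the capture after 2016 cycles of 300 seconds (604800 seconds, about 7 days).
- Change nic-xxxxx-ethx-vmware-sfw.2 to the DVFilter name of the target VM's vNIC in your environment.
- Adjust the run time, destination directory, and other parameters as needed.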
#!/bin/bash
# Directory to store the capture files
directory="/var/run/log/packetcapture"
# Number of newest capture files to keep
x=20
mkdir -p "$directory"
# Capture before and after the DVFilter in the background; start a new file every 500 MB
nohup pktcap-uw --capture PreDVFilter,PostDVFilter --dvfilter nic-XXXXXXX-ethX-vmware-sfw.2 --ip ##.##.##.## --ng --snaplen 150 -C 500 -o "$directory/PreDVF_PostDVF.pcapng" &
# 2016 * 300 = 604800 seconds, which equates to about 7 days.
for i in $(seq 1 1 2016) ; do
  # List files newest first and delete everything after the newest x
  ls -t "$directory" | tail -n +$((x+1)) | while read -r file; do
    rm -f "$directory/$file"
  done
  sleep 300
done
# Stop every pktcap-uw process on the host (including any started outside this script)
kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)
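Because Script 3 can keep up to 20 files of 500 MB each (roughly 10000 MB), it may be worth checking free space on the destination before starting. A minimal sketch, assuming a df that supports the POSIX -P -k output format (this may not hold on every ESXi build); the function name has_space_for_capture is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for Script 3.
#   $1 = directory that will hold the captures
#   $2 = number of capture files kept, $3 = size of each file in MB
# Returns 0 if the file system has at least that much free space.
has_space_for_capture() {
  dir=$1
  files_to_keep=$2
  file_size_mb=$3
  needed_kb=$(( files_to_keep * file_size_mb * 1024 ))
  # df -P -k prints a header line, then: filesystem, size, used, available, ...
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Example (values from Script 3: 20 files of 500 MB each):
#   has_space_for_capture /var/run/log 20 500 || echo "Not enough free space" >&2
```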
Create the Scripts: Save each of the above scripts into a separate .sh file on the ESXi host:
getflows.sh (Script 1)
netstats.sh (Script 2)
packetcapture.sh (Script 3)
Run the Scripts: To execute the scripts, run the following commands on the ESXi host. These commands use setsid to run the scripts in the background, in a new session, so that they keep running after you close the SSH session.
Example:
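A minimal sketch of starting one collection script in the background with setsid, assuming the scripts were saved in the current directory. The start_collector function and the PID-file paths are hypothetical. The script is run with sh explicitly so it does not depend on which interpreter the #!/bin/bash line points to, and its console output is discarded because each script writes its results to its own log directory.

```shell
#!/bin/sh
# Hypothetical launcher: start a collection script in a new session with setsid
# and record its PID so it can be stopped later.
#   $1 = script to run, $2 = file to write the PID to
start_collector() {
  script=$1
  pidfile=$2
  # Inside setsid, the inner sh writes its own PID ($$) to the PID file and then
  # replaces itself with the collection script via exec, so the PID stays valid.
  setsid sh -c 'echo $$ > "$1"; exec sh "$2"' sh "$pidfile" "$script" \
    > /dev/null 2>&1 < /dev/null &
}

# Usage (hypothetical PID-file locations):
#   start_collector ./getflows.sh /tmp/getflows.pid
#   start_collector ./netstats.sh /tmp/netstats.pid
#   start_collector ./packetcapture.sh /tmp/packetcapture.pid
```

Recording the PID at start-up makes it possible to stop exactly these scripts later without searching the process list by name.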
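Stop the Scripts Early (Upon Issue Occurrence): If you encounter the latency or packet drop issue, stop the scripts to prevent unnecessary data collection and to keep the files that cover the issue from being deleted by the retention cleanup in Scripts 2 and 3. Use the following commands to kill the running processes: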
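A minimal sketch of stopping a collection script early, assuming its PID was written to a PID file when it was started (for example, by a launcher that runs echo $$ > file before exec'ing the script). The stop_collector name and the PID-file paths are hypothetical. Killing the script does not stop the pktcap-uw process that Script 3 started with nohup, so for Script 3 also run the kill command from the end of that script.

```shell
#!/bin/sh
# Hypothetical helper: stop a collection script whose PID is stored in a file.
#   $1 = PID file written when the script was started
stop_collector() {
  pidfile=$1
  if [ ! -f "$pidfile" ]; then
    echo "PID file not found: $pidfile" >&2
    return 1
  fi
  # Send SIGTERM to the script; remove the PID file only if the kill succeeded.
  kill "$(cat "$pidfile")" && rm -f "$pidfile"
}

# Usage (hypothetical PID-file locations):
#   stop_collector /tmp/getflows.pid
#   stop_collector /tmp/netstats.pid
#   stop_collector /tmp/packetcapture.pid
#   # Script 3 only: also stop pktcap-uw itself
#   kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)
```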
Important Note:
Ideally, collect the data within 1 hour of experiencing the issue: Scripts 2 and 3 keep only their newest files, so data older than that may already have been rolled over and may no longer be available for troubleshooting.