Whereabouts duplicate IP detect and cleanup script

Article ID: 424945

Products

VMware Telco Cloud Automation VMware Tanzu Kubernetes Grid

Issue/Introduction

This script detects and resolves IP address conflicts in Kubernetes clusters using the Whereabouts CNI plugin.

It identifies duplicate IP allocations in both pod annotations and Whereabouts CRDs, then resolves them using a deterministic priority system.

Environment

Telco Cloud Automation 3.2.0.1

TKG versions prior to 2.5.4

Cause

Whereabouts v0.5.4 has a known issue that can result in the same IP address being allocated to more than one pod.

 

Resolution

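Download the script attached to this article (`detect_IP_conflict_v1.0.2.sh`), copy it to a control plane node of the affected Kubernetes cluster, and make sure it is executable before running it.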

 

Prerequisites

Required Tools

  • `kubectl` - Must be configured with access to your Kubernetes cluster
  • `jq` - JSON processor (version 1.5+)
  • `base64` - Base64 encoding/decoding (usually pre-installed)
  • `bash` - Version 4.0+ (for associative arrays)

Optional

  • `timeout` - Command timeout utility (script has fallback if not available)
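
Before running the script, it can help to confirm that the tools listed above are on the `PATH`. The following is a minimal sketch, not part of the attached script; the function names `missing_tools` and `bash_is_v4_or_newer` are hypothetical helpers introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and are not taken from the KB script.

# Print "missing: <tool>" for every tool that is not on PATH.
# Returns 0 when everything was found, 1 otherwise.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Associative arrays need bash 4.0 or newer.
bash_is_v4_or_newer() {
  (( BASH_VERSINFO[0] >= 4 ))
}

# Example:
#   missing_tools kubectl jq base64 || echo "install the tools listed above"
#   bash_is_v4_or_newer || echo "bash 4.0+ is required"
```

`timeout` is left out of the example call because the article lists it as optional.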

Permissions

  • Read access to pods in all namespaces
  • Read access to `ipreservations` CRDs in all namespaces
  • Write access to `ipreservations` CRDs (for Mode B and C)
  • Delete access to pods (for Mode C)
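
The permissions above can be probed ahead of time with `kubectl auth can-i`. The sketch below is an assumption-laden illustration: the resource name `ipreservations` is copied from this article and may differ from the CRD name registered in your cluster, the verbs `list` and `update` are guesses at what "read" and "write" mean for the script, and `required_checks` and `check_permissions` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: verbs and resource names are assumptions, not taken from the script.

# Print one "<verb> <resource>" pair per line for the given mode (A, B or C).
required_checks() {
  local mode=${1:-A}
  echo "list pods"
  echo "list ipreservations"
  if [[ "$mode" == "B" || "$mode" == "C" ]]; then
    echo "update ipreservations"
  fi
  if [[ "$mode" == "C" ]]; then
    echo "delete pods"
  fi
  return 0
}

# Ask the API server about each pair; returns 1 if any permission is missing.
check_permissions() {
  local verb resource rc=0
  while read -r verb resource; do
    if kubectl auth can-i "$verb" "$resource" --all-namespaces </dev/null >/dev/null 2>&1; then
      echo "ok: $verb $resource"
    else
      echo "MISSING: $verb $resource"
      rc=1
    fi
  done < <(required_checks "${1:-A}")
  return "$rc"
}

# Example:
#   check_permissions C
```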

 

Basic Usage

./detect_IP_conflict_v1.0.2.sh [MODE]

Modes

Mode A: Detect Only (Default)

**Read-only mode** - Scans and reports conflicts without making changes.

./detect_IP_conflict_v1.0.2.sh A

# or simply (A is default)

./detect_IP_conflict_v1.0.2.sh

**What it does:**

  • Scans all pods for duplicate IP addresses in network-status annotations
  • Scans all Whereabouts IPReservation CRDs for duplicate IP allocations
  • Reports which pods will be kept/deleted based on priority rules
  • **No changes are made**

**Use when:**

  • You want to see what conflicts exist
  • You want to review decisions before taking action
  • Troubleshooting IP conflicts
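
To make the detection step concrete, the sketch below groups `<ip> <namespace>/<pod>` pairs by IP and prints every IP claimed by more than one pod. It is an illustration of the idea, not the script's actual code. The function name `find_duplicate_ips` is hypothetical, and the `jq` filter in the comment assumes the Multus `k8s.v1.cni.cncf.io/network-status` annotation with an `ips` array per network.

```bash
#!/usr/bin/env bash
# Sketch only: requires bash 4+ for associative arrays.

# stdin : lines of "<ip> <namespace>/<pod>"
# stdout: "<ip>: <pod> <pod> ..." for each IP that appears more than once
find_duplicate_ips() {
  local -A owners=() count=()
  local ip pod
  while read -r ip pod; do
    [[ -z "$ip" || -z "$pod" ]] && continue
    owners[$ip]+="${owners[$ip]:+ }$pod"
    count[$ip]=$(( ${count[$ip]:-0} + 1 ))
  done
  for ip in "${!count[@]}"; do
    if (( ${count[$ip]} > 1 )); then
      printf '%s: %s\n' "$ip" "${owners[$ip]}"
    fi
  done | sort
}

# One possible way to produce the input (assumed annotation layout):
#   kubectl get pods -A -o json | jq -r '
#     .items[] | . as $p
#     | (($p.metadata.annotations["k8s.v1.cni.cncf.io/network-status"] // "[]") | fromjson)[]
#     | .ips[]?
#     | "\(.) \($p.metadata.namespace)/\($p.metadata.name)"' | find_duplicate_ips
```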

Mode B: Fix CRD Entries

**Fixes CRD issues** - Removes stale and duplicate entries from Whereabouts CRDs.

./detect_IP_conflict_v1.0.2.sh B

**What it does:**

  • Performs all Mode A operations (detection)
  • Removes stale allocations (pods that no longer exist)
  • Removes duplicate IP allocations from CRDs
  • Updates CRDs with cleaned allocations
  • **Does NOT delete pods**

**Use when:**

  • CRDs have stale entries for deleted pods
  • CRDs have duplicate IP allocations
  • You want to clean up CRDs without deleting pods
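
A stale allocation is a CRD entry whose pod no longer exists. The sketch below shows one way such entries could be identified by comparing CRD allocations against the list of live pods. It only reports; it does not modify any CRD. The function name `find_stale_allocations` and the input formats are assumptions made for this example, not the script's internals.

```bash
#!/usr/bin/env bash
# Sketch only: requires bash 4+ for associative arrays.

# $1    : file with one live "<namespace>/<pod>" per line
# stdin : CRD allocations as "<ip> <namespace>/<pod>"
# stdout: allocations whose pod is not in the live list
find_stale_allocations() {
  local live_file=$1 ip pod
  local -A live=()
  while read -r pod; do
    [[ -n "$pod" ]] && live[$pod]=1
  done < "$live_file"
  while read -r ip pod; do
    [[ -z "$pod" ]] && continue
    if [[ -z "${live[$pod]:-}" ]]; then
      printf '%s %s\n' "$ip" "$pod"
    fi
  done
  return 0
}

# Example: the live list could come from
#   kubectl get pods -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#     | awk '{ print $1 "/" $2 }' > live_pods.txt
```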

Mode C: Fix CRD + Delete Pods

**Full cleanup** - Most aggressive mode, fixes CRDs and deletes duplicate pods.

./detect_IP_conflict_v1.0.2.sh C

**What it does:**

  • Performs all Mode A operations (detection)
  • Performs all Mode B operations (CRD cleanup)
  • **Deletes pods** that are marked as duplicates
  • Uses `--grace-period=0 --force` for immediate deletion

**Use when:**

  • You've reviewed the conflicts and want to resolve them
  • You need to clean up both CRDs and pods
  • **WARNING**: This will delete pods - use with caution!
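
For reference, the deletion described above corresponds to a `kubectl delete pod` call with `--grace-period=0 --force`. The sketch below only prints the command for a given `<namespace>/<pod>` so it can be reviewed before anyone runs it. `delete_cmd_for` is a hypothetical helper, not a function from the script.

```bash
#!/usr/bin/env bash
# Sketch only: prints the command instead of executing it.

# $1: pod reference in the form "<namespace>/<pod>"
delete_cmd_for() {
  local ref=$1
  local ns=${ref%%/*} pod=${ref#*/}
  if [[ "$ref" != */* || -z "$ns" || -z "$pod" ]]; then
    echo "expected <namespace>/<pod>, got: $ref" >&2
    return 1
  fi
  # %q quotes the values so the printed line is safe to paste into a shell.
  printf 'kubectl delete pod %q -n %q --grace-period=0 --force\n' "$pod" "$ns"
}

# Example:
#   delete_cmd_for default/app-pod-2
```

A forced deletion removes the pod object without waiting for the kubelet to confirm termination, which is why reviewing Mode A output first matters.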

 

Example Usage Scenarios

Scenario 1: Initial Assessment

  1. See what conflicts exist:

    ./detect_IP_conflict_v1.0.2.sh A
     
  2. Review the output to see:
    • Which IPs have conflicts
    • Which pods will be kept/deleted
    • CRD status of each pod

**Output example:**
=== Whereabouts IP Duplicate Detector(v1.0.2) ===
Mode: A (A=Detect, B=Fix CRD, C=Fix+Delete Pods)
Priority: Pods with CRD entries in Whereabouts are kept over pods without CRD entries

Scanning pods for network-status IPs...
...

 Duplicate pod annotation IP detected: 10.0.1.5
   Found 2 pods with this IP:
     - default/app-pod-1 (created: 2024-01-01T10:00:00Z)
     - default/app-pod-2 (created: 2024-01-01T11:00:00Z)
       default/app-pod-1: CRD entry = true
       default/app-pod-2: CRD entry = false
   → Will DELETE: default/app-pod-2 (no CRD entry)
   → Will KEEP: default/app-pod-1 (has CRD entry in Whereabouts)

Scenario 2: Clean Up CRDs Only

  1. Clean up stale/duplicate CRD entries without deleting pods:

    ./detect_IP_conflict_v1.0.2.sh B
     
  2. **What happens:**
    • Removes CRD allocations for pods that no longer exist
    • Removes duplicate IP allocations from CRDs
    • Pods remain untouched

Scenario 3: Full Cleanup

  1. After reviewing Mode A output, perform full cleanup:

    ./detect_IP_conflict_v1.0.2.sh C
     
  2. **What happens:**
    • Cleans up CRDs (same as Mode B)
    • Deletes duplicate pods
    • **Pods are immediately deleted** (no grace period)

Scenario 4: Large Cluster

  1. Increase timeout for large clusters:

    KUBECTL_TIMEOUT=120 ./detect_IP_conflict_v1.0.2.sh A

 

Output Interpretation

Detection Phase (Mode A)

Duplicate pod annotation IP detected: <IP>
  Found N pods with this IP:
    - <namespace>/<pod> (created: <timestamp>)
    - <namespace>/<pod> (created: <timestamp>)
      <namespace>/<pod>: CRD entry = true/false
      <namespace>/<pod>: CRD entry = true/false
  → Will DELETE: <namespace>/<pod> (<reason>)
  → Will KEEP: <namespace>/<pod> (<reason>)

**Reasons for deletion:**

  • `no CRD entry` - The pod has no CRD entry, while the pod being kept does
  • `has CRD, but current pod has higher priority` - Both pods have CRD entries, and the tiebreaker (creation time, then name) favors the other pod
  • `has CRD, but keep pod has higher priority` - Both pods have CRD entries, and the tiebreaker favors the pod already selected to keep
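
When Mode A output is saved to a file, the pods marked for deletion can be pulled out for review. The sketch below assumes the `Will DELETE: <namespace>/<pod> (<reason>)` line format shown above; `planned_deletions` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: depends on the "Will DELETE:" line format shown in this article.

# stdin : saved Mode A output
# stdout: one "<namespace>/<pod>" per planned deletion
planned_deletions() {
  sed -n 's/.*Will DELETE: \([^ ]*\).*/\1/p'
}

# Example:
#   ./detect_IP_conflict_v1.0.2.sh A | tee mode_a.log
#   planned_deletions < mode_a.log
```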

CRD Cleanup Phase (Mode B/C)

Processing CRD: <namespace>/<crd-name>
→ Removing stale allocation <namespace>/<pod>
→ Removing duplicate CRD allocation <namespace>/<pod> for IP <IP>
✔ Updated <namespace>/<crd-name>

Pod Deletion Phase (Mode C)

 Deleting pod <namespace>/<pod> (duplicate IP <IP>)
✓ Successfully deleted <namespace>/<pod>

 

Troubleshooting

Script Hangs:

  1. Check if timeout is working

    KUBECTL_TIMEOUT=5 ./detect_IP_conflict_v1.0.2.sh A
     
  2. Check kubectl connectivity

    kubectl get pods -A
     

"Command not found" Errors:

  1. Check if required tools are installed

    command -v kubectl
    command -v jq
    command -v base64
     

Permission Denied:

  1. Check kubectl permissions

    kubectl get pods -A
    kubectl auth can-i delete pods
     

No Conflicts Detected:

  1. This is normal if there are no IP conflicts
  2. Script will report "No duplicate pods to delete" in Mode C

Unexpected Pod Deletions:

  1. Review Mode A output first
  2. Check CRD entries for pods
  3. Verify pod creation timestamps
  4. Check priority rules match your expectations

 

Exit Codes

  • `0` - Success
  • `1` - Error (command failure, missing dependencies, etc.)
  • `124` - Timeout (if using timeout command)
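
A wrapper that calls the script can branch on these exit codes. The following sketch maps each documented code to a short message; `describe_exit_code` is a hypothetical helper and the message texts are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: maps the exit codes listed above to short messages.

describe_exit_code() {
  case "$1" in
    0)   echo "success" ;;
    1)   echo "error (command failure, missing dependency, ...)" ;;
    124) echo "timeout" ;;
    *)   echo "unexpected exit code $1" ;;
  esac
}

# Example:
#   ./detect_IP_conflict_v1.0.2.sh A
#   describe_exit_code "$?"
```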

 

Limitations

  1. **No dry-run for Modes B and C** - Mode A previews the planned decisions, but Modes B and C cannot be run in a preview-only form
  2. **No rollback** - Cannot undo deletions if script fails partway
  3. **No progress indicators** - May appear hung on large clusters
  4. **Immediate deletion** - Mode C uses `--grace-period=0 --force`
  5. **Memory usage** - May have issues with very large clusters (10,000+ pods)
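
Because there is no rollback, one possible precaution is to export the affected resources to YAML before running Mode B or C. The sketch below is not part of the script. `backup_resources` is a hypothetical helper, and the resource name `ipreservations` in the example is copied from this article; `kubectl api-resources` can be used to confirm the name registered in your cluster.

```bash
#!/usr/bin/env bash
# Sketch only: exports resources for reference; it does not restore anything.

# $1   : output directory
# $2...: resource types to export across all namespaces
backup_resources() {
  local outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  local res
  for res in "$@"; do
    kubectl get "$res" --all-namespaces -o yaml > "$outdir/$res.yaml" || return 1
  done
}

# Example (resource names are assumptions):
#   backup_resources "backup_$(date +%Y%m%d_%H%M%S)" pods ipreservations
```

An exported pod manifest records what existed but cannot bring back a force-deleted pod's runtime state, so this is a reference copy rather than a true undo.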

 

Safety Features

  1. **Verification before deletion** - Checks pod exists before deleting
  2. **Deterministic selection** - Same input always produces same result
  3. **CRD refresh** - Uses fresh CRD data before making decisions
  4. **Error handling** - Continues processing even if individual operations fail
  5. **Timeout protection** - Commands won't hang indefinitely

 

Quick Reference

  • Detect conflicts (safe, read-only)
    ./detect_IP_conflict_v1.0.2.sh A
     
  • Fix CRDs only (no pod deletion)
    ./detect_IP_conflict_v1.0.2.sh B
     
  • Full cleanup (deletes pods - use with caution!)
    ./detect_IP_conflict_v1.0.2.sh C
     
  • Custom timeout
    KUBECTL_TIMEOUT=60 ./detect_IP_conflict_v1.0.2.sh A

Additional Information

Best Practices

  1. Always Start with Mode A before taking action

    ./detect_IP_conflict_v1.0.2.sh A

     
  2. Review Output Carefully
    1. Check which pods will be deleted
    2. Verify CRD status is correct
    3. Ensure you understand the priority decisions
  3. Use Mode B First
    1. Clean up CRDs first

      ./detect_IP_conflict_v1.0.2.sh B
       
    2. Then review again

      ./detect_IP_conflict_v1.0.2.sh A
       
    3. Finally delete pods if needed

      ./detect_IP_conflict_v1.0.2.sh C

       
  4. Test on Non-Production First
    1. Run in test/staging environment first
    2. Verify behavior matches expectations
    3. Check that important pods aren't being deleted
  5. Monitor During Execution
    1. Watch for errors or warnings
    2. Check if pods are being deleted as expected
    3. Verify CRD updates are successful
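
The staged workflow above can be wrapped with an explicit confirmation before the destructive step. This is a minimal sketch of that idea; `confirm` is a hypothetical helper and is not provided by the script.

```bash
#!/usr/bin/env bash
# Sketch only: a typed confirmation gate in front of Mode C.

# $1: question to show. Returns 0 only if the user types exactly "yes".
confirm() {
  local answer
  printf '%s [type "yes" to continue]: ' "$1" >&2
  read -r answer || return 1
  [[ "$answer" == "yes" ]]
}

# Example:
#   ./detect_IP_conflict_v1.0.2.sh A | tee "mode_a_$(date +%Y%m%d_%H%M%S).log"
#   confirm "Run Mode C and force-delete the pods listed above?" \
#     && ./detect_IP_conflict_v1.0.2.sh C
```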

Environment Variables

KUBECTL_TIMEOUT

Sets the timeout for kubectl commands in seconds. Default: 30 seconds.

    # Use default 30 second timeout
    ./detect_IP_conflict_v1.0.2.sh A

    # Use custom 60 second timeout
    KUBECTL_TIMEOUT=60 ./detect_IP_conflict_v1.0.2.sh A

    # Use very short timeout for testing
    KUBECTL_TIMEOUT=5 ./detect_IP_conflict_v1.0.2.sh A


**When to adjust:**

  • Large clusters: Increase timeout (60-120 seconds)
  • Slow API server: Increase timeout
  • Testing: Decrease timeout to test timeout behavior
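
To illustrate how a per-command timeout with a fallback might be built, the sketch below uses `timeout` when it is installed and otherwise falls back to a background watcher. This is an assumption about the general technique, not the script's actual implementation; `run_with_timeout` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: one way to bound a command's run time with or without timeout(1).

# $1: seconds, $2...: command to run. Returns 124 when the limit is hit.
run_with_timeout() {
  local secs=$1
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
    return
  fi
  # Fallback: run the command in the background and kill it after $secs.
  "$@" &
  local pid=$! rc=0
  ( sleep "$secs"; kill -TERM "$pid" 2>/dev/null ) &
  local watcher=$!
  wait "$pid" || rc=$?
  kill "$watcher" 2>/dev/null || true
  wait "$watcher" 2>/dev/null || true
  # 143 = 128 + SIGTERM: assume the watcher fired and report it like timeout(1).
  (( rc == 143 )) && rc=124
  return "$rc"
}

# Example:
#   run_with_timeout "${KUBECTL_TIMEOUT:-30}" kubectl get pods -A
```

The fallback treats any SIGTERM as a timeout, which is a simplification that keeps the example short.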

 

Priority Rules

The script uses deterministic priority to decide which pod to keep when multiple pods claim the same IP:

  1.  **Pods with CRD entries** > Pods without CRD entries
    • Pods that have entries in Whereabouts IPReservation CRDs are considered legitimate owners
  2.  **Older pods** (if both have or both don't have CRD entries)
    • Compares creation timestamps
    • Pod created earlier is kept
  3.  **Alphabetical** (if timestamps are equal or unavailable)
    • Falls back to alphabetical ordering by namespace/podname
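
The three rules above amount to a sort with three keys. The sketch below expresses that ordering with `sort` and returns the pod that would be kept. It assumes every pod has a creation timestamp in the same UTC ISO 8601 format (so that string order equals time order) and does not model the "timestamp unavailable" case. `pick_keeper` is a hypothetical name, not a function from the script.

```bash
#!/usr/bin/env bash
# Sketch only: expresses the priority rules as a three-key sort.

# stdin : "<namespace>/<pod> <true|false> <creationTimestamp>" for pods sharing one IP
# stdout: the "<namespace>/<pod>" that the rules would keep
pick_keeper() {
  # Key 1: CRD entry present (0) before absent (1)
  # Key 2: earlier creation timestamp first
  # Key 3: namespace/podname in alphabetical order
  awk '{ print ($2 == "true" ? 0 : 1), $3, $1 }' \
    | LC_ALL=C sort -k1,1n -k2,2 -k3,3 \
    | head -n 1 \
    | awk '{ print $3 }'
}

# Example:
#   printf '%s\n' \
#     'default/app-pod-1 true 2024-01-01T10:00:00Z' \
#     'default/app-pod-2 false 2024-01-01T11:00:00Z' | pick_keeper
```

Because the ordering is a total order over the three keys, the same input always yields the same keeper, which is what makes the selection deterministic.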

 

Support

For issues or questions:

  1. Review the script output for error messages
  2. Check kubectl connectivity and permissions
  3. Verify all dependencies are installed
  4. Review the priority rules to understand decisions

Attachments

detect_IP_conflict_v1.0.2.sh